Aggregate data along theme dimension
general_aggregate.Rd
This is the core aggregation function for all aggregation types. It is a fairly lightweight wrapper around dplyr::group_by() and dplyr::summarise() that handles passing grouping columns, aggregation columns, and functions as arguments, and handles naming of the output columns. It assumes there is no spatial information to deal with. Preparing the data so that it has the proper grouping columns is the job of the outer calling functions (or the user).
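The group-and-summarise pattern described above can be sketched in plain dplyr. This is only an illustration under stated assumptions, not the package's actual implementation: the toy data, the column names, and the exact output naming pattern (prefix, then function name, then column name) are all hypothetical.

```r
library(dplyr)

# Hypothetical toy data; column names are illustrative only
toy <- tibble(
  theme = c("a", "a", "b", "b"),
  site  = c("s1", "s2", "s1", "s2"),
  value = c(1, 3, 10, 20)
)

# Grouping columns, aggregation columns, and functions passed as objects,
# mirroring the groupers, aggCols, and funlist arguments
group_cols <- "theme"
agg_cols   <- "value"
funs       <- list(mean = mean, max = max)  # names feed the output column names
prefix     <- "agg_"

out <- toy %>%
  group_by(across(all_of(group_cols))) %>%
  summarise(
    # Assumed naming scheme: prefix + function name + original column name
    across(all_of(agg_cols), funs, .names = paste0(prefix, "{.fn}_{.col}")),
    .groups = "drop"
  )

print(out)
```

The function names in the list matter here because they become part of the output column names, which is why the list of functions needs to be named.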
Usage
general_aggregate(
  data,
  groupers,
  aggCols,
  funlist,
  prefix = "agg_",
  failmissing = TRUE,
  ...
)
Arguments
- data
a dataframe or tibble with data to aggregate
- groupers
an expression for the columns to use as grouping variables for the aggregation (see selectcreator for formats)
- aggCols
an expression for the columns to aggregate (the data columns). See selectcreator for formats.
- funlist
a list of functions and their arguments used to aggregate the data. See functionlister for creation in many cases. A bare anonymous function, e.g. ~mean(., na.rm = TRUE), is not supported because a name is needed. Use a named list when using anonymous functions, e.g. list(mean = ~mean(., na.rm = TRUE)). For functions with a data-variable argument, e.g. weighted.mean with a column of weights, there are some workarounds as of dplyr 1.1. If the function is specified directly as a function argument, it can go in as a bare name or as an anonymous function. If it is specified elsewhere, it can be wrapped in rlang::quo(), e.g. agglist <- rlang::quo(list(mean = mean, wm = ~weighted.mean(., weight_column_name, na.rm = TRUE))). If it is not wrapped, an internal workaround adds the wrapping; this seems to be stable but may cause unforeseen issues. The workaround also allows building custom (named, not anonymous) aggregation functions with the data-variable argument either exposed or hardcoded (see SpatialWeightedMean()). The error checks for names do not work for quosures, so make sure the list is named when using rlang::quo().
- prefix
character prefix for the column name. Default "agg_", but it is often better to use a prefix that identifies the aggregation step. Typically set by the particular calling function to indicate the type of aggregation.
- failmissing
logical, default TRUE: fail if the requested grouping or aggregation columns do not exist. If FALSE, proceed with those that do exist and silently drop those that don't. Similar to tidyselect::all_of() vs tidyselect::any_of() in tidyselect.
- ...
arguments passed to the aggregation functions. This is very limited, and does not work with data arguments under most conditions. It is almost always better to specify arguments explicitly when building funlist, but this works OK with simple functions, e.g. passing na.rm = TRUE to mean.
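Two of the behaviours described in the arguments can be illustrated with plain dplyr and tidyselect. The sketch below does not call general_aggregate() itself; the toy data and column names (value, weight, not_a_column) are hypothetical, and the output naming pattern is an assumption. It shows a named list of anonymous functions where one uses a data-variable argument, written inline so the weight column can be found, and then the all_of() versus any_of() contrast that failmissing is compared to.

```r
library(dplyr)

# Hypothetical toy data; column names are illustrative only
toy <- tibble(
  theme  = c("a", "a", "b", "b"),
  value  = c(1, 3, 10, NA),
  weight = c(1, 3, 1, 1)
)

# Named list of anonymous functions given directly as the function argument.
# `weight` is a data variable, looked up within each group.
out <- toy %>%
  group_by(theme) %>%
  summarise(
    across(
      all_of("value"),
      list(
        mean = ~ mean(.x, na.rm = TRUE),
        wm   = ~ weighted.mean(.x, weight, na.rm = TRUE)
      ),
      .names = "agg_{.fn}_{.col}"  # assumed naming pattern
    ),
    .groups = "drop"
  )

print(out)

# Strict versus permissive column selection
wanted <- c("value", "not_a_column")

# any_of() silently drops names that do not exist in the data
kept <- names(select(toy, any_of(wanted)))

# all_of() raises an error when any requested name is missing
strict_failed <- inherits(try(select(toy, all_of(wanted)), silent = TRUE), "try-error")

print(kept)
print(strict_failed)
```

Writing the list inline keeps the example independent of the quosure handling described for funlist; a list built elsewhere is the case where rlang::quo() is suggested.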