
This is the core aggregation function for all the aggregation types. It is a lightweight wrapper around dplyr::group_by() and dplyr::summarise() that handles passing grouping columns, aggregation columns, and functions as arguments, and handles naming of the output columns. It assumes there is no spatial information to deal with. Preparing the data so that it has the proper grouping columns is the job of the outer calling functions (or the user).
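As a rough illustration of the group_by() plus summarise() core described above, here is a minimal plain-dplyr sketch. It is not the package's implementation. `sketch_aggregate()` is a hypothetical helper, it assumes dplyr >= 1.1.0 (for pick()), it takes grouping and aggregation columns as plain character vectors rather than the selectcreator formats, and it leaves out the prefix and missing-column handling.

```r
library(dplyr)

# Hypothetical helper mimicking the group_by() + summarise() core.
# groupers and aggCols are character vectors; funlist is a *named* list of functions.
sketch_aggregate <- function(data, groupers, aggCols, funlist) {
  data %>%
    group_by(pick(all_of(groupers))) %>%                     # grouping columns
    summarise(across(all_of(aggCols), funlist),              # every column x every function
              .groups = "drop")
}

flows <- tibble(
  site  = c("a", "a", "b", "b"),
  flow  = c(1, 3, 10, 20),
  depth = c(0.5, 1.5, 2, 4)
)

out <- sketch_aggregate(flows, "site", c("flow", "depth"),
                        list(mean = mean, max = max))
out
```

By default across() names each output column "{column}_{function}" (here flow_mean, flow_max, depth_mean, depth_max); the real function adds its own prefix, and its exact naming may differ.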

Usage

general_aggregate(
  data,
  groupers,
  aggCols,
  funlist,
  prefix = "agg_",
  failmissing = TRUE,
  ...
)

Arguments

data

a dataframe or tibble with data to aggregate

groupers

an expression for the columns to use as grouping variables for the aggregation (see selectcreator for formats)

aggCols

an expression for the columns to aggregate (the data columns). See selectcreator for formats.

funlist

a list of functions and their arguments used to aggregate the data. See functionlister for creating this list in many cases. A bare anonymous function, e.g. ~mean(., na.rm = T), is not supported because each function needs a name; if using anonymous functions, use a named list, e.g. list(mean = ~mean(., na.rm = T)). For functions with a data-variable argument, e.g. weighted.mean with a column of weights, there are now (as of dplyr 1.1) some workarounds. If funlist is specified directly as the function argument, the function can go in as a bare name or as an anonymous function. If the list is specified elsewhere, it can be wrapped in rlang::quo(), e.g. agglist <- rlang::quo(list(mean = mean, wm = ~weighted.mean(., weight_column_name, na.rm = T))). If it is not wrapped, an internal workaround adds the quosure; this appears to be stable but may cause unforeseen issues. The same workaround also allows building custom (non-anonymous) aggregation functions with the data-variable argument either exposed or hardcoded (see SpatialWeightedMean()). The error checks for names do not work on quosures, so make sure to name the list when using rlang::quo().
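To show why the list needs names and how a data-variable argument such as a weight column behaves, here is a minimal sketch in plain dplyr (>= 1.1 assumed), not a call to general_aggregate() itself. The `gauges` data and its `w` weight column are hypothetical. The list of lambdas is written inline in across(), which corresponds to the "specified as a function argument" case above; the rlang::quo() route for lists built elsewhere is not shown.

```r
library(dplyr)

gauges <- tibble(
  site = c("a", "a", "b", "b"),
  flow = c(1, 3, 10, NA),
  w    = c(1, 3, 1, 1)      # hypothetical weight column
)

res <- gauges %>%
  group_by(site) %>%
  summarise(
    across(flow, list(
      # list names ("mean", "wm") become part of the output column names
      mean = ~ mean(.x, na.rm = TRUE),
      # `w` is a data-variable, resolved against each group's rows
      wm   = ~ weighted.mean(.x, w, na.rm = TRUE)
    )),
    .groups = "drop"
  )
res
```

For site "a" the weighted mean is (1*1 + 3*3) / (1 + 3) = 2.5; for site "b" the NA flow and its weight are dropped, leaving 10.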

prefix

character prefix for the output column names. Default "agg_", but it is often better to use the name of the aggregation step. Typically set by the calling function to indicate the type of aggregation.

failmissing

logical, default TRUE: fail if any of the requested grouping or aggregation columns do not exist. If FALSE, proceed with those that do exist and silently drop those that don't. Similar to tidyselect::all_of() vs tidyselect::any_of().
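The TRUE/FALSE behaviour mirrors the two tidyselect helpers. A minimal sketch using dplyr::select() directly rather than general_aggregate(), with hypothetical column names:

```r
library(dplyr)

flows <- tibble(site = c("a", "b"), flow = c(1, 2))
requested <- c("flow", "not_a_column")   # hypothetical: one requested column is missing

# any_of(): analogous to failmissing = FALSE, keeps only the columns that exist
kept <- flows %>% select(any_of(requested))

# all_of(): analogous to failmissing = TRUE, errors on the missing column
strict <- tryCatch(
  flows %>% select(all_of(requested)),
  error = function(e) conditionMessage(e)
)

names(kept)   # "flow"
```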

...

arguments passed to the aggregation functions. This is very limited and, under most conditions, does not work with data-variable arguments. It is almost always better to specify arguments explicitly when building funlist, but ... works OK with simple functions, e.g. passing na.rm = TRUE to mean.
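One way to see why ... suits only simple, constant arguments: anything forwarded through it goes unchanged to every function in the list. The base-R sketch below assumes forwarding of this kind; `apply_all()` is a hypothetical helper, not the package's code, and general_aggregate() may pass ... along differently.

```r
# Hypothetical helper: forwards the same extra arguments to every function in funlist.
apply_all <- function(x, funlist, ...) {
  vapply(funlist, function(f) f(x, ...), numeric(1))
}

x <- c(2, 4, NA)

# na.rm = TRUE reaches both mean() and max(), so both must accept it.
apply_all(x, list(mean = mean, max = max), na.rm = TRUE)
```

Because the same value is sent to every function, ... cannot carry a per-function argument such as a column of weights; that belongs in funlist.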

Value

a tibble with columns for the grouping variables and a column of within-group aggregated values for each combination of aggCol and function in funlist, named according to the function applied and the original column name.