Aggregate along spatial dimension — spatial_aggregate

Takes geographic data (points or polygons) and aggregates it into polygons while retaining theme-level information. This function prepares the data in the way the spatial dimension requires and then wraps general_aggregate(). Many of the arguments are passed straight through to general_aggregate().
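
As a rough illustration of the idea only (not the package's internal implementation), the sketch below uses sf and dplyr directly to place points into polygons and then summarise within each combination of a non-spatial grouping column and polygon. All data and column names (scenario, value, region) are hypothetical, and no CRS is set, so coordinates are treated as planar.

```r
library(sf)
library(dplyr)

# Hypothetical point data with a non-spatial grouping column (scenario)
pts <- st_as_sf(
  data.frame(
    scenario = rep(c("base", "dry"), each = 3),
    value = c(1, 3, 10, 2, 6, 20),
    x = rep(c(0.25, 0.75, 1.5), times = 2),
    y = 0.5
  ),
  coords = c("x", "y")
)

# Helper to build an axis-aligned rectangle
rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Two hypothetical target polygons, side by side
polys <- st_sf(
  region = c("A", "B"),
  geometry = st_sfc(rect(0, 0, 1, 1), rect(1, 0, 2, 1))
)

# Attach polygon identity to each point, then group by the
# non-spatial grouper AND the polygon before summarising
agg <- st_intersection(pts, polys) |>
  st_drop_geometry() |>
  group_by(scenario, region) |>
  summarise(mean_value = mean(value), .groups = "drop")
```

The point to notice is that the polygon identifier joins the other grouping columns, which mirrors how the polygons in to_geo are added to groupers automatically.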

Usage

spatial_aggregate(
  dat,
  to_geo,
  groupers,
  aggCols,
  funlist,
  ...,
  whichcrs = sf::st_crs(to_geo),
  keepAllPolys = FALSE,
  failmissing = TRUE,
  prefix = "spatial_",
  joinby = "spatial",
  auto_ewr_PU = FALSE
)
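
The default whichcrs = sf::st_crs(to_geo) in the signature above takes the coordinate reference system from to_geo unless another is supplied. A minimal sketch of what that default evaluates to, assuming a hypothetical one-polygon to_geo in EPSG:4326 (the polygon and its region column are illustrative only):

```r
library(sf)

# Hypothetical target polygon with an explicit CRS (EPSG:4326)
to_geo <- st_sf(
  region = "A",
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    crs = 4326
  )
)

# What the default argument expression evaluates to
default_crs <- st_crs(to_geo)

# A bare numeric EPSG code describes the same CRS
same_as_code <- default_crs == st_crs(4326)
```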

Arguments

dat: sf of values to aggregate with any necessary non-spatial grouping information (e.g. scenario, theme)
to_geo: sf polygon or multipolygon that provides the desired spatial level to group into. The spatial join uses the intersection from sf::st_intersection(), so if dat and to_geo are both polygons, they do not have to be nested.
groupers: as in general_aggregate(), with the note that these should be all grouping columns except the polygons in to_geo, which are automatically added to groupers before passing to general_aggregate().
aggCols: an expression for the columns to aggregate (the data columns). See selectcreator for formats
funlist: a list of functions and their arguments used to aggregate the data. See functionlister for creation in many cases. A bare anonymous function, e.g. ~mean(., na.rm = T), is not supported because a name is needed; use a named list instead, e.g. list(mean = ~mean(., na.rm = T)). For functions with a data-variable argument, e.g. weighted.mean with a column of weights, there are some workarounds as of dplyr 1.1. If funlist is specified directly as a function argument, the function can go in as a bare name or an anonymous function. If it is specified elsewhere, it can be wrapped in rlang::quo(), e.g. agglist <- rlang::quo(list(mean = mean, wm = ~weighted.mean(., weight_column_name, na.rm = T))). If it is not wrapped, an internal workaround now adds the wrapping; this seems to be stable but may cause unforeseen issues. The workaround also allows building custom (non-anonymous) aggregation functions with the data-variable argument either exposed or hardcoded (see SpatialWeightedMean()). The error checks for names do not work for quosures, so make sure to name the list when using rlang::quo().
...: arguments passed to the aggregation functions. This is very limited, and does not work with data arguments under most conditions. Almost always better to specify explicitly when building funlist, but works OK with simple functions, e.g. passing na.rm = TRUE to mean
whichcrs: desired coordinate reference system. The easiest option is the numeric EPSG code, but it could be a full crs definition. See sf::st_crs()
keepAllPolys: logical, default FALSE. Should polygons in to_geo that have no values in dat be retained? The default FALSE keeps NA polygons from cluttering things up, but TRUE can be useful to not lose them, especially for later plotting. However, it is typically best from a data and cleanliness perspective to use FALSE here and use the bare set of polys as an underlay in plot_outcomes().
failmissing: logical, default TRUE: fail if the requested grouping or aggregation columns do not exist. If FALSE, proceed with those that do exist and silently drop those that don't. Similar to tidyselect::all_of() vs tidyselect::any_of()
prefix: character, differs from general_aggregate() in that default is 'spatial_' instead of 'agg_'.
joinby: character, default 'spatial' performs the expected spatial join using geometry, 'nonspatial' performs a dplyr::left_join() by common column names, typically as a result of calling multi_aggregate() with pseudo_spatial = 'planning_units'.
auto_ewr_PU: logical, default FALSE. If TRUE, automatically infers whether this is an EWR dataset undergoing gauge to sdl or planning unit aggregation. If so, joins data non-spatially (sets joinby = 'nonspatial'). The preferred solution is to use joinby in spatial_aggregate() or pseudo_spatial in multi_aggregate(). If none of these options is used, though, the function aborts to prevent an incorrect spatial join of gauges to planning units.
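
To make the funlist forms described above concrete, here is a minimal sketch of the two ways of building the list outside the function call. The object names (funs_named, funs_quo) and the weight column name weight_col are hypothetical; only base R and rlang are used, and nothing here calls the package itself.

```r
# A named list mixing a bare function and an anonymous (formula) function.
# The names are required; an unnamed anonymous function is not supported.
funs_named <- list(
  mean = mean,
  max_narm = ~max(., na.rm = TRUE)
)

# A function with a data-variable argument (here a hypothetical weights
# column called weight_col), built ahead of time and wrapped in
# rlang::quo() so the column name is not evaluated when the list is made.
funs_quo <- rlang::quo(
  list(
    mean = mean,
    wm = ~weighted.mean(., weight_col, na.rm = TRUE)
  )
)
```

Because the name checks do not work for quosures, it is worth confirming by eye that every element inside rlang::quo(list(...)) is named.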

Value

an sf with columns for the grouping variables aggregated into the polygons in to_geo and retaining desired theme-level information
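
As noted for to_geo, the join relies on sf::st_intersection(), so polygon inputs need not be nested. The sketch below shows only what that intersection does for two data polygons that each partly overlap one target polygon. It is an illustration with sf alone, under the assumption of planar coordinates with no CRS; the helper rect() and all data are hypothetical, and it does not reproduce how the package aggregates the resulting pieces.

```r
library(sf)

# Helper to build an axis-aligned rectangle
rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Two hypothetical data polygons side by side, each with a value
dat <- st_sf(
  value = c(10, 20),
  geometry = st_sfc(rect(0, 0, 2, 2), rect(2, 0, 4, 2))
)

# One hypothetical target polygon that straddles both data polygons
to_geo <- st_sf(
  region = "A",
  geometry = st_sfc(rect(1, 0, 3, 2))
)

# The intersection keeps only the overlapping pieces and carries the
# attributes of both inputs onto each piece (sf warns that attributes
# are assumed constant over each geometry)
pieces <- st_intersection(dat, to_geo)
pieces$area <- as.numeric(st_area(pieces))
```

Each data polygon contributes one piece to region A even though neither lies wholly inside it.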