Wrapper for Kiwis to find and return desired timeseries — fetch_kiwis

Wraps getTimeseriesList() (via find_ts_id()) and getTimeseriesValues() to find the ts_id(s) matching the kind of timeseries we want and then retrieve the data. For help with the arguments, run ts_list <- getTimeseriesList(portal = portal, station_no = gauge) to see the getTimeseriesList() output that is parsed to give the ts_id. This is especially helpful if you're getting incorrect records, too many records, or too few. Each argument below says which column it filters on, using grepl() with ignore.case = TRUE.

Usage

fetch_kiwis_timeseries(
  portal,
  gauge = NULL,
  start_time = NULL,
  end_time = NULL,
  period = NULL,
  variable = "discharge",
  units = "ML/d",
  timeunit = "Daily",
  statistic = "Mean",
  datatype = "QaQc",
  namefilters = NULL,
  extra_list = list(NULL),
  returnfields = "default",
  meta_returnfields = "default",
  request_timezone = "db_default",
  return_timezone = "UTC"
)

Arguments

portal: URL to Kisters KiWIS database.
gauge: character vector of gauge numbers, as station_no for Kiwis functions (site_list for Hydstra)
start_time: character, date, or date-time giving the start, in the database default timezone. Default NULL.
end_time: character, date, or date-time giving the end, in the database default timezone. Default NULL.
period: character, default NULL. The special case 'complete' returns the full set of data. Otherwise, a string beginning with 'P', followed by numbers and characters indicating the timespan, e.g. 'P2W'. See documentation.
variable: character vector of variables we want to extract. Matches on parametertype_name
units: units of the variable, used when there may be > 1 e.g. cumecs, ML/d for discharge. If NULL, gets all available. Matches to ts_unitsymbol
timeunit: The time interval to request, e.g. "Daily", the default. Main values seem to be 'Daily', 'Monthly', 'Yearly', and 'AsStored' (the raw data). Matches to part of ts_name
statistic: The aggregation statistic, e.g. "Mean", the default. Main values seem to be 'Mean', 'Max', 'Min', and 'Total', though not all are available for each variable: rainfall tends to use Total, while discharge tends to use Mean, Max, and Min. Matches to part of ts_name
datatype: The type of data to return, default 'QaQc'. Some other options seem to be 'Recieved', 'Harmonised', and 'Obs'. Note that 'QaQc' matches both 'DMQaQc' and 'PR01QaQc'. In many cases only one is available, but if you get twice as much data as expected, check and specify which you want. Matches to part of ts_name
namefilters: character vector giving the ability to match other parts of ts_name in case those specified in timeunit, statistic, and datatype aren't sufficient to find the desired ts_id. One frequent case is two Daily datasets that differ in whether they split at 9am or midnight, in which case you should use either namefilters = '09HR' or namefilters = '24HR'. In some situations, this can be easier than using regex, e.g. datatype = 'QaQc.*09'
extra_list: a named list, see getStationList(). Note in particular that a 'timezone' element can be included here to set the timezone the API returns in. This is dangerous: the API ingests dates in its own default timezone, and because that default cannot be extracted directly, it is inferred from the returned data. Thus, including a timezone in extra_list may yield unexpected outcomes when requesting dates. A better option is to use return_timezone to adjust the return values. That said, some databases may return gauge-local timezones, and results in different timezones cannot be concatenated. One solution is to work in UTC throughout by setting timezone = 'UTC' in extra_list, so that all outputs are in the same timezone.
returnfields: return fields for the data itself. Default is c('Timestamp', 'Value', 'Quality Code'). The full list is in the Kisters docs.
meta_returnfields: return fields about the variable and site. Seems to be able to access most of what getTimeseriesList() has in its returnfields. The full list is in the Kisters docs.
request_timezone: ignored if start_time and end_time are time objects, otherwise a timezone from OlsonNames() or 'db_default'
return_timezone: character in OlsonNames() or one of three special cases: 'db_default', 'char', or 'raw'. Default 'UTC'. If 'db_default', uses the API default; BOM defaults to +10. If 'char' or 'raw', returns the time column as-is from the API (a string in the format 'YYYY-MM-DDTHH:MM:SS+TZ').
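
If the time column is returned as-is (return_timezone = 'char' or 'raw'), it can be converted afterwards. The following is a minimal base-R sketch, not part of this package. It assumes the offset is written as '+10:00'; the example strings are invented, and the helper name parse_kiwis_time is hypothetical.

```r
# Invented example strings in the 'YYYY-MM-DDTHH:MM:SS+TZ' layout,
# assuming the offset part looks like '+10:00'.
raw_time <- c("2020-01-01T09:00:00+10:00", "2020-01-02T09:00:00+10:00")

# Hypothetical helper: parse offset-stamped strings into POSIXct in UTC.
parse_kiwis_time <- function(x) {
  # strptime's %z reads offsets written as +hhmm, so drop the colon first.
  no_colon <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)
  as.POSIXct(no_colon, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
}

utc_time <- parse_kiwis_time(raw_time)

# 09:00 at +10:00 is 23:00 on the previous day in UTC.
format(utc_time, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# The same instants can be displayed in any timezone from OlsonNames().
format(utc_time, "%Y-%m-%d %H:%M:%S", tz = "Australia/Brisbane", usetz = TRUE)
```

Working from a single instant-based representation like this avoids mixing gauge-local timezones when combining results.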

Value

a tibble of the requested timeseries

Details

Note that while each of the filtering arguments variable, units, timeunit, statistic, and datatype can be vectors, they are not positionally matched. Each is just done as a simple OR, and so for example if you have variable = 'discharge', units = c('ML/d', 'cumecs'), and statistic = c('Mean', 'Min'), you'll get the mean and min of both ML/d and cumecs, not the mean of ML/d and min of cumecs. For more control, run this multiple times with the desired subsets. Further, the use of grepl() allows full regex parsing. For example, many gauges have daily values that split at 09:00 or at midnight. Using datatype = 'QaQc.*09' gets just the 09:00 versions.
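
The OR behaviour described above can be illustrated without calling the API. This is a sketch only: ts_list here is an invented stand-in for getTimeseriesList() output, match_any() is a hypothetical helper, and the filtering shown approximates the described behaviour rather than reproducing the package's internal code.

```r
# Invented stand-in for getTimeseriesList() output (all rows are made up).
ts_list <- data.frame(
  ts_id = c("1", "2", "3", "4", "5", "6"),
  parametertype_name = rep("Water Course Discharge", 6),
  ts_unitsymbol = c("ML/d", "ML/d", "cumecs", "cumecs", "ML/d", "ML/d"),
  ts_name = c(
    "DMQaQc.Merged.DailyMean.24HR",
    "DMQaQc.Merged.DailyMin.24HR",
    "DMQaQc.Merged.DailyMean.24HR",
    "DMQaQc.Merged.DailyMin.24HR",
    "DMQaQc.Merged.DailyMax.24HR",
    "DMQaQc.Merged.DailyMean.09HR"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where x matches ANY pattern (a simple OR).
# A NULL filter keeps everything.
match_any <- function(patterns, x) {
  if (is.null(patterns)) return(rep(TRUE, length(x)))
  Reduce(`|`, lapply(patterns, grepl, x = x, ignore.case = TRUE))
}

# Each argument is ORed within itself, then the arguments are ANDed together.
keep <- match_any("discharge", ts_list$parametertype_name) &
  match_any(c("ML/d", "cumecs"), ts_list$ts_unitsymbol) &
  match_any("Daily", ts_list$ts_name) &
  match_any(c("Mean", "Min"), ts_list$ts_name) &
  match_any("QaQc", ts_list$ts_name)

# Mean AND Min are kept for BOTH units; only the Max row is dropped.
selected <- ts_list[keep, ]

# Regex in a filter narrows further: only the 09:00 split survives.
selected_09 <- selected[match_any("QaQc.*09", selected$ts_name), ]
```

To get the mean in ML/d and the min in cumecs, you would instead run two separate filters (or two separate calls) and combine the results.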