Wrapper for Kiwis to find and return desired timeseries
Source: R/fetch_kiwis_timeseries.R
fetch_kiwis_timeseries.Rd
Wraps getTimeseriesList() (via find_ts_id()) and getTimeseriesValues() to find the ts_id (or ts_ids) matching the kind of timeseries we want and then fetch it (them).
For help with the arguments, run ts_list <- getTimeseriesList(portal = portal, station_no = gauge) to see the getTimeseriesList() output that is parsed to give the ts_id. This can be very helpful if you're getting incorrect records, too many records, or not enough. Each argument below says which column it filters on, using grepl() with ignore.case = TRUE.
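As a rough illustration of that column-wise filtering, the sketch below applies grepl() with ignore.case = TRUE to a small mock table. The data frame is a hypothetical stand-in for getTimeseriesList() output (its rows and ts_id values are invented for illustration), and the code shows the general idea only, not the package's internal implementation.

```r
# Hypothetical stand-in for getTimeseriesList() output; all values are invented.
ts_list <- data.frame(
  ts_id = c("101", "102", "103", "104"),
  parametertype_name = c("Water Course Discharge", "Water Course Discharge",
                         "Rainfall", "Water Course Level"),
  ts_unitsymbol = c("ML/d", "cumec", "mm", "m"),
  ts_name = c("DMQaQc.Merged.DailyMean.24HR", "DMQaQc.Merged.DailyMean.24HR",
              "DMQaQc.Merged.DailyTotal.09HR", "DMQaQc.Merged.DailyMean.24HR"),
  stringsAsFactors = FALSE
)

# Case-insensitive partial matches on the columns named in the argument docs:
# variable filters parametertype_name, units filters ts_unitsymbol.
keep <- grepl("discharge", ts_list$parametertype_name, ignore.case = TRUE) &
  grepl("ML/d", ts_list$ts_unitsymbol, ignore.case = TRUE)

# The ts_id values that survive both filters.
matched_ids <- ts_list$ts_id[keep]
print(ts_list[keep, ])
```

Inspecting the real ts_list in the same way shows which rows a given combination of arguments would select.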
Usage
fetch_kiwis_timeseries(
portal,
gauge = NULL,
start_time = NULL,
end_time = NULL,
period = NULL,
variable = "discharge",
units = "ML/d",
timeunit = "Daily",
statistic = "Mean",
datatype = "QaQc",
namefilters = NULL,
extra_list = list(NULL),
returnfields = "default",
meta_returnfields = "default",
request_timezone = "db_default",
return_timezone = "UTC"
)
Arguments
- portal
URL to Kisters KiWIS database.
- gauge
character vector of gauge numbers, as station_no for Kiwis functions (site_list for Hydstra)
- start_time
character, date, or date-time for the start, in the database default timezone. Default NULL.
- end_time
character, date, or date-time for the end, in the database default timezone. Default NULL.
- period
character, default NULL. The special case 'complete' returns the full set of data. Otherwise, a string beginning with 'P', followed by numbers and characters indicating the timespan, e.g. 'P2W'. See documentation.
- variable
character vector of variables we want to extract. Matches on parametertype_name.
- units
units of the variable, used when there may be more than one, e.g. cumecs and ML/d for discharge. If NULL, gets all available. Matches to ts_unitsymbol.
- timeunit
The time interval to request, e.g. "Daily", the default. Main values seem to be 'Daily', 'Monthly', 'Yearly', and 'AsStored' (the raw data). Matches to part of ts_name.
- statistic
The aggregation statistic, e.g. "Mean", the default. Main values seem to be 'Mean', 'Max', 'Min', and 'Total', though not all are available for each variable: rainfall tends to use Total, while discharge tends to use Mean, Max, and Min. Matches to part of ts_name.
- datatype
The type of data to return, default 'QaQc'. Some other options seem to be 'Recieved', 'Harmonised', and 'Obs'. Note that 'QaQc' matches both 'DMQaQc' and 'PR01QaQc'. In many cases only one is available, but if you get twice as much data as expected, check and specify which you want. Matches to part of ts_name.
- namefilters
character vector giving the ability to match to other parts of ts_name in case those specified in timeunit, statistic, and datatype aren't sufficient to find the desired ts_id. One frequent occurrence is two Daily datasets that differ in whether they split at 9am or midnight, in which case you should use either namefilters = '09HR' or namefilters = '24HR'. In some situations, this can be easier than using regex, e.g. datatype = 'QaQc.*09'.
- extra_list
a named list, see getStationList(), with a special note that here we can include a 'timezone' argument that determines the timezone the API returns in. This is dangerous: the API interprets requested dates in its own default timezone, and because that default cannot be extracted directly, it is inferred from the returned data. Thus, including a timezone in extra_list may yield unexpected outcomes when requesting dates. A better option is to use return_timezone to adjust the return values. That said, some databases may return gauge-local timezones, which cannot be concatenated. One solution is to work entirely in UTC with timezone = 'UTC' in extra_list so that all outputs share the same timezone.
- returnfields
return fields for the data itself. Default is c('Timestamp', 'Value', 'Quality Code'). (Full list from Kisters docs)
- meta_returnfields
return fields about the variable and site. Seems to be able to access most of what getTimeseriesList() has in its returnfields. (Full list from Kisters docs)
- request_timezone
ignored if start_time and end_time are time objects, otherwise a timezone from OlsonNames() or 'db_default'
- return_timezone
character in OlsonNames() or one of three special cases: 'db_default', 'char', or 'raw'. Default 'UTC'. If 'db_default', uses the API default. BOM defaults to +10. If 'char' or 'raw', returns the time column as-is from the API (a string in the format 'YYYY-MM-DDTHH:MM:SS+TZ').
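When return_timezone is 'char' or 'raw', the timestamps come back as strings and any conversion is left to the caller. The following is a minimal base-R sketch of one way to turn such strings into UTC date-times. It assumes the offset is written as '+HH:MM' (for example '+10:00') with no fractional seconds; the example strings are invented, and the real API output may differ.

```r
# Invented example strings in the documented 'YYYY-MM-DDTHH:MM:SS+TZ' shape,
# assuming a '+HH:MM' offset.
raw_time <- c("2020-01-01T09:00:00+10:00", "2020-06-30T00:00:00+10:00")

# strptime's %z expects the offset as +HHMM, so drop the colon first.
no_colon <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", raw_time)

# Parse with the offset and express the instants in UTC.
utc_time <- as.POSIXct(no_colon, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")

# Plain character representation of the UTC times.
utc_chr <- format(utc_time, "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(utc_chr)
```

Working from the raw strings this way keeps the offset reported by the API visible, which can help when checking which timezone a database actually uses.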
Details
Note that while each of the filtering arguments variable, units, timeunit, statistic, and datatype can be vectors, they are not positionally matched. Each is just done as a simple OR, and so for example if you have variable = 'discharge', units = c('ML/d', 'cumecs'), and statistic = c('Mean', 'Min'), you'll get the mean and min of both ML/d and cumecs, not the mean of ML/d and min of cumecs. For more control, run this multiple times with the desired subsets.
Further, the use of grepl() allows full regex parsing. For example, many gauges have daily values that split at 09:00 or at midnight. Using datatype = 'QaQc.*09' gets just the 09:00 versions.
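The OR behaviour and the regex example can be sketched on a mock table. The helper match_any() and the data frame below are hypothetical and written only for illustration; collapsing patterns with "|" is one simple way to get OR matching and is not necessarily how the package does it.

```r
# Hypothetical stand-in for getTimeseriesList() output; all values are invented.
ts_list <- data.frame(
  ts_id = as.character(1:6),
  ts_unitsymbol = c("ML/d", "ML/d", "cumecs", "cumecs", "ML/d", "ML/d"),
  ts_name = c("DMQaQc.Merged.DailyMean.24HR", "DMQaQc.Merged.DailyMin.24HR",
              "DMQaQc.Merged.DailyMean.24HR", "DMQaQc.Merged.DailyMin.24HR",
              "DMQaQc.Merged.DailyMean.09HR", "DMQaQc.Merged.DailyMax.24HR"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where any of the patterns matches (a simple OR).
# A NULL pattern means "no filter", so everything is kept.
match_any <- function(patterns, x) {
  if (is.null(patterns)) {
    return(rep(TRUE, length(x)))
  }
  grepl(paste(patterns, collapse = "|"), x, ignore.case = TRUE)
}

# Vectors are not matched by position: every unit is combined with every
# statistic, so Mean and Min are kept for both ML/d and cumecs.
keep_or <- match_any(c("ML/d", "cumecs"), ts_list$ts_unitsymbol) &
  match_any(c("Mean", "Min"), ts_list$ts_name)
or_ids <- ts_list$ts_id[keep_or]

# Regex works too: 'QaQc.*09' keeps only names with '09' after 'QaQc'.
ids_09 <- ts_list$ts_id[match_any("QaQc.*09", ts_list$ts_name)]

print(or_ids)
print(ids_09)
```

To get only the mean in ML/d and the min in cumecs, the equivalent here would be two separate filters, one per unit and statistic pair, mirroring the advice to run the function multiple times.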