A typical workflow for hydrogauge involves querying the database to find gauges, querying the available records for a set of gauges, and then pulling the desired timeseries. Both APIs (hydllp and KiWIS) provide similar abilities to perform these steps, but they are called differently.
In general, the functions that hit the API directly are named as in the relevant API calls, with hydstra names looking like `get_hydstra_function()` and KiWIS names looking like `getKiwisFunction()`.
For an example of this common workflow, see the respective hydstra and KiWIS articles.
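As a rough sketch of the shape of that workflow on a hydstra portal (the gauge numbers and variable code are placeholders, and argument names not described in this article, such as `portal` and `datasources`, are assumed here, so check the function help):

``` r
library(hydrogauge)

# A hydstra-style portal (NSW, purely as an example)
portal <- 'https://realtimedata.waternsw.com.au/cgi/webservice.exe?'

# 1. Find gauges attached to a datasource
sites <- get_sites_by_datasource(portal = portal, datasources = 'A')

# 2. See which variables a set of gauges records
vars <- get_variables_by_site(portal = portal,
                              site_list = c('410001', '410005'))

# 3. Pull the desired timeseries for those gauges
flows <- get_ts_traces(portal = portal,
                       site_list = c('410001', '410005'),
                       datasource = 'A',
                       var_list = '100',
                       start_time = 20200101,
                       end_time = 20201231,
                       interval = 'day',
                       data_type = 'mean')
```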
Which API?
How do you know which API style you have? It's best to find documentation. hydrogauge does try to infer the type, though there is no guarantee it will always guess right. To see this inference, run `parse_url()` with `test = TRUE, type = TRUE`:
BOM (KiWIS)
``` r
parse_url('http://www.bom.gov.au/waterdata/services', test = TRUE, type = TRUE)
#> # A tibble: 1 × 2
#>   baseURL                                  portal_type
#>   <chr>                                    <chr>      
#> 1 http://www.bom.gov.au/waterdata/services kiwis
```
NSW (hydstra)
``` r
parse_url('https://realtimedata.waternsw.com.au/cgi/webservice.exe?', test = TRUE, type = TRUE)
#> # A tibble: 1 × 2
#>   baseURL                                                  portal_type
#>   <chr>                                                    <chr>      
#> 1 https://realtimedata.waternsw.com.au/cgi/webservice.exe? hydstra
```
An API-agnostic wrapper is in development, though it will always work best if the type is known a priori.
Times
Both APIs (Hydstra/hydllp and KiWIS) expect times to be database-local and return times local to the database (not to the user, and not necessarily to the gauge itself). This package provides some functionality to return times in other formats (see the `return_timezone` argument), including a standard parseable character format, UTC, and local time objects. In general, working with UTC and offsets is preferable, but take care when giving desired start and end times, as these need to be in database-local time.
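For example, a trace can be requested with its timestamps returned as UTC. A minimal sketch (the gauge number is a placeholder, the `portal` argument name is assumed, and the exact values `return_timezone` accepts, `'UTC'` here, should be checked against its documentation):

``` r
portal <- 'https://realtimedata.waternsw.com.au/cgi/webservice.exe?'

# start_time and end_time are still interpreted as database-local,
# but the timestamps in the returned trace come back as UTC.
flow_utc <- get_ts_traces(portal = portal,
                          site_list = '410001',   # placeholder gauge
                          datasource = 'A',
                          var_list = '100',
                          start_time = 20200101,  # database-local
                          end_time = 20200107,    # database-local
                          interval = 'day',
                          data_type = 'mean',
                          return_timezone = 'UTC')
```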
Hydstra definitions and available options
I have tried to keep the function argument names the same as in the API, and the API restricts the values those arguments can take. The API functions are documented by Kisters (the creators), and there is a bit more information about the options from Queensland, though there are discrepancies between states.
After quite a lot of digging and testing, an incomplete set of definitions of common arguments and their potential values follows:
- `site_list` is the gauge number as a character (and the functions here accept a vector of these gauge numbers). It can be any gauge number in the database. Obtaining them programmatically is currently limited, except in `get_sites_by_datasource()`.
- `datasource` is the type of data. Victoria returns `"A"`, `"TELEM"`, and `"TELEMCOPY"` from `get_datasources_by_site()`, but there may be others at sites I have not examined. `A` is 'archive', `TELEM` is 'telemetry', and I'm not sure why there's a copy. `"CP"` seems to work in NSW, Victoria, and QLD, but isn't documented anywhere I can find.
  - Some quick testing with `get_sites_by_datasource()` shows that there are many more sites with `A` than `TELEM`, but that `TELEM` is not a subset: there are sites with `TELEM` and not `A`. Initial testing of `get_db_info()` also finds sites that do not appear with any of these and seem to have no data.
- `var_list` is the type of variable, e.g. rainfall, flow, temp. It should be a character of the numeric code, with or without the trailing ".00", and can be a vector, e.g. `c("100", "210.00")`. I do not currently have a comprehensive list of possible variables and their meaning, but `get_variables_by_site()` will provide one for a set of sites (see the first sketch after this list). The Queensland documentation gives some more information, but the numbers are not always the same for different databases. Some variables (typically discharge) are calculated, and do not appear in queries of available variables such as `get_variables_by_site()`. Those I'm aware of are "141", discharge in ML/day, and "140", discharge in cumecs (m³/s).
- `start_time` and `end_time` are the start and end times of the period requested. The API is strict that these should be 14-digit strings ("YYYYMMDDHHIIEE"), but the functions here will take them in date formats (POSIX), character, or numeric, and they do not need to be 14 digits.
  - If not dates, they should be at least YYYYMMDD (either character or numeric), and the rest will be padded with zeros.
- `interval` is the time interval of the returned values for timeseries. Options seem to be (based on API error messages) `"year"`, `"month"`, `"day"`, `"hour"`, `"minute"`, and `"second"`. I don't think capitalisation matters. I also have not thoroughly tested whether only some intervals are available for some variables at some sites.
- `multiplier` allows intervals like 5 days, by passing `interval = 'day'` and `multiplier = 5` (see the second sketch after this list).
.-
data_type
is the statistic to apply within each interval to get the values.Options (from API error messages):
"mean"
,"max"
,"min"
,"start"
,"end"
,"first"
,"last"
,"tot"
,"maxmin"
,"point"
,"cum"
. Not all are currently tested.Warning: any given API call can only takes one value, which is applied to all variables. This is unlikely to be appropriate if asking for many variables.
run
get_ts_traces()
multiple times, with different subsets ofvar_type
, each with an appropriatedata_types
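To see what a gauge actually has before asking for traces, the datasource and variable queries above can be combined. A minimal sketch (the gauge number is a placeholder and the `portal` argument name is assumed):

``` r
portal <- 'https://realtimedata.waternsw.com.au/cgi/webservice.exe?'

# Which datasources does this gauge have ('A', 'TELEM', ...)?
get_datasources_by_site(portal = portal, site_list = '410001')

# Which variables does it record? Calculated variables such as
# '140' and '141' will not appear here even if they can be requested.
get_variables_by_site(portal = portal, site_list = '410001')
```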
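The time arguments are flexible, and `interval` and `multiplier` combine; for example, 5-day means can be requested with numeric YYYYMMDD start and end times. A sketch with a placeholder gauge and an assumed `portal` argument name:

``` r
portal <- 'https://realtimedata.waternsw.com.au/cgi/webservice.exe?'

# start_time and end_time given as numeric YYYYMMDD are padded out to
# the full 14 digits; character ('20200101') or POSIX dates also work.
five_day <- get_ts_traces(portal = portal,
                          site_list = '410001',
                          datasource = 'A',
                          var_list = '141',
                          start_time = 20200101,
                          end_time = 20201231,
                          interval = 'day',
                          multiplier = 5,
                          data_type = 'mean')
```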
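And because a single call applies one `data_type` to every variable, splitting the variables across calls and combining the results is the usual workaround. A sketch (the rainfall variable code '10' and the ability to simply row-bind the results are assumptions):

``` r
portal <- 'https://realtimedata.waternsw.com.au/cgi/webservice.exe?'

# Daily means for discharge ('141', ML/day)
means <- get_ts_traces(portal = portal,
                       site_list = '410001',
                       datasource = 'A',
                       var_list = '141',
                       start_time = 20200101,
                       end_time = 20201231,
                       interval = 'day',
                       data_type = 'mean')

# Daily totals for rainfall ('10', assumed code) over the same period
totals <- get_ts_traces(portal = portal,
                        site_list = '410001',
                        datasource = 'A',
                        var_list = '10',
                        start_time = 20200101,
                        end_time = 20201231,
                        interval = 'day',
                        data_type = 'tot')

# Combine, assuming both calls return compatible long-format tibbles
both <- dplyr::bind_rows(means, totals)
```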