hydstra wrapper • hydrogauge

library(hydrogauge)
library(ggplot2)

The fetch_hydstra_timeseries() function wraps get_variable_list() and get_ts_traces(). This adds functionality and smooths common workflows, but it also introduces some danger and, at times, inefficiency. Some argument names differ from those of get_ts_traces() (which uses the Kisters names nearly verbatim), both for clarity and to move towards a unified interface across the KiWIS and hydstra API styles.

This function can request the full period of record by passing 'all' to start_time and end_time, which resolve to the earliest and latest available dates, respectively. The downside is that the API calls are inefficient, because each row is requested separately. That is also unavoidable when requesting the period of record manually for gauges with different periods. Some inference and combining of requests should be possible here, but it has not been a high priority.
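To make the one-request-per-row point concrete, here is a minimal sketch in base R. The table is a hypothetical stand-in for the rows returned by get_variable_list(): the column names (site, variable, period_start, period_end), the site labels, and the dates are all assumptions for illustration, not the real output of the package.

```r
# Hypothetical stand-in for rows returned by get_variable_list();
# column names, sites, and dates are illustrative assumptions only.
var_table <- data.frame(
  site = c("A", "A", "B"),
  variable = c("100", "141", "141"),
  period_start = as.Date(c("1990-01-01", "2005-06-01", "2010-03-15")),
  period_end = as.Date(c("2020-12-31", "2020-12-31", "2013-08-01"))
)

# Each row has its own start and end, so each row needs its own request
# rather than sharing one call with a common time window.
requests <- lapply(seq_len(nrow(var_table)), function(i) {
  row <- var_table[i, ]
  list(
    site = row$site,
    variable = row$variable,
    start_time = row$period_start,
    end_time = row$period_end
  )
})

# One request per row of the table
length(requests)
```

Rows that happen to share a site and time window could in principle be merged into one call, which is the kind of combination the text describes as possible but not yet implemented.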

This function is most useful when we want to pull the period of record of the same variable for a set of gauges. For example, we can pull discharge for the period of record by passing 'all' to start_time and end_time, as shown in the discharge figure below.

Period of record

I’ll use the same set of sites as in the core hydstra demonstration, chosen to capture a range of periods of record and available variables.

The Upper Steavenson (405328) only has flow
Barwon (233217) has many variables, but their start dates differ
Taggerty (405331) is no longer in operation; it ran 2010-2013
Marysville golf course (405837) is only rainfall

barwon <- '233217'
steavenson <- '405328'
taggerty <- '405331'
golf <- '405837'

discharge_record <- fetch_hydstra_timeseries(
  portal = 'vic',
  gauge = c(barwon, steavenson, taggerty),
  var_list = '141',
  start_time = 'all',
  end_time = 'all',
  timeunit = 'day',
  statistic = 'mean'
)
#> Loading required package: foreach
#> Loading required package: future

discharge_record |> 
  ggplot(aes(x = time, y = value, color = site_short_name)) +
  geom_line() +
  facet_grid(site_short_name~., scales = 'free_y') +
  labs(y = unique(discharge_record$variable_short_name))

Discharge for the period of record for three gauges.

Multiple data types

We can also pull data for all available variables by passing 'all' to var_list. DANGER: if var_list = 'all', the same statistic is applied to every variable. Passing 'all' to start_time and end_time gives each variable its own period of record where these differ; the times are taken from each row returned by get_variable_list().

For the sake of demonstration, we make the bad choice here of getting all the data, summarised in the table below. This throws a warning because it’s a bad idea in general.

all_vars_fullperiod <- fetch_hydstra_timeseries(
  portal = 'vic',
  gauge = c(barwon, golf),
  var_list = 'all',
  start_time = 'all',
  end_time = 'all',
  timeunit = 'day',
  statistic = 'mean'
)
#> Warning: `var_list = 'all'` is *very* dangerous, since it applies the same
#> `statistic` (`data_type` in get_ts_traces), i.e. aggregation function, to all
#> variables, which is rarely appropriate. Check the variables available for your
#> sites and make sure you want to do this.

all_vars_fullperiod |> 
  dplyr::summarise(n_records = dplyr::n(), 
                   .by = c(site_short_name, variable_short_name, statistic)) |> 
  knitr::kable()

site_short_name	variable_short_name	statistic	n_records
BARWON @ GEELONG	Water Level (m)	mean	23345
BARWON @ GEELONG	Field pH	mean	5326
BARWON @ GEELONG	DO (ppm)	mean	5326
BARWON @ GEELONG	Temp (°C)	mean	5326
BARWON @ GEELONG	Turbidity (NTU)	mean	5326
BARWON @ GEELONG	EC@25C (µS/cm)	mean	5326
R.G. MARYSVILLE	Rainfall (mm)	mean	8615
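Because each variable can come back with its own period of record, it can help to check the first and last timestamp per variable before using the data. The sketch below does this in base R on a made-up data frame. It reuses column names that appear in the calls above (time, value, site_short_name, variable_short_name), but the site label, dates, and values are invented for illustration.

```r
# Made-up stand-in for a returned time series; only the column names
# are taken from the examples above, the contents are invented.
mock <- data.frame(
  site_short_name = rep("SITE A", 5),
  variable_short_name = c(rep("Water Level (m)", 3), rep("DO (ppm)", 2)),
  time = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03",
                   "2000-01-02", "2000-01-03")),
  value = c(1.2, 1.3, 1.1, 8.5, 8.7)
)

# Split by variable and summarise the span and record count of each piece
spans <- do.call(rbind, lapply(
  split(mock, mock$variable_short_name),
  function(d) {
    data.frame(
      variable_short_name = d$variable_short_name[1],
      first = min(d$time),
      last = max(d$time),
      n_records = nrow(d)
    )
  }
))

spans
```

For results covering several sites, the same approach works by splitting on both site_short_name and variable_short_name.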

We can request different statistics for different variables by passing the variables to var_list as a vector, though this requires knowing the variable codes in advance. In that case, statistic should be a vector of the same length as var_list.
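A quick way to avoid mismatched vectors is to write the pairing down explicitly before making the request. The helper below is hypothetical and is not part of hydrogauge. It only illustrates the matched-length rule, and its recycling of a single statistic across all variables is an assumption of this sketch rather than documented package behaviour.

```r
# Hypothetical helper (not part of hydrogauge): pair each variable code
# with its statistic. Recycling a single statistic is an assumption here.
pair_statistics <- function(var_list, statistic) {
  if (length(statistic) == 1) {
    statistic <- rep(statistic, length(var_list))
  }
  if (length(statistic) != length(var_list)) {
    stop("`statistic` must have length 1 or the same length as `var_list`")
  }
  data.frame(var_list = var_list, statistic = statistic)
}

# One row per variable, so each pairing can be checked by eye
pairs <- pair_statistics(c("141", "10", "450"), c("mean", "tot", "max"))
pairs
```

Printing the pairs before the pull makes it easy to spot, for example, a mean accidentally applied to rainfall.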

Let’s ask for the period of record for daily mean discharge, total daily rainfall, and maximum daily temperature at the Barwon (233217) and Marysville golf course (405837) gauges. Each variable is returned with its own statistic, as the table below shows.

different_statistics <- fetch_hydstra_timeseries(
  portal = 'vic',
  gauge = c(barwon, golf),
  var_list = c('141', '10', '450'),
  start_time = 'all',
  end_time = 'all',
  timeunit = 'day',
  statistic = c('mean', 'tot', 'max')
)

different_statistics |> 
  dplyr::summarise(n_records = dplyr::n(), 
                   .by = c(site_short_name, variable_short_name, statistic)) |> 
  knitr::kable()

site_short_name	variable_short_name	statistic	n_records
BARWON @ GEELONG	Discharge (ML/d)	mean	23345
BARWON @ GEELONG	Temp (°C)	max	5326
R.G. MARYSVILLE	Rainfall (mm)	tot	8615

Regex selection

We can also use the variable and unit arguments instead of var_list to search for variables by name, as in fetch_kiwis_timeseries(). This is very experimental and is a step towards a unified wrapper. We can use it to reproduce the discharge pull for the same gauges (see the figure below).

by_name <- fetch_hydstra_timeseries(
  portal = 'vic',
  gauge = c(barwon, steavenson, taggerty),
  variable = 'discharge',
  unit = 'ML/d',
  start_time = 'all',
  end_time = 'all',
  timeunit = 'day',
  statistic = 'mean'
)

by_name |> 
  ggplot(aes(x = time, y = value, color = site_short_name)) +
  geom_line() +
  facet_grid(site_short_name~., scales = 'free_y') +
  labs(y = unique(by_name$variable_short_name))

Discharge for the period of record for three gauges, obtained by name.

Large requests

Note: with big pulls, it can be useful to fall back to the bare get_variable_list() and get_ts_traces() approach, or at least to check get_variable_list() manually first. In my experience, some gauges often return errors or have other issues, so a clean pull usually needs some troubleshooting of variable availability and similar details. These problems are often easiest to find and solve at the low-level API interface. Work to incorporate some of this into fetch_hydstra_timeseries() is in development.
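One way to find problem gauges in a large request is to pull one gauge at a time and record which ones fail, instead of letting a single error stop the whole pull. The sketch below shows that pattern with base R's tryCatch(). The pull_one() function and the gauge labels are hypothetical stand-ins so that the example runs offline; in practice the body would be replaced by a real single-gauge call.

```r
# Hypothetical stand-in for a single-gauge pull; it fails on purpose for
# one gauge so the error handling can be seen without touching the API.
pull_one <- function(gauge) {
  if (gauge == "bad_gauge") stop("no data available for this gauge")
  data.frame(gauge = gauge, value = c(1, 2))
}

gauges <- c("gauge_1", "bad_gauge", "gauge_2")

# Pull each gauge separately; on error, report it and return NULL
results <- lapply(gauges, function(g) {
  tryCatch(
    pull_one(g),
    error = function(e) {
      message("Skipping ", g, ": ", conditionMessage(e))
      NULL
    }
  )
})
names(results) <- gauges

# Gauges that need troubleshooting, and the combined successful pulls
failed <- names(results)[vapply(results, is.null, logical(1))]
combined <- do.call(rbind, results[!names(results) %in% failed])
```

The failed vector then gives a short list of gauges to investigate at the low-level interface, while combined holds everything that succeeded.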