renv::install('buzacott/bomWater')
Waterdata from BOM
I wrote {hydrogauge} to get information about water gauges in Victoria, and then discovered it also works in NSW and Queensland. It does not seem to work in South Australia though.
Can we just use bomWater? Or use it to figure out how to call BOM myself?
First question is whether it works. I’ve heard rumors BOM has gotten harder to call, but mdba-gauge-getter still manages.
library(bomWater)
It doesn’t obviously have a lot of the query tools from hydrogauge, but let’s see if it works with the example
cotter_river <- get_daily(parameter_type = 'Water Course Discharge',
                          station_number = '410730',
                          start_date = '2020-01-01',
                          end_date = '2020-01-31')
Seems to. Let’s try a couple I know I need
mr97 <- get_daily(parameter_type = 'Water Course Discharge',
                  station_number = 'A4260505',
                  start_date = '2000-01-01',
                  end_date = '2000-05-30')
That basically looks like it works. It’s missing some functionality I want, but much better than nothing.
get_station_list(station_number = 'A4260505')
# A tibble: 1 × 5
station_name station_no station_id station_latitude station_longitude
<chr> <chr> <int> <dbl> <dbl>
1 River Murray at Lock… A4260505 1617110 -34.2 142.
get_parameter_list(station_number = 'A4260505')
# A tibble: 2 × 7
station_no station_id station_name parametertype_id parametertype_name
<chr> <int> <chr> <int> <chr>
1 A4260505 1617110 River Murray at Loc… 11762 Water Course Disc…
2 A4260505 1617110 River Murray at Loc… 11763 Water Course Level
# ℹ 2 more variables: parametertype_unitname <chr>,
# parametertype_shortunitname <chr>
It looks like there is a getDataAvailability option in the API, but bomWater doesn’t query it. The requests in bomWater don’t obviously map to the docs, so this would take some tweaking.
In searching (unsuccessfully) for how the ‘request’ in bomWater turns into the getSomethingSomething in the API, I found another package to try. It’s Canadian, but seems to hit Kisters WISKI generally.
library('kiwisR')
Check it works.
ki_timeseries_list(hub = 'https://www.swmc.mnr.gov.on.ca/KiWIS/KiWIS?', station_id = '144659')
# A tibble: 223 × 6
station_name station_id ts_id ts_name from to
<chr> <chr> <chr> <chr> <dttm> <dttm>
1 Jackson Cre… 144659 9489… Precip… 2007-06-18 20:15:00 2024-09-10 22:15:00
2 Jackson Cre… 144659 1143… Precip… 2007-07-01 05:00:00 2020-09-01 05:00:00
3 Jackson Cre… 144659 1143… Precip… 2007-06-18 05:00:00 2024-09-11 05:00:00
4 Jackson Cre… 144659 9489… TAir.1… 2007-06-18 20:15:00 2024-09-10 22:15:00
5 Jackson Cre… 144659 9489… TAir.D… 2007-06-18 05:00:00 2024-09-09 05:00:00
6 Jackson Cre… 144659 9489… TAir.D… 2007-06-18 05:00:00 2024-09-09 05:00:00
7 Jackson Cre… 144659 1129… TAir.6… 2007-06-19 00:00:00 2024-09-10 18:00:00
8 Jackson Cre… 144659 1326… TAir.D… 2007-06-18 05:00:00 2024-09-11 05:00:00
9 Jackson Cre… 144659 1326… TAir.D… 2007-06-18 05:00:00 2024-09-11 05:00:00
10 Jackson Cre… 144659 9490… TWater… 2007-06-18 05:00:00 2024-09-09 05:00:00
# ℹ 213 more rows
Does it work for BOM? No, the URL is almost certainly wrong. This expects a KiWIS API, which it looks like BOM doesn’t use (at least, just shoving KiWIS on the end doesn’t work).
ki_timeseries_list(hub = "http://www.bom.gov.au/waterdata/services", station_id = 'A4260505')
Error in if (nrow(json_content) == 2) {: argument is of length zero
ki_timeseries_list(hub = "http://www.bom.gov.au/waterdata/services/KiWIS/KiWIS?", station_id = 'A4260505')
Error: lexical error: invalid char in json text.
<html><head><title>Apache Tomca
(right here) ------^
Interesting. If I go to http://www.bom.gov.au/waterdata/services, I get the message “KISTERS KiWIS QueryServices - add parameter ‘request’ to execute a query.” So it is a KiWIS, but maybe doesn’t take request in the same way as kiwisR expects? bomWater does use request, so maybe this will help figure out how to specify new ones. Looking at the code, kiwisR and bomWater look like they’re constructing the requests the same way, so it’s a bit odd that kiwisR doesn’t work with the bomWater URL.
Am I just calling something incorrectly? Can kiwisR hit that URL for other things? I can get a list of all stations. So that implies the URL does work. This is just very long, so I’m not rendering it.
ki_station_list(hub = "http://www.bom.gov.au/waterdata/services")
It doesn’t seem to work to search for stations by id though.
ki_station_list(hub = "http://www.bom.gov.au/waterdata/services", search_term = "A4260505")
# A tibble: 0 × 5
# ℹ 5 variables: station_name <chr>, station_no <chr>, station_id <chr>,
# station_latitude <dbl>, station_longitude <dbl>
Ah! It hits the station_name, not the gauge number in station_no.
ki_station_list(hub = "http://www.bom.gov.au/waterdata/services", search_term = "A*")
# A tibble: 1,951 × 5
station_name station_no station_id station_latitude station_longitude
<chr> <chr> <chr> <dbl> <dbl>
1 A 61700620 400630 -30.3 115.
2 A 60210202 11517225 -35.1 118.
3 A 60210203 11517229 -35.1 118.
4 A 60210201 11517221 -35.1 118.
5 A 60210298 11520185 -35.1 118.
6 A 61311025 11457669 -32.7 116.
7 A 120310078 11287358 -25.5 129.
8 A 60110485 11465949 -33.7 121.
9 A 60110497 11465977 -33.7 121.
10 A 60110584 11466057 -33.6 122.
# ℹ 1,941 more rows
So, can I get it with
ki_station_list(hub = "http://www.bom.gov.au/waterdata/services",
search_term = "River Murray at Lock 9 Downstream*")
# A tibble: 1 × 5
station_name station_no station_id station_latitude station_longitude
<chr> <chr> <chr> <dbl> <dbl>
1 River Murray at Lock… A4260505 1617110 -34.2 142.
The ki_timeseries_list function uses station_id. But I’ve been feeding it gauge numbers, which are station_no. It works with the ID.
tl <- ki_timeseries_list(hub = "http://www.bom.gov.au/waterdata/services", station_id = '1617110')
tl
# A tibble: 55 × 6
station_name station_id ts_id ts_name from to
<chr> <chr> <chr> <chr> <dttm> <dttm>
1 River Murra… 1617110 2086… Receiv… NA NA
2 River Murra… 1617110 2086… Harmon… NA NA
3 River Murra… 1617110 2086… DMQaQc… 1949-07-01 23:29:59 2024-09-08 22:30:01
4 River Murra… 1617110 2086… DMQaQc… 2008-11-26 04:34:59 2024-09-09 22:15:01
5 River Murra… 1617110 2086… DMQaQc… 2008-11-26 03:30:00 2024-09-09 21:30:00
6 River Murra… 1617110 3293… Derive… NA NA
7 River Murra… 1617110 3293… PR01Ma… NA NA
8 River Murra… 1617110 3293… PR01Ma… NA NA
9 River Murra… 1617110 3293… PR01Qa… 1949-07-01 23:29:59 2024-09-08 22:30:01
10 River Murra… 1617110 3293… PR01Qa… 1949-07-01 23:29:59 2024-09-08 22:30:01
# ℹ 45 more rows
Then, I should be able to use ki_timeseries_values if I know the ts_id I want. There are lots of cryptic ts_name values in there, but daily is “DMQaQc.Merged.DailyMean.24HR”. There are two versions here, for different date ranges.
tl[tl$ts_name == "DMQaQc.Merged.DailyMean.24HR", ]
# A tibble: 2 × 6
station_name station_id ts_id ts_name from to
<chr> <chr> <chr> <chr> <dttm> <dttm>
1 River Murray… 1617110 2086… DMQaQc… 2008-11-25 14:30:00 2024-09-09 14:30:00
2 River Murray… 1617110 2086… DMQaQc… 1949-07-01 14:30:00 2024-09-08 14:30:00
Is that why ki_timeseries_values doesn’t have a station argument? Are the ts_ids unique across gauges? Look at two gauges. Cotter River (from way above) is id 13360.
No idea why Cotter has so many ts_ids with identical ranges, but they are unique.
tl2 <- ki_timeseries_list(hub = "http://www.bom.gov.au/waterdata/services",
                          station_id = c('1617110', '13360'))
tl2 |>
  dplyr::filter(ts_name == "DMQaQc.Merged.DailyMean.24HR")
# A tibble: 8 × 6
station_name station_id ts_id ts_name from to
<chr> <chr> <chr> <chr> <dttm> <dttm>
1 River Murray… 1617110 2086… DMQaQc… 2008-11-25 14:30:00 2024-09-09 14:30:00
2 River Murray… 1617110 2086… DMQaQc… 1949-07-01 14:30:00 2024-09-08 14:30:00
3 Cotter R. at… 13360 1573… DMQaQc… 1963-07-02 14:00:00 2024-09-08 14:00:00
4 Cotter R. at… 13360 1598… DMQaQc… 1963-07-02 14:00:00 2024-09-09 14:00:00
5 Cotter R. at… 13360 3801… DMQaQc… 2003-02-23 14:00:00 2024-09-09 14:00:00
6 Cotter R. at… 13360 3801… DMQaQc… 2003-02-23 14:00:00 2024-09-09 14:00:00
7 Cotter R. at… 13360 3801… DMQaQc… 2003-02-23 14:00:00 2024-09-08 14:00:00
8 Cotter R. at… 13360 3801… DMQaQc… 1999-09-23 14:00:00 2024-09-09 14:00:00
bomWater must be dealing with duplication somehow, because
any(duplicated(cotter_river$Timestamp))
[1] FALSE
Ah. bomWater just uses ts_id[1]. That’s likely not the best move. What’s better? Not sure. It would be good to assess them somehow. Could give options of ‘longest’, ‘all’, ‘first’ (with ‘longest’ possibly still needing a ‘first’ or ‘all’ fallback if there are multiple of the same length).
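The ‘first’, ‘longest’, and ‘all’ options could be sketched roughly as below. This is only an illustration: choose_ts_ids is a hypothetical helper (not part of bomWater or kiwisR), the demo ts_ids are made up, and it assumes a table with station_id, ts_id, from and to columns like the one ki_timeseries_list returns.

```r
# Hypothetical helper: choose ts_ids per station from a timeseries list.
# Assumes columns station_id, ts_id (character), from and to (date-times).
choose_ts_ids <- function(ts_list, method = c("first", "longest", "all")) {
  method <- match.arg(method)
  if (method == "all") {
    return(ts_list$ts_id)
  }
  # split() orders groups by sorted station_id, not by input order
  by_station <- split(ts_list, ts_list$station_id)
  picked <- vapply(by_station, function(d) {
    if (method == "first") {
      return(d$ts_id[1])
    }
    # Record length in days; series with missing dates count as zero length
    len <- as.numeric(difftime(d$to, d$from, units = "days"))
    len[is.na(len)] <- 0
    # which.max() returns the first maximum, so ties fall back to 'first'
    d$ts_id[which.max(len)]
  }, character(1))
  unname(picked)
}

# Made-up demo data (ids are placeholders, not real BOM ts_ids)
demo_ts <- data.frame(
  station_id = c("1", "1", "2"),
  ts_id = c("a", "b", "c"),
  from = as.POSIXct(c("2008-11-25", "1949-07-01", "1963-07-02"), tz = "UTC"),
  to = as.POSIXct(c("2024-09-09", "2024-09-08", "2024-09-08"), tz = "UTC"),
  stringsAsFactors = FALSE
)

choose_ts_ids(demo_ts, "first")
choose_ts_ids(demo_ts, "longest")
```

Whether ‘longest’ is actually the right default would still need checking against what the different series contain.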
So, all that boils down to that I should be able to choose one of those ts_ids and pull data. Choosing one from the Murray and one from Cotter
test_timeseries <- ki_timeseries_values(hub = "http://www.bom.gov.au/waterdata/services",
                                        ts_id = c("208669010", "380185010"),
                                        start_date = '2010-01-01', end_date = '2010-02-28')
library(ggplot2)
ggplot(test_timeseries, aes(x = Timestamp, y = Value, color = station_name)) +
geom_line()
Warning: Removed 6 rows containing missing values or values outside the scale range
(`geom_line()`).
Fitting into a workflow
I typically have a gauge number, want to get the period of record, and then pull data. I can do that here, but it’s a bit roundabout because the filters keep changing what they filter. And I’d like to not have to depend on both bomWater and kiwisR.
Above, I had to go from all stations, find the name and id that matched the no, and then could get the other things. But there’s got to be a way to just search with any of those, rather than different ones for different functions, right? bomWater seems to do it.
Is there a way to search for the gauge? Not obviously, weirdly.
So, as it stands, a kiwisR based workflow looks something like this:
Get the cross-referencing info for the gauges
gauge_numbers <- c('410730', 'A4260505')

all_stations <- ki_station_list(hub = "http://www.bom.gov.au/waterdata/services")

intended_stations <- all_stations |>
  dplyr::filter(station_no %in% gauge_numbers)
If we want to see what info is available (including date ranges)
available_info <- ki_timeseries_list(hub = "http://www.bom.gov.au/waterdata/services",
                                     station_id = intended_stations$station_id)
If we want to get the info, choose a var, but then we also need a ts_id.
var_to_get <- "DMQaQc.Merged.DailyMean.24HR"
start_time <- "2010-01-01"
end_time <- "2010-02-28"
choose_ids <- 'first'

all_var_to_get <- available_info |>
  dplyr::filter(ts_name == var_to_get)

if (choose_ids == 'first') {
  ids_to_get <- all_var_to_get |>
    dplyr::group_by(station_id) |>
    dplyr::summarise(ts_id = dplyr::first(ts_id),
                     from = dplyr::first(from),
                     to = dplyr::first(to)) # not sure worth returning
}

ids <- ids_to_get$ts_id

pulled_ts <- ki_timeseries_values(hub = "http://www.bom.gov.au/waterdata/services",
                                  ts_id = ids,
                                  start_date = start_time, end_date = end_time)
That works. And then we’d likely want to convert to ML/d instead of m^3 s^-1.
ggplot(pulled_ts, aes(x = Timestamp, y = Value, color = station_name)) +
geom_line()
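The ML/d conversion is just arithmetic. A minimal sketch, assuming the values really are in cubic metres per second (worth confirming from parametertype_unitname in get_parameter_list); cumec_to_ml_per_day is a hypothetical name:

```r
# 1 m^3/s = 86400 m^3/day, and 1 ML = 1000 m^3, so 1 m^3/s = 86.4 ML/day.
# Assumes the input is in cubic metres per second.
cumec_to_ml_per_day <- function(q_cumec) {
  q_cumec * 86400 / 1000
}

cumec_to_ml_per_day(c(1, 10))
# 86.4 864.0
```

With the data pulled above, that would be something like pulled_ts$Value_MLd <- cumec_to_ml_per_day(pulled_ts$Value).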
So, that is roundabout, but works. I guess I’ll do that until it gets too slow to pull the whole thing and then fork and add code.
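Until then, the cross-referencing step could be wrapped up so the gauge number is the only thing I have to think about. A rough sketch, assuming a station table with the station_no, station_id and station_name columns that ki_station_list returns; station_ids_from_numbers is a hypothetical helper and the demo rows use placeholder names:

```r
# Hypothetical helper: look up station_id values for gauge numbers (station_no)
# in a station table that has already been downloaded.
station_ids_from_numbers <- function(stations, gauge_numbers) {
  # match() keeps the order of gauge_numbers and uses the first matching row
  idx <- match(gauge_numbers, stations$station_no)
  not_found <- gauge_numbers[is.na(idx)]
  if (length(not_found) > 0) {
    warning("No station found for: ", paste(not_found, collapse = ", "))
  }
  stations[idx[!is.na(idx)], c("station_no", "station_id", "station_name")]
}

# Demo table; the ids are the ones seen above, the names are placeholders
demo_stations <- data.frame(
  station_name = c("Murray (demo row)", "Other (demo row)", "Cotter (demo row)"),
  station_no = c("A4260505", "999999", "410730"),
  station_id = c("1617110", "1", "13360"),
  stringsAsFactors = FALSE
)

station_ids_from_numbers(demo_stations, c("410730", "A4260505"))
```

Downloading the full station list once and reusing it for lookups like this avoids hitting the API repeatedly.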
Some checking of the available sites
What are the groups?
ki_group_list(hub = "http://www.bom.gov.au/waterdata/services")
# A tibble: 8 × 3
group_id group_name group_type
<chr> <chr> <chr>
1 20017539 MDB_WIP_Storages station
2 20017550 MDB_WIP_Watercourse station
3 19792386 Rainfall daily 24 timeseries
4 19792387 Rainfall monthly timeseries
5 19792388 Rainfall yearly timeseries
6 19792389 Rainfall daily 9 timeseries
7 20017540 TS_MDB_WIP_Storages timeseries
8 20017541 TS_MDB_WIP_Watercourse timeseries
Why are the station groups all prefaced by MDB? Shouldn’t this be Australia-wide?
all_watercourse_stations <- ki_station_list(hub = "http://www.bom.gov.au/waterdata/services",
                                            group_id = '20017550')
library(sf)
Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
all_ws <- all_watercourse_stations |>
  st_as_sf(coords = c('station_longitude', 'station_latitude'))
That is quite obviously just the Murray-Darling Basin. Where are the rest of the BOM sites?
ggplot(all_ws) + geom_sf()
all_stations <- ki_station_list(hub = "http://www.bom.gov.au/waterdata/services")
There’s a lot of NA in there, so delete them and make sf
all_s <- all_stations |>
  dplyr::filter(!is.na(station_longitude) & !is.na(station_latitude)) |>
  st_as_sf(coords = c('station_longitude', 'station_latitude'))
Clearly nationwide. Plus some that are clearly wrong. Some I’m sure are boreholes and such, but there must be flow gauges that just don’t end up in any group_id.
ggplot(all_s) + geom_sf()
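One way to drop the clearly wrong points before plotting is a crude bounding-box filter. This is only a sketch: drop_implausible_coords is a hypothetical helper, and the default longitude and latitude ranges are my rough guess at a box around mainland Australia and Tasmania, so they would wrongly drop any legitimate offshore stations.

```r
# Hypothetical helper: keep stations whose coordinates fall in a rough box.
# The default ranges are assumptions (approximate extent of Australia),
# not anything defined by BOM.
drop_implausible_coords <- function(stations,
                                    lon_range = c(110, 160),
                                    lat_range = c(-45, -9)) {
  lon <- stations$station_longitude
  lat <- stations$station_latitude
  ok <- !is.na(lon) & !is.na(lat) &
    lon >= lon_range[1] & lon <= lon_range[2] &
    lat >= lat_range[1] & lat <= lat_range[2]
  stations[ok, ]
}

# Small made-up example: one missing coordinate and one at (0, 0)
demo_coords <- data.frame(
  station_no = c("g1", "g2", "g3", "g4"),
  station_longitude = c(142, 0, NA, 115),
  station_latitude = c(-34.2, 0, -30, -30.3)
)

drop_implausible_coords(demo_coords)
```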
To confirm, look for a river definitely not in the MDB: the Gordon (at least some of these, around -42 latitude, are in Tassie).
ki_station_list(hub = "http://www.bom.gov.au/waterdata/services",
search_term = "Gordon*")
# A tibble: 29 × 5
station_name station_no station_id station_latitude station_longitude
<chr> <chr> <chr> <dbl> <dbl>
1 GORDON BK@FINEFLOWER 204067 586091 -29.4 153.
2 GORDON LAKE - AT IN… 646.1 3162538 -42.7 146.
3 GORDON RIVER - A/B … 187.1 3295551 -42.6 146.
4 GORDON RIVER - ABOV… 2491.1 3297416 -42.7 146.
5 Gordon 61310704 11447065 -33.0 116.
6 Gordon 509568 383787 -32.6 116.
7 Gordon Bore Rainfall 570823 14015 -35.5 149.
8 Gordon Catchment 614060 392486 -32.6 116.
9 Gordon Clim17M Tower 509582 383835 -32.6 116.
10 Gordon Clim33 Tower 509581 383828 -32.6 116.
# ℹ 19 more rows
Add to hydrogauge?
Can I get this to work?
I’m having issues with the requests, seemingly because I’m using httr2 instead of httr. For example, if I hand-build the call to the API for getStationList from kiwisR,
api_url <- "http://www.bom.gov.au/waterdata/services"

return_fields <- "station_name,station_no,station_id,station_latitude,station_longitude"

search_term <- "River Murray at Lock*"

# Query
api_query <- list(
  service = "kisters",
  datasource = 0,
  type = "queryServices",
  request = "getStationList",
  format = "json",
  kvp = "true",
  returnfields = paste(
    return_fields,
    collapse = ","
  )
)
api_query[["station_name"]] <- search_term
Run with httr::GET, as they do
raw <- httr::GET(
  url = api_url,
  query = api_query,
  httr::timeout(15)
)
raw
Response [http://www.bom.gov.au/waterdata/services?service=kisters&datasource=0&type=queryServices&request=getStationList&format=json&kvp=true&returnfields=station_name%2Cstation_no%2Cstation_id%2Cstation_latitude%2Cstation_longitude&station_name=River%20Murray%20at%20Lock%2A]
Date: 2024-09-10 23:24
Status: 200
Content-Type: application/json;charset=UTF-8
Size: 1.95 kB
Parse
raw_content <- httr::content(raw, "text")

# Parse text
json_content <- jsonlite::fromJSON(raw_content)

# Convert to tibble
content_dat <- tibble::as_tibble(
  x = json_content,
  .name_repair = "minimal"
)[-1, ]
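The first row being dropped there looks like the header row of column names. A small base-R sketch of using it for the names instead of discarding it, assuming the parsed content is a character matrix with the header in row 1 (header_row_to_names is a hypothetical helper, and the demo matrix is hand-built rather than a real response):

```r
# Hypothetical helper: turn a character matrix whose first row holds the
# column names into a data frame, converting the coordinate columns if present.
header_row_to_names <- function(m) {
  out <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  names(out) <- m[1, ]
  for (col in intersect(c("station_latitude", "station_longitude"), names(out))) {
    out[[col]] <- as.numeric(out[[col]])
  }
  out
}

# Hand-built stand-in for a parsed response
demo_matrix <- rbind(
  c("station_no", "station_id", "station_latitude"),
  c("A4260505", "1617110", "-34.2")
)

header_row_to_names(demo_matrix)
```

Keeping station_no and station_id as character avoids mangling gauge numbers like A4260505.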
But if I use httr2, it doesn’t return anything in the body
response_body <- httr2::request(api_url) |>
  httr2::req_body_json(api_query) |>
  httr2::req_perform()

# Cannot retrieve empty body
response_body <- response_body |>
  httr2::resp_body_json(check_type = FALSE)
Error in `resp_body_raw()`:
! Can't retrieve empty body.
I think the issue is that the request shouldn’t have a JSON body at all: the httr2 request doesn’t look like the one httr reports sending, which puts everything in the URL query string.
httr2::request(api_url) |>
  httr2::req_body_json(api_query) |>
  httr2::req_dry_run()
POST /waterdata/services HTTP/1.1
Host: www.bom.gov.au
User-Agent: httr2/1.0.3 r-curl/5.2.2 libcurl/8.3.0
Accept: */*
Accept-Encoding: deflate, gzip
Content-Type: application/json
Content-Length: 241
{"service":"kisters","datasource":0,"type":"queryServices","request":"getStationList","format":"json","kvp":"true","returnfields":"station_name,station_no,station_id,station_latitude,station_longitude","station_name":"River Murray at Lock*"}
Is it that I need to just use headers instead of a JSON body?
httr2::request(api_url) |>
  httr2::req_headers(!!!api_query) |>
  httr2::req_dry_run()
GET /waterdata/services HTTP/1.1
Host: www.bom.gov.au
User-Agent: httr2/1.0.3 r-curl/5.2.2 libcurl/8.3.0
Accept: */*
Accept-Encoding: deflate, gzip
service: kisters
datasource: 0
type: queryServices
request: getStationList
format: json
kvp: true
returnfields: station_name,station_no,station_id,station_latitude,station_longitude
station_name: River Murray at Lock*
test_resp <- httr2::request(api_url) |>
  httr2::req_headers(!!!api_query) |>
  httr2::req_perform()
Looks like that didn’t work…
httr2::resp_body_string(test_resp)
[1] "KISTERS KiWIS QueryServices - add parameter 'request' to execute a query."
How about req_url_query? That looks right.
httr2::request(api_url) |>
  httr2::req_url_query(!!!api_query) |>
  httr2::req_dry_run()
GET /waterdata/services?service=kisters&datasource=0&type=queryServices&request=getStationList&format=json&kvp=true&returnfields=station_name%2Cstation_no%2Cstation_id%2Cstation_latitude%2Cstation_longitude&station_name=River%20Murray%20at%20Lock%2A HTTP/1.1
Host: www.bom.gov.au
User-Agent: httr2/1.0.3 r-curl/5.2.2 libcurl/8.3.0
Accept: */*
Accept-Encoding: deflate, gzip
test_out <- httr2::request(api_url) |>
  httr2::req_url_query(!!!api_query) |>
  httr2::req_perform()

jsonout <- httr2::resp_body_json(test_out)

# after some flipping and checking;
tibnames <- unlist(jsonout[1])

tibout <- jsonout[-1] |>
  tibble::tibble() |>
  tidyr::unnest_wider(col = 1, names_sep = '_') |>
  setNames(tibnames)
That seems to work. So, do I want to integrate this with hydrogauge? Can I use the same basic code? Not really, since the states need json bodies, and this needs a list-query. BUT, can I do some background parsing? If it works to send NULL in for the query and the body, can write the request to do both, but only actually do one or the other.
Does it work to do this?
test_out_bom <- httr2::request(api_url) |>
  httr2::req_url_query(!!!api_query) |>
  httr2::req_body_json(NULL) |>
  httr2::req_perform()

paramlist <- list("function" = 'get_variable_list',
                  "version" = "1",
                  "params" = list("site_list" = '233217',
                                  "datasource" = "A"))

# The query requires something named.
test_out_state <- httr2::request("https://data.water.vic.gov.au/cgi/webservice.exe?") |>
  httr2::req_url_query(fake = NULL) |>
  httr2::req_body_json(paramlist) |>
  httr2::req_perform()

# it can be a list of null
nullist <- list(fake = NULL)
test_out_state <- httr2::request("https://data.water.vic.gov.au/cgi/webservice.exe?") |>
  httr2::req_url_query(!!!nullist) |>
  httr2::req_body_json(paramlist) |>
  httr2::req_perform()
That leaves aside the question of do we want to do that. It would be nice to unify the experience, I think, if that’s all we have to change in the main getResponse function. And then I can write separate bom and state versions of the functions accessible separately or through common wrappers that standardize syntax and outputs. Potentially just get_ts_traces_2.
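An alternative to passing NULL placeholders would be to add the query or the body only when one is supplied. A minimal sketch of that idea, assuming httr2 is installed; build_request is a hypothetical name, not the existing getResponse function in hydrogauge, and nothing here is sent over the network:

```r
# Hypothetical unified builder: attach a URL query and/or a JSON body only
# when they are supplied, instead of passing NULL placeholders.
build_request <- function(url, query = NULL, body = NULL) {
  req <- httr2::request(url)
  if (!is.null(query)) {
    req <- httr2::req_url_query(req, !!!query)
  }
  if (!is.null(body)) {
    req <- httr2::req_body_json(req, body)
  }
  req
}

# BOM-style request: everything goes in the query string
bom_req <- build_request(
  "http://www.bom.gov.au/waterdata/services",
  query = list(service = "kisters", type = "queryServices",
               request = "getStationList", format = "json")
)

# State-style request: everything goes in a JSON body
state_req <- build_request(
  "https://data.water.vic.gov.au/cgi/webservice.exe?",
  body = list("function" = "get_variable_list", version = "1",
              params = list(site_list = "233217", datasource = "A"))
)

# Inspect the built URL; httr2::req_perform() would be the next step
bom_req$url
```

The bom and state wrappers could then both call the same builder and differ only in which argument they fill.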