Set up, run, and (possibly) save EWR outputs
prep_run_save_ewrs.Rd
This does some directory setup and parsing, runs the EWR tool, and, if asked, saves the output. If output is saved, it also auto-saves both yaml and json metadata files with all parameters needed to re-run this part of HydroBOT. Scenario metadata is prepended, if found.
Usage
prep_run_save_ewrs(
  hydro_dir,
  output_parent_dir,
  output_subdir = "",
  scenarios = NULL,
  model_format = "Standard time-series",
  outputType = "none",
  returnType = "none",
  scenarios_from = "directory",
  file_search = NULL,
  fill_missing = FALSE,
  extrameta = NULL,
  rparallel = FALSE,
  retries = 2,
  print_runs = FALSE,
  url = FALSE
)
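A minimal sketch of a typical call, assuming hydrograph csvs sit in a hydrographs directory with one subdirectory per scenario (all paths here are hypothetical):

library(HydroBOT)

# Assumed layout: hydrographs/<scenario>/<hydrograph>.csv
ewr_out <- prep_run_save_ewrs(
  hydro_dir = "hydrographs",
  output_parent_dir = ".",
  outputType = list("summary", "yearly"),
  returnType = "summary"
)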
Arguments
- hydro_dir
Directory containing hydrographs. Can be an outer directory, e.g. hydrographs, that splits into scenario subdirectories, or a single scenario subdirectory.
- output_parent_dir
Parent directory for the outputs. Can be anything, but there are two typical cases: the directory containing hydro_dir, which puts module_output at the same level as the hydrographs; or, when running in batches for single scenarios, hydro_dir itself, which puts module_output inside hydro_dir.
- output_subdir
a sub-directory for the outputs, if, for example, we want module_output/EWR/V1 and module_output/EWR/V2.
- scenarios
NULL (default) or named list (see the sketch after this argument list).
NULL finds scenario names by parsing directory names in hydro_dir; if there are no internal directories, it just stays in hydro_dir. This captures the two typical situations discussed for output_parent_dir. If there are other directories in hydro_dir that do not contain hydrological scenarios, a character vector should be used instead.
A named list of paths to files: names become scenario names, and paths should be relative to hydro_dir. This allows unusual directory structures.
- model_format
see EWR tool. One of:
'Standard time-series': default; among other things, accepts a csv with a Date column followed by gauge columns, with _flow or _level appended to the gauge number
'IQQM - netcdf': in development; finds all netcdf files in hydro_dir. Should also work when hydro_dir is a .zip with netcdfs inside
'ten thousand year': old default (IQQM - NSW 10,000 years); works nearly the same as 'Standard time-series'
'All Bigmod': previously 'Bigmod - MDBA'
'Source - NSW (res.csv)'
- outputType
list of strings or character vector defining what to save to disk. One or more of:
'none' (default): do not save outputs; ignored if in a list with others
'summary'
'yearly'
'all_events'
'all_successful_events'
'all_interEvents'
'all_successful_interEvents'
- returnType
list of strings or character vector defining what to return to the active R session. Same options as outputType.
- scenarios_from
character, default 'directory', which gets scenario names from directory names. If anything else, gets them from filenames (safest). Expect additional options in future, e.g. from metadata.
- file_search
character, regex for additional limitations on filenames. Useful to run a subset of scenarios, or if several files have the extension defined by model_format but only some are hydrographs.
- fill_missing
logical, default FALSE. If TRUE, figures out the expected outputs and only runs those that are missing. Useful for long runs that might break.
- extrameta
list, extra information to include in saved metadata documentation for the run. Default NULL.
- rparallel
logical, default FALSE. If TRUE, parallelises over the scenarios in hydro_dir using furrr. To use, install furrr and set a future::plan() (likely multisession or multicore); see the sketch after this argument list.
- retries
Number of retries if there are errors. 0 is no retries, but still runs once. Default 2.
- print_runs
logical, default FALSE. If TRUE, print the set of runs to be done.
- url
logical, default FALSE. If TRUE, scenarios needs to be a named list with full file paths (URLs). This bypasses the otherwise automatic prepending of hydro_dir onto a named scenario list.
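The sketches below (untested; scenario names and paths are hypothetical) illustrate two of the less common argument combinations noted above: an explicitly named scenario list for unusual directory structures, and parallelising over scenarios with furrr.

# Named scenario list: names become scenario names, paths are relative
# to hydro_dir
prep_run_save_ewrs(
  hydro_dir = "hydrographs",
  output_parent_dir = ".",
  scenarios = list(
    base = "base/flows.csv",
    dry = "alternates/dry_flows.csv"
  ),
  outputType = "summary"
)

# Parallelise over scenarios: install furrr and set a future plan first
future::plan(future::multisession)
prep_run_save_ewrs(
  hydro_dir = "hydrographs",
  output_parent_dir = ".",
  outputType = "summary",
  rparallel = TRUE
)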
Details
By far the cleanest way for this to work is to have your input hydrographs in
a file structure where the directories define the scenarios, with single or
multiple hydrograph files within them, i.e. a structure that does not mix
files from different scenarios in the final directory. If you have that
structure, using scenarios_from = 'directory' will ensure your scenarios
are named uniquely and output files are also unique and not mixed between
scenarios. This is particularly important for parallelisation, which depends on
splitting the work by scenario. This structure is then retained in the output
structure, making aggregation simpler as well. If for some reason you
cannot establish this structure, set scenarios_from = 'file', and
everything will be given a unique name, but aggregation and other subsequent
processing will probably be harder, requiring more work in scripts to make
the appropriate comparisons.
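As a concrete (hypothetical) illustration of the recommended structure, a layout like the one sketched below keeps each scenario in its own subdirectory, so scenarios_from = 'directory' names scenarios and outputs uniquely; the output structure is assumed to mirror it under module_output.

# Hypothetical recommended layout: one subdirectory per scenario
#   hydrographs/
#     base/flows.csv
#     dry/flows.csv
#
# Outputs are expected to mirror this, e.g. module_output/EWR/base/...
# and module_output/EWR/dry/...

prep_run_save_ewrs(
  hydro_dir = "hydrographs",
  output_parent_dir = ".",
  scenarios_from = "directory",
  outputType = "summary"
)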