
This does some directory setup and parsing, runs the EWR tool and, if asked, saves the output. When output is saved, it also auto-saves yaml and json metadata files containing all parameters needed to reproduce this part of the HydroBOT run. Scenario metadata is prepended, if found.

Usage

prep_run_save_ewrs(
  hydro_dir,
  output_parent_dir,
  output_subdir = "",
  scenarios = NULL,
  model_format = "Standard time-series",
  outputType = "none",
  returnType = "none",
  scenarios_from = "directory",
  file_search = NULL,
  fill_missing = FALSE,
  extrameta = NULL,
  rparallel = FALSE,
  retries = 2,
  print_runs = FALSE,
  url = FALSE
)

Arguments

hydro_dir

Directory containing hydrographs. Can be an outer directory, e.g. hydrographs, that splits into scenario subdirectories, or can be a single scenario subdirectory.

output_parent_dir

parent directory for the outputs. Can be anything, but two typical cases:

  • The directory containing hydro_dir, which puts the module_outputs at the same level as the hydrographs

  • If running in batches for single scenarios, may be hydro_dir, which just puts the module_outputs in hydro_dir

output_subdir

a sub-directory for the outputs, if, for example, we want module_output/EWR/V1 and module_output/EWR/V2

scenarios

NULL (default) or named list.

  • NULL (default): finds scenario names by parsing directory names in hydro_dir. If there are no internal directories, it just stays in hydro_dir. This captures the two typical situations discussed for output_parent_dir. If there are other directories in hydro_dir that do not contain hydrological scenarios, use a character vector.

  • named list of paths to files: names become scenario names, and paths should be relative to hydro_dir. This allows unusual directory structures.
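When scenarios is a named list, the structure might look like the following (a sketch; the scenario names and file paths are hypothetical):

```r
# Names become scenario names; paths are relative to hydro_dir
scenarios <- list(
  baseline = "baseline/hydrograph.csv",
  down4 = "down4/hydrograph.csv"
)
```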

model_format

see EWR tool. One of:

  • 'Standard time-series': (default, among other things accepts a csv with a Date column followed by gauge columns, with _flow or _level appended to the gauge number)

  • 'IQQM - netcdf': in development, finds all netcdf files in hydro_dir. Should also work when hydro_dir is a .zip with netcdfs inside

  • 'ten thousand year': old default (IQQM - NSW 10,000 years), works nearly the same as standard time-series

  • 'All Bigmod': previously 'Bigmod - MDBA'

  • 'Source - NSW (res.csv)'
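For 'Standard time-series', the accepted csv described above (a Date column followed by gauge columns, with _flow or _level appended to the gauge number) might look like this sketch, where the gauge numbers and values are hypothetical:

```
Date,409025_flow,425010_level
2000-07-01,1250.3,2.41
2000-07-02,1180.0,2.39
```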

outputType

list of strings or character vector defining what to save to disk. One or more of:

  • 'none' (default): do not save outputs; ignored if included in a list with others

  • 'summary'

  • 'yearly'

  • 'all_events'

  • 'all_successful_events'

  • 'all_interEvents'

  • 'all_successful_interEvents'

returnType

list of strings or character vector defining what to return to the active R session. Same options as outputType
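Because outputType and returnType take the same options, a run can save one set of tables to disk while returning a different set to the session. A hedged sketch:

```r
# Save summary and yearly tables to disk, but only return the summary
outputType <- list("summary", "yearly")
returnType <- list("summary")
```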

scenarios_from

character, default 'directory' gets scenario names from directory names. If anything else, gets them from filenames (safest). Expect additional options in future, e.g. from metadata.

file_search

character, regex for additional limitations on filenames. Useful for running a subset of scenarios, or when several files have the extension defined by model_format but only some are hydrographs.

fill_missing

logical, default FALSE. If TRUE, figures out the expected outputs and only runs those that are missing. Useful for long runs that might break.

extrameta

list, extra information to include in saved metadata documentation for the run. Default NULL.

rparallel

logical, default FALSE. If TRUE, parallelises over the scenarios in hydro_dir using furrr. To use, install furrr and set a future::plan() (likely multisession or multicore)
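To use rparallel, set a future plan before calling the function, as in this minimal sketch (assumes the furrr and future packages are installed; the worker count is arbitrary):

```r
library(future)

# One separate R session per worker; multicore is an alternative on unix-likes
plan(multisession, workers = 4)

# then call prep_run_save_ewrs(..., rparallel = TRUE) as usual
```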

retries

Number of retries if there are errors. 0 is no retries, but still runs once. Default 2.

print_runs

logical, default FALSE. If TRUE, print the set of runs to be done.

url

logical, default FALSE. If TRUE, scenarios needs to be a named list with full file paths (URLs). This bypasses the otherwise automatic prepending of hydro_dir onto a named scenario list.

Value

a list of dataframe(s) if returnType is not 'none'; otherwise NULL

Details

By far the cleanest way for this to work is to have your input hydrographs in a file structure where the directories define the scenarios, with single or multiple hydrograph files within them, i.e. a structure that does not mix files from different scenarios in the final directory. With that structure, using scenarios_from = 'directory' ensures your scenarios are named uniquely and output files are also unique and not mixed between scenarios. This is particularly important for parallelising, which parallelises over scenarios. This structure is then retained in the output structure, making aggregation simpler as well.

If for some reason you cannot establish this structure, set scenarios_from = 'file', and everything will be given a unique name, but aggregation and other subsequent processing will likely be more difficult, requiring more work in scripts to make the appropriate comparisons.
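Putting the pieces together, a typical call might look like the following sketch, where the directory names are hypothetical and 'hydrographs' contains one subdirectory per scenario:

```r
# Save the summary table to disk and also return it to the session;
# output_parent_dir = '.' puts module_output alongside 'hydrographs'
ewr_out <- prep_run_save_ewrs(
  hydro_dir = "hydrographs",
  output_parent_dir = ".",
  outputType = list("summary"),
  returnType = list("summary")
)
```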