I’ve always meant to build packages, but never quite have the time, and often end up with very convoluted projects that are not ideal for shoehorning into a typical package structure, particularly as a first try.
In part, I think, that is because my code is often a combination of package-type-things (functions, tests, other software flow) and analyses. It’s unclear what the best approach to this sort of flow is, where we absolutely want functions, but they are very specific to the analyses, of which there are many. Do the analyses go in the package? In two projects, but that’s a hassle? Anyway, that’s a topic for a longer post.
Here, I have a self-contained, broadly usable bit of code I’m working on to extract information from the Victoria (Australia) waterdata network API. It’s more interesting than a Hello World type package, but also constrained in scope and the analyses can clearly go elsewhere.
This doc will be developed as I go, and so like most docs on this site isn’t a tutorial per se, but a sequence of steps, including pitfalls and recoveries (hopefully).
Getting started
First, created a repo in git.
For the main package development, I’m largely going to follow https://r-pkgs.org/, though I’m hoping I don’t have to read the whole thing (I know I should, but time is time).
Opened a new Rstudio session (I use renv, but want to adjust some things globally- particularly {devtools}).
install_packages("devtools"), then devtools::dev_sitrep() and install any requested updates (in my case, {roxygen2} was out of date.
• devtools or its dependencies out of date:
'cpp11'
Update them with `devtools::update_packages("devtools")`
── dev package ─────────────────────────────────────────────────────────────────
• package: 'galenR'
• path: 'C:/Users/galen/Documents/code/web_testing/galen_website/'
R is also out of date (at the time of writing). Fix it withrig, then re-run and update the packages.
devtools::update_packages('devtools')
Check the name I used works.
# browse = FALSE to avoid loading browsers rendering this websiteavailable::available('hydrogauge', browse =FALSE)
── hydrogauge ──────────────────────────────────────────────────────────────────
Name valid: ✔
Available on CRAN: ✔
Available on Bioconductor: ✔
Available on GitHub: ✔
Abbreviations: http://www.abbreviations.com/hydrogauge
Wikipedia: https://en.wikipedia.org/wiki/hydrogauge
Wiktionary: https://en.wiktionary.org/wiki/hydrogauge
Sentiment:???
Looks good.
Question- I typically use Rprojects and renv to manage dependencies and sandbox projects. I also know that I can just devtools::create() (which I think just wraps usethis::create_package(). Can I start with the Rproj and then turn it into a package? Should I want to?
Answer- I just needed to read a bit further. Rstudio has devtools and Rprojects working together. So calling usethis::create_package() builds the project and puts all the scaffolding where it needs to be. I’ll need to cross the existing complex Rproj –> package bridge with another project later, but this is fairly straightforward here.
And that worked with an existing directory. Was kind of worried about that. And it auto-opens a new Rstudio session.
Now I’m mostly moving over there, but I ran usethis::use_mit_license() to set the license. Looks like description and namespace need work, but do that later.
Let’s start building.
Building
I’ve been testing and poking at the API in some qmds here. I expect a lot of that ends up as vignettes in the package, and some is ready to become functions. I’ll likely maintain that flow- test in the qmd, make into functions there, repeat.
I’m going to go write a function, and then figure out how to use it.
Switching to the native pipe |> to see how it goes and reduce dependencies.
Package management
For dependencies, I used usethis::use_package(), which installs and auto-populates the DESCRIPTION file. But I think I’m going to try using renv in here too, so I don’t always overwrite system-wide libraries. Hope it doesn’t screw anything up. Usual renv::init().
packages that are nice to have (e.g. to allow parallelisation) are usethis::use_package('packagename', type = "suggests"). And if we want to import a function and not use package::function, use_import_from()- see below for the %dopar%.
Trying it out
So, I think usually the thing to do is run devtools::load_all()within the package project. I’m sure I’ll end up doing that. But it is also be possible to run it here, just passing the path, e.g. devtools::load_all("path/to/package/dir"). That lets me work on test and development qmds and scripts here. For a bit. But why? For one, seems like vignettes have to use rmd at least at present. And it keeps all the trial and error out of that repo.
I got hung up here for a while trying to pre-figure out how I’d install it once it was on github. Turns out it’s super straightforward (see below). It ends up just working as long as the thing on github has basic package structure.
Documentation
I’m using roxygen comments, as in the package dev book and roxygen docs for things like inheriting parameters and sections. Running devtools::document() builds the .rd files and means ?function works. There’s a lot of fancy stuff we could do there, but keeping it simple at first.
Vignettes
I like having actual demonstrations of the code, rather than just function docs, so using usethis::use_vignette to start building some. They have to be in rmd, not qmd. But the visual editor still works, which is nice. Just going to have to re-remember rmd chunk headers.
I can’t get df_print: paged to work. I think it might be a difference between html and html_vignette, but it is listed as an option in the help. For now using kable even though it’s huge for tables.
I ended up using the main vignette as an example in the primary github readme. To do that, I did usethis::use_readme_rmd(). Would be good to sort out {pkgdown}, or maybe there’s a streamlined quarto version that builds a website?
Testing
Using usethis::use_testthat(3) and writing tests was fairly straighforward, but I think there will be a learning curve about what and how to test. I tend to look very granularly at ad-hoc tests, i.e. scanning for weird NA, types, etc. But testthat and the expect_* functions lend themselves to simpler checks.
It gets sort of cumbersome if a function takes a while and generates something complex. In that case, I built tests that run the function (and so are fragile to the function just erroring out), and then run multiple different expect_* tests against it to make sure the output is right. As an example,
test_that("derived variables work for ts", { s3 <-get_response("https://data.water.vic.gov.au/cgi/webservice.exe?",paramlist =list("function"='get_ts_traces',"version"="2","params"=list("site_list"='233217',"start_time"=20200101,"varfrom"="100","varto"="140","interval"="day","datasource"="A","end_time"=20200105,"data_type"="mean","multiplier"=1)))expect_equal(class(s3), 'list')expect_equal(s3[[1]], 0)})
And then, if I want to hit the function with edge cases, etc, I have to do that over and over. There’s likely a better way, but I’ll need to experiment.
Issues
Trying to use %dopar%, but can’t get foreach::%dopar% to work, or with backticks. Putting it in a roxygen comment as @importFrom foreach %dopar% failed too. Seems to have worked to do usethis::use_import_from('foreach', '%dopar%'), which built some new files.
Having a hard time testing with doFuture, since it can’t find this package. pause that for a while
Installing
Once it’s pushed to github, it’s fairly straightforward to install- just
devtools::install_github("galenholt/hydrogauge")
Checking, building, etc
It ended up being pretty straightforward to use devtools::check() and using continuous integration with github to run the checks and put the little badges on, as described in the book.
It is easy to end up with funny missing pieces and issues if you forget to run devtools::check() and just push to github followed by devtools::install_github or even more likely if you just devtools::install_local from the directory with the code in it. In general, I think the github actions should take care of the check, but I never seem to get the emails that say it’s happened.
I often forget these steps. But to actually make the package usable other than with load_all(), we seem to need to devtools::check() and if there’s an rmd readme, knit that.
The readme ends up being hard to build when it gets updated without reinstalling the package. I ended up in a weird loop once where I couldn’t build the package with a broken readme, but couldn’t update the readme without package updates. The solution is devtools::build_readme() to install a temp package and build the readme from that, and then devtools::check().
Ephemera
renv
I use {renv} for package management and reproducibility, which usually (in a non-package Rproject) puts symlinks to the package in a projdir/renv/library/R-4.x/CPUtype/ directory. But interestingly, in a package project, it puts the symlinked library/R4.x/... in a central location (in my case, ~/AppData/Local/R/cache/R/renv/library/PACKAGENAME-HASH/R4.x/….
testing figures
I need to make some standard figure functions as part of the package. To test them, I’ve found the {vdiffr} package. It saves a figure if one doesn’t exist and if one does exist, it checks against the saved version. It seems to work well, the only trick is to remember to usethis::use_package('vdiffr', 'Suggests'), or it won’t be available to use by devtools::check().
‘global’ variables and dplyr
When I devtools::check() on a package using dplyr, I get a million errors about ‘no visible binding for global variable ’variable_name’’. The issue is that R CMD CHECK is interpreting the bare variable names in mutate, summarise, etc as variables and can’t find them. The code runs fine, but it’s annoying.
The answer, unfortunately, is to use the .data[['variable_name']] or .data$variable_name convention everywhere and usethis::use_import_from('rlang', '.data'). That works to get rid of the errors, but now we’ve lost one of the really nice things about writing dplyr code- the simplicity of bare data variable names.
Data
Data in /data needs to be included with the package (unsurprisingly- it obviously can’t be accessed if it’s not there). This is easy to have bite though, since my typical default is to gitignore data. Then everything works locally (where the package is originally built and data created), but fails when, for example, we install from github. The obvious answer is to not gitignore data (and so don’t change it very often). We do not seem to need to export the data like we do functions to have access to it in an installed package, (though documenting data is clearly ideal).
Using code coverage
Setting up badges and github actions sounds relatively straightforward, but there are tricks that either aren’t well-documented or I couldn’t figure out at all.
For example, if we set up automated coverage and checking,
Getting the coverage to work was really unclear to me. After cobbling it together, what seems to work is go to the codecov website and sign up for an account. Link it to your github, and then choose the repo you want and click ‘setup’. That tells you to put in a “repository secret”. Do that. It also says to ‘add codecov to your github actions’, which I didn’t do and it seems to work, I think because the use_github_action has already set that bit up (though it doesn’t have the particular text they say to add). That action seems to only run on master or main, so if nothing happens on a branch, try pushing there.
Making a package from an existing project
I have a lot of projects that I want to make into packages, largely to be able to have cleaner testing structure and be able to use devtools::load_all() to access functions instead of source in the head of everything. Pretty simple, just call
after you double check the working directory is right (should be, if you’re already in a project), and the check_name = FALSE because we likely don’t care if it’s a name that makes CRAN happy (if we do, obviously set that to TRUE).
This did cause a series of cascading issues that should have been expected. First, since {renv} puts the library in a different place for packages vs projects, it wanted to reinstall all packages. That took a while because I hadn’t been keeping up. Then it took a few restarts of Rstudio to get it to register the structure and make things like shortcuts to devtools::load_all() available.
It sure is nice to be able to load functions in though.