No visible binding for global variable

No visible binding issue

When we check packages (usually with devtools::check(), but presumably R CMD check as well), we get a lot of notes about “no visible binding for global variable” if we use tidyverse code. This is because of the data masking that dplyr and friends do to let us use bare column names: the static code check sees those names as variables that are never defined.
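As a minimal sketch of what triggers the note, here is a hypothetical package function (the name `summarise_mpg` and the use of `mtcars` are illustrative assumptions, not from the original package). R CMD check's code analysis builds on codetools, so `codetools::findGlobals()` gives a rough view of what it considers global.

```r
# Hypothetical function using bare column names (data masking).
summarise_mpg <- function(df) {
  df |>
    dplyr::group_by(cyl) |>
    dplyr::summarise(mean_mpg = mean(mpg))
}

# Works at run time, because dplyr looks up `cyl` and `mpg` inside `df`.
res_bare <- summarise_mpg(mtcars)

# Static analysis has no way to know that, so both names are
# reported as global variables.
globals_bare <- codetools::findGlobals(summarise_mpg)
print(intersect(c("cyl", "mpg"), globals_bare))
```

The function runs fine; the note only reflects that the checker cannot see where `cyl` and `mpg` come from.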

The fix for select (and tidyselect generally) is different from the fix for other verbs like mutate and summarise, though. That’s hard to discover, because we can silence all of the ‘global variable’ notes with .data, but doing so then causes deprecation warnings for select and friends while testing.

While styler does not find this problem, lintr does. So it’s very helpful to install lintr and lint individual files, rather than running a full check each time. lintr doesn’t pick up tidyselect’s deprecation of .data, though. At least the tests can be run on single files, and they do pick that issue up.

Note: run devtools::load_all() before linting, or lintr won’t know about the imported .data and will flag .data itself as a global variable with no visible binding.

data masking verbs

The answer for non-select verbs is to use the .data[['variable_name']] or .data$variable_name convention everywhere, and to import the pronoun with usethis::use_import_from('rlang', '.data'). That gets rid of the notes, but now we’ve lost one of the really nice things about writing dplyr code: the simplicity of bare variable names.
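A minimal sketch of the .data convention in data-masking verbs, assuming a hypothetical function `summarise_mpg_masked` and the built-in `mtcars` data. Inside a package, .data would also need to be imported from rlang as described above; at run time the data mask supplies it.

```r
# Hypothetical function: every column reference goes through .data,
# so the static check no longer sees bare column names.
summarise_mpg_masked <- function(df) {
  df |>
    dplyr::filter(.data$mpg > 15) |>
    dplyr::group_by(.data$cyl) |>
    dplyr::summarise(mean_mpg = mean(.data$mpg)) |>
    dplyr::arrange(.data$cyl)
}

res_masked <- summarise_mpg_masked(mtcars)

# The column names no longer appear among the globals; only `.data`
# does, which is why it has to be imported in the package.
globals_masked <- codetools::findGlobals(summarise_mpg_masked)
print(intersect(c("cyl", "mpg", ".data"), globals_masked))
```

Both `.data$mpg` and `.data[["mpg"]]` should behave the same here; the `[[` form is handy when the column name is held in a character variable.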

Then we need to actually find all of the places where .data$ has to go in front of names. The notes that check produces help, and then we just have to look through the files. In general, it’s everything in the tidyverse that uses a bare name, with a few exceptions. I put together a repo to minimally check which functions need it.

tidyselect

tidyselect deprecated the use of .data in selections as of version 1.2.0. It still checks fine, and gets rid of the global variable issue, but we get lots of warnings of the form “Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0.” The reasoning is described on the tidyselect blog, where the suggested solution is to use all_of (or just to use character strings). That seems OK, but is again clunky: we now need to put, at minimum, quotes around the terms and, at worst, tidyselect::all_of all over the place. At some point it’s just easier to use [. Where this is likely to be needed most is where tidyselect is a helper to other verbs, e.g. in the .by argument of mutate and summarise.
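A minimal sketch of mixing the two conventions, assuming a hypothetical function `mean_mpg_by` and the built-in `mtcars` data (and a dplyr version recent enough to have the .by argument): quoted names or all_of() in tidyselect arguments, .data in data-masking arguments.

```r
# Hypothetical function: tidyselect arguments take character names,
# data-masking arguments take .data.
mean_mpg_by <- function(df) {
  df |>
    # select() uses tidyselect: all_of() with a character vector
    dplyr::select(tidyselect::all_of(c("cyl", "gear", "mpg"))) |>
    # summarise() is data masking, but its .by argument is tidyselect
    dplyr::summarise(
      mean_mpg = mean(.data$mpg),
      .by = c("cyl", "gear")
    )
}

res_by <- mean_mpg_by(mtcars)
print(names(res_by))
```

Plain strings are enough when the names are written out literally, as in `.by` here; all_of() is mainly useful for making the intent explicit or when the names are stored in a variable.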

The example repo is set up so that check throws no notes or errors, so we can see what works for different verbs.

foreach

The ‘no visible binding for global variable’ note also shows up for foreach indices. The solution there, taken from the foreach GitHub issues, is to initialise the variable first.

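```{r}
# `%do%` has to be available: attach foreach here, or in a package
# import the operator with @importFrom foreach %do%
library(foreach)

## To please 'R CMD check'
x <- NULL

y <- foreach::foreach(x = 1:3) %do% {
  sqrt(x)
}
```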

Cheat sheet

In general, the key is that if the help says an argument uses ‘data masking’, use .data, and if it says it uses tidy-select, use tidyselect conventions (character names or all_of). The catch is that it can be annoying to check, and some functions aren’t very clear. A cheat sheet of what to use where is in the table below. Also, as far as I can tell, anywhere this says tidyselect, if we just have a set of variable names, we can use either tidyselect::all_of(c('v1', 'v2')) or just c('v1', 'v2') (or, obviously, more complex tidyselecting).

cheat_tibble |>
  knitr::kable()

| verb | argument | fix |
|------|----------|-----|
| mutate | | .data |
| mutate | .by | tidyselect |
| summarise | | .data |
| summarise | .by | tidyselect |
| group_by | | .data |
| across | .cols | tidyselect |
| select | | tidyselect |
| rename | | tidyselect |
| join_by | | character? |
| unnest | cols | tidyselect |
| unnest_longer | col | tidyselect |
| unnest_wider | col | tidyselect |
| case_when (in mutate) | | .data |
| filter | | .data |
| filter | .by | tidyselect |
| ggplot2::aes | x,y,… | .data |
| foreach | index | preassign NULL |
| tidyr::separate | col | tidyselect |
| pivot_wider | id_cols | tidyselect |
| pivot_wider | names_from | tidyselect |
| pivot_wider | values_from | tidyselect |
| distinct | | .data |
| DiagrammeR::add_nodes_from_table | label_col | tidyselect |
| DiagrammeR::add_nodes_from_table | set_type | tidyselect |
| DiagrammeR::add_nodes_from_table | drop_cols | tidyselect |
| DiagrammeR::add_nodes_from_table | type_col | tidyselect |
| arrange | | .data |
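
As a rough illustration of a few of these rows outside dplyr, here is a sketch with a small made-up data frame (`long_df` and its columns are assumptions for the example): pivot_wider arguments take character names, while ggplot2::aes takes .data.

```r
# Made-up long-format data for illustration only.
long_df <- data.frame(
  id = c(1, 1, 2, 2),
  key = c("a", "b", "a", "b"),
  value = c(10, 20, 30, 40)
)

# pivot_wider: id_cols, names_from and values_from are tidyselect,
# so quoted names avoid bare symbols.
wide_df <- tidyr::pivot_wider(
  long_df,
  id_cols = "id",
  names_from = "key",
  values_from = "value"
)

# distinct is data masking, so it takes .data.
n_keys <- nrow(dplyr::distinct(long_df, .data$key))

# aes is data masking too; inside a package .data would be
# imported from rlang.
p <- ggplot2::ggplot(wide_df, ggplot2::aes(x = .data$a, y = .data$b)) +
  ggplot2::geom_point()
```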