Using R and python together

```{r setup}
#| warning: false
#| message: false

knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())
```

The issue

I have a project that is primarily in R but needs some python. For the big python work, I’ll have a directory with a poetry environment and python code. But I’ve run into the issue that I want to run just one or two lines of python from R. The specific case is that I have python code for extracting river gauge data, and I’ve already filtered some river gauges in R for something else. Rather than find the gauges again in python, I’d rather just do the extraction from R. I think that means I have to sort out {reticulate}, and also how to point reticulate at my python environment. The situation I have is a poetry project inside a directory with an Rproj (which probably needs to be split up, but it’s what I have now).

My python_setup sets up a very similar situation, so let’s see if I can use it.

I’m going to try to remember to always put

execute:
  echo: fenced

in the yaml headers of mixed r-python notebooks, so it’s clear which chunks are which language (though we can usually tell by the code).

Additional issues

This all worked, but then stopped: python chunks separated by R chunks couldn’t share objects. This particular notebook still ran, but others would run fine interactively and then, when rendered, throw errors like “name ‘pyobjname’ is not defined”. I tried making sure jupyter was in the python env and setting engine: knitr in the yaml headers, since that’s what the help suggested. I also set the QUARTO_PYTHON environment variable in an _environments file, since that helped me previously. It’s unclear why it worked previously.

IDE NOTE The above may be due in part to the IDE. Rstudio uses the reticulate python REPL for python chunks, so python objects are available to R. Quarto itself does as well. But VScode does not, so python and R chunks are run wholly independently of each other, and we can’t just pass objects between them as we do here in interactive notebooks. Instead, we need to run the python through reticulate in R chunks, rather than as interactive python in the REPL. This is usually fine, but becomes a pain when we want to stay in python for a while, e.g. to run something, process it, run something else, and then get it back in R.

Set up reticulate from R

Point reticulate at the venv. See stackoverflow. This seems not to be necessary if the .venv is in the outer project directory, or if we’ve set the RETICULATE_PYTHON environment variable elsewhere (like .Rprofile).

``` {{r}}
reticulate::use_virtualenv(file.path('RpyEnvs', 'pytesting', 'Scripts', '.venv'), required = TRUE)
```

If this is more than a one-off and you’re using an R project, it’s usually better to set the RETICULATE_PYTHON environment variable in .Rprofile. Here, that means adding this line to .Rprofile. This has the added bonus of stopping Rstudio/R from throwing warnings about conda at startup when it detects {reticulate} being used in the project.

``` {{r}}
Sys.setenv(RETICULATE_PYTHON = file.path('RpyEnvs', 'pytesting', '.venv'))
```
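Whichever way the interpreter gets selected, it’s worth confirming from the python side which one is actually running. This is a minimal sketch, not something from my original setup: interpreter_info is a helper name I’ve made up, and it only uses the standard sys module. If reticulate has picked up the venv, I’d expect the paths to point into the .venv directory and the virtual environment flag to be True; if not, it has bound to a different python.

```python
import sys


def interpreter_info():
    """Return the running interpreter path and whether it looks like a virtual environment."""
    # In a venv, sys.prefix points at the environment and sys.base_prefix
    # at the python installation it was created from.
    in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    return {"executable": sys.executable, "prefix": sys.prefix, "in_venv": in_venv}


info = interpreter_info()
print(info["executable"])
print(info["prefix"])
print("virtual environment:", info["in_venv"])
```

Running this in a python chunk right after the setup gives a quick sanity check before any packages get imported.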

See Quarto notes for some similar issues for different python-related env variables.

Load the library. Interestingly, the python code chunks will run without loading the library, but I can’t access their values using py$pythonobject unless I load it.

```{r}
library(reticulate)
```

R

First, let’s create some things in R.

```{r}
a <- 1
b <- 2
```

Python

Python does not just inherit the values from R, but the chunk runs.

```{python}
a = 1
b = 2
a+b
```
3

Do I have access to packages? Yes.

```{python}
import numpy as np

x = np.arange(15, dtype=np.int64).reshape(3, 5)
x[1:, ::2] = -99
x
```
array([[  0,   1,   2,   3,   4],
       [-99,   6, -99,   8, -99],
       [-99,  11, -99,  13, -99]], dtype=int64)

Does access to python objects persist? Yes

Though in a lot of other docs, this has proved to be super unstable, and fails intermittently.

```{python}
x.max(axis=1)
```
array([ 4,  8, 13], dtype=int64)

Moving data back and forth

Python to R

Can I access objects with R? Yes, but not quite directly. I have to use the py$pythonObject notation, and that only works if I’ve loaded library(reticulate) or specified reticulate::py. That’s a pain, so it’s probably almost always better to load the library. Even though the python chunks run fine without explicitly loading it, I can’t seem to access py without loading it.

```{r}
# x
reticulate::py$x
```
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    1    2    3    4
[2,]  -99    6  -99    8  -99
[3,]  -99   11  -99   13  -99

R to python

Similar to python objects being in py, R objects are in r, and are accessed with . instead of $.

```{r}
c <- 17
```

Interestingly, the r. notation to get R into python does not need reticulate:: on it. Which I guess makes some sense: this block is actually running in python, and python doesn’t know what reticulate is. But it does know what r. is, somehow. Pretty cool.

```{python}
r.c + b
```
19.0
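The result is 19.0 rather than 19 because of the types on each side. A minimal sketch in plain python, where c_from_r is a stand-in I’ve made up for r.c: the assumption (consistent with the output above) is that an R numeric like c <- 17 is a double and so arrives in python as a float.

```python
# Stand-in for r.c. ASSUMPTION: an R numeric such as `c <- 17` is a double,
# so it shows up in python as a float rather than an int.
c_from_r = 17.0

# b was defined as a python int in the earlier python chunk.
b = 2

total = c_from_r + b
print(total, type(total).__name__)  # 19.0 float
```

If an integer is needed on the python side, I believe defining the R value with the L suffix (c <- 17L) is the usual way to get a python int instead.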

Python-R-python and NameErrors

Elsewhere I’m running into a new issue of getting “NameError: name ‘pyobjectname’ is not defined” when I try to access an object defined in a previous python chunk in a later python chunk. It seems to be worse when there’s an R chunk in the middle. It doesn’t seem to be happening here, since the preceding chunk could get at b, which was defined way earlier.

Does the issue happen when R has touched it? YES. This all runs fine interactively, but when I try to render, I get “Error in py_call_impl(callable, dots$args, dots$keywords) : NameError: name ‘b’ is not defined” in the python chunk.

```{python}
b + a
```
3
```{r}
rb <- py$b + c
rb
```
[1] 19
```{python}
b + a
```
3
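
To make the symptom concrete, here is a minimal sketch in plain python (no reticulate, no Quarto) of what it looks like when a later chunk does not see an earlier chunk’s namespace. The dicts shared and fresh are stand-ins I’ve made up for a python session. This only illustrates the error; it is not a diagnosis of what the render is actually doing.

```python
# One namespace shared by every chunk: later code can see earlier names.
shared = {}
exec("a = 1\nb = 2", shared)      # first python chunk
exec("result = b + a", shared)    # later python chunk, same session
print(shared["result"])           # 3

# A chunk that lands in a fresh namespace cannot see b or a.
fresh = {}
try:
    exec("result = b + a", fresh)
    message = None
except NameError as err:
    message = str(err)
print(message)                    # name 'b' is not defined
```

The second half reproduces the same message as the render error, which is at least consistent with the python chunks not sharing one session when rendered.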

Python in R

In some cases, we want to run python through reticulate but not in the REPL (or are forced to by e.g. the way VScode handles python and R chunks). To do this, we use an R chunk and reticulate functions. Re-doing the numpy example above, we’re now working in R, and reticulate is doing the python behind the scenes. Reticulate has some special functions for numpy, but I’m trying to keep this generic; it’s about the process, not about numpy per se. This immediately becomes a pain when we need to chain methods, e.g. the reshape here fails. Judging by the error, np$arange(15) has already come back as a plain R vector by the time $reshape is called, so there is no python method left to call.

```{r}
#| error: true
np <- reticulate::import('numpy')
x <- np$arange(15)$reshape(3, 5)
```
Error in np$arange(15)$reshape: $ operator is invalid for atomic vectors
```{r}
#| error: true
x
```
Error in eval(expr, envir, enclos): object 'x' not found
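
One possible way around the chaining problem is to keep the chain on the python side and expose it to R as a single function. This is a sketch under assumptions, not something I’ve tested in this project: arange_grid is a helper name I’ve made up, and it would live in a hypothetical file such as np_helpers.py.

```python
import numpy as np


def arange_grid(n_rows, n_cols):
    """Build 0..n_rows*n_cols-1 and reshape it, keeping the method chain in python."""
    # The chained call happens entirely in python, so nothing is handed
    # back to R between arange() and reshape().
    return np.arange(n_rows * n_cols).reshape(n_rows, n_cols)


grid = arange_grid(3, 5)
print(grid.shape)  # (3, 5)
```

From R, reticulate::source_python() on that file should make arange_grid callable, so something like x <- arange_grid(3L, 5L) would do the whole thing in one call. I’ve written the arguments with the L suffix because I believe a plain R 3 arrives in python as a float, which reshape will not accept.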