reticulate, virtualenv, and Python in Linux
Posted CADSEA
reticulate is an R package that allows us to use Python modules from within RStudio. I recently found this functionality useful while trying to compare the results of different uplift models. Though I did have R’s uplift package producing Qini charts and metrics, I also wanted to see how things looked with Wayfair’s promising pylift package. Since pylift is only available in Python, reticulate made it easy for me to quickly use pylift from within RStudio.
In the article below, I’ll show how I worked through the following circumstances:
Since pylift has only been tested on Python >= 3.6, and my system version of Python was 2.7, I needed to build and install Python 3.6 for myself, preferably within a self-contained virtual environment.
I wanted to install pylift in the virtual environment and set up reticulate in my R Project to work within that environment.
Finally, I needed to access pylift from an R Markdown document via the reticulate interface.
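The version requirement in the first point can be checked from Python itself before anything else is installed. The snippet below is a minimal sketch, not part of pylift: the helper name meets_minimum and the constant MIN_VERSION are made up for illustration, and the (3, 6) threshold is simply the "tested on Python >= 3.6" figure quoted above.

```python
import sys

# Minimum version pylift has been tested on, as stated in the article.
# (MIN_VERSION and meets_minimum are illustrative names, not pylift API.)
MIN_VERSION = (3, 6)


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= tuple(minimum)


print("running Python", ".".join(map(str, sys.version_info[:3])))
print("new enough for pylift:", meets_minimum())
```

Run with the system interpreter on a Python 2.7 machine, the last line would report False, which is the situation that motivates building 3.6 separately.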
Setting up Python, virtualenv, and RStudio
Note: for consistency, I always use an instance created via r-studio-instance and a base project from r-studio-project.
Python 2.7 is the default on the systems I use (CentOS 6/7). Since I did not want to modify the system-level Python version, I installed Python 3.6.x at the user level in $HOME/opt and created a virtual environment using Python 3. I then activated the Python 3 environment and installed pylift. Finally, I ensured RStudio-Server 1.2 was installed, as it has advanced reticulate support, such as plotting graphs inline in R Markdown documents.
Below is a brief script that accomplishes the tasks in bash on CentOS 7:
cd ~
mkdir tmp
cd tmp
wget https://www.python.org/ftp/python/3.6.2/Python-3.6.2.tgz
tar -xzvf Python-3.6.2.tgz
cd Python-3.6.2
./configure --prefix=$HOME/opt/python-3.6.2 --enable-shared
make
make install
cd ~
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/opt/python-3.6.2/lib
virtualenv -p $HOME/opt/python-3.6.2/bin/python3 pylift
source pylift/bin/activate
cd pylift
git clone https://github.com/wayfair/pylift
cd pylift
pip install .
pip install -r requirements.txt
cd
wget https://s3.amazonaws.com/rstudio-ide-build/server/centos6/x86_64/rstudio-server-rhel-1.2.1335-x86_64.rpm
sudo yum install -y --nogpgcheck rstudio-server-rhel-1.2.1335-x86_64.rpm
sudo rstudio-server start
Some notes:
The --enable-shared option is required when building Python in order for reticulate to work.
The LD_LIBRARY_PATH environment variable also needs to be set prior to creating the virtual environment.
We use virtualenv to create a virtual environment called “pylift” and then ensure that all Python packages are installed to that environment only (so as not to pollute any other environments we are working with).
We then clone the pylift source and install pylift along with all of its requirements via pip install -r requirements.txt.
Finally, we install the RStudio Server 1.2 Preview version in order to leverage its advanced reticulate features.
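The first two notes can be verified from inside the newly built interpreter. The following is a rough sketch using only the standard library: sysconfig reports the Py_ENABLE_SHARED and LIBDIR build variables, while the two helper functions are illustrative names of my own rather than anything provided by reticulate or virtualenv.

```python
import os
import sysconfig


def built_with_shared_library():
    """True if this interpreter was configured with --enable-shared."""
    # Py_ENABLE_SHARED is 1 for shared builds and 0 otherwise; it can be
    # None on platforms that do not use the configure script.
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


def on_library_path(lib_dir, env=None):
    """True if `lib_dir` is one of the entries in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    entries = env.get("LD_LIBRARY_PATH", "").split(":")
    wanted = os.path.normpath(lib_dir)
    # Skip empty entries, which appear when the variable starts with ":".
    return any(e and os.path.normpath(e) == wanted for e in entries)


lib_dir = sysconfig.get_config_var("LIBDIR")  # where libpython was installed
print("shared build:", built_with_shared_library())
print("LIBDIR:", lib_dir)
if lib_dir:
    print("LIBDIR on LD_LIBRARY_PATH:", on_library_path(lib_dir))
```

If it is run with $HOME/opt/python-3.6.2/bin/python3 after the export line in the script, both checks should come back True; a False on the first one would suggest the build was configured without --enable-shared.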
Using Python from within RStudio via reticulate
Switching from bash to RStudio, we load reticulate and set it up to use the virtual environment we just created. Finally, and specific to pylift, we set matplotlib parameters so that we can plot directly in R.
library(reticulate)
Sys.setenv(LD_LIBRARY_PATH = paste0(Sys.getenv("HOME"),"/opt/python-3.6.2/lib"))
Sys.getenv("LD_LIBRARY_PATH")
use_virtualenv("/home/rstevenson/pylift", required=TRUE)
py_config()
# Currently this must be run in order for R-markdown plotting to work
matplotlib <- import("matplotlib")
matplotlib$use("Agg", force = TRUE)
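As a cross-check on what py_config() prints, the same facts can be read from the Python side, for example in a Python chunk of the same R Markdown document. This is only a sketch: in_virtualenv and interpreter_report are hypothetical helper names, and the virtual environment test covers both older virtualenv releases (which set sys.real_prefix) and newer ones (where sys.base_prefix differs from sys.prefix).

```python
import sys


def in_virtualenv():
    """True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def interpreter_report():
    """Collect the facts worth comparing against py_config()."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "in_virtualenv": in_virtualenv(),
    }


for key, value in interpreter_report().items():
    print("%-14s %s" % (key, value))
```

With the setup above, the executable and prefix should point into the pylift virtual environment and the version should read 3.6.2; anything else suggests reticulate bound to a different Python before use_virtualenv() was called.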
Test that it works
The following first checks that plotting works and then replicates the first part of the pylift tutorial, which uses simulated data.
import matplotlib.pyplot as plt
import numpy as np
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2*np.pi*t)
plt.plot(t,s)
When run, the above code chunk should display a sinusoidal graph below it.
import numpy as np, matplotlib as mpl, matplotlib.pyplot as plt, pandas as pd
from pylift import TransformedOutcome
from pylift.generate_data import dgp
# Generate some data.
df = dgp(N=10000, discrete_outcome=True)
# Specify your dataframe, treatment column, and outcome column.
up = TransformedOutcome(df, col_treatment='Treatment', col_outcome='Outcome', stratify=df['Treatment'])
# This function randomly shuffles your training data set and calculates net information value.
up.NIV()
The above Python chunk uses reticulate from within RStudio to interact with pylift in the context of a custom virtual environment, using a custom version of Python. This degree of customization and functionality should be useful to users who:
want to use a different Python version than usual, isolated in a virtual environment so that their typical setup is unaffected
want to install a Python module like pylift within a virtual environment so as not to affect any of their user- or system-level Python module installations
want to use reticulate from RStudio to access a custom virtual environment, Python version, and Python modules
want to be able to delete the virtual environment and R Project and have everything go back to the way it was
want to be able to reproduce or share the environment exactly so that the workflow can be shared with others
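One common way to make the last point concrete is to record the exact package versions installed in the virtual environment with pip freeze. The sketch below is an assumption about how one might do that, not something from the original workflow: the function names are hypothetical, and the package versions in the sample string are placeholders rather than the versions pylift actually pulled in.

```python
import subprocess
import sys


def pinned_requirements(freeze_output):
    """Turn `pip freeze` output into a sorted list of requirement lines."""
    lines = (line.strip() for line in freeze_output.splitlines())
    # Drop blank lines and comment lines.
    kept = (line for line in lines if line and not line.startswith("#"))
    return sorted(kept, key=str.lower)


def freeze_current_environment():
    """Run `pip freeze` with the interpreter that is executing this code."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text output; works on Python 3.6
        check=True,
    )
    return pinned_requirements(result.stdout)


# Canned example (placeholder versions) so nothing depends on what is installed.
sample = "pandas==0.23.4\n\n# comment\nnumpy==1.15.4\n"
print(pinned_requirements(sample))  # ['numpy==1.15.4', 'pandas==0.23.4']
```

Writing the result of freeze_current_environment() to a file next to the R Project would give collaborators a list they can pass to pip install -r inside their own virtual environment.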