reticulate, virtualenv, and Python in Linux

Posted CADSEA


reticulate is an R package that allows us to use Python modules from within RStudio. I recently found this functionality useful while trying to compare the results of different uplift models. Though I did have R’s uplift package producing Qini charts and metrics, I also wanted to see how things looked with Wayfair’s promising pylift package. Since pylift is only available in Python, reticulate made it easy for me to quickly use pylift from within RStudio.

In the article below, I’ll show how I worked through the following circumstances:

  • Since pylift has only been tested on Python >= 3.6, and my system version of Python was 2.7, I needed to build and install Python 3.6 for myself, preferably within a self-contained virtual environment.

  • I wanted to install pylift in the virtual environment and set up reticulate in my R Project to work within that environment.

  • Finally, I needed to access pylift from an R Markdown document via the reticulate interface.

Setting up Python, virtualenv, and RStudio

Note: for consistency, I always use an instance created via r-studio-instance and a base project from r-studio-project.

Python 2.7 is the default on the systems I use (CentOS 6/7). Since I did not want to modify the system-level Python version, I installed Python 3.6.x at the user level in $HOME/opt and created a virtual environment using Python 3. I then activated the Python 3 environment and installed pylift. Finally, I ensured RStudio Server 1.2 was installed, as it has advanced reticulate support such as plotting graphs inline in R Markdown documents.

Below is a brief script that accomplishes the tasks in bash on CentOS 7:

# Build Python 3.6.2 under $HOME/opt, leaving the system Python untouched
cd ~
mkdir tmp
cd tmp
wget https://www.python.org/ftp/python/3.6.2/Python-3.6.2.tgz
tar -xzvf Python-3.6.2.tgz
cd Python-3.6.2
./configure --prefix=$HOME/opt/python-3.6.2 --enable-shared
make
make install
cd ~
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/opt/python-3.6.2/lib

# Create and activate a virtual environment named "pylift"
virtualenv -p $HOME/opt/python-3.6.2/bin/python3 pylift
source pylift/bin/activate

# Clone pylift inside the environment directory and install it with its requirements
cd pylift
git clone https://github.com/wayfair/pylift
cd pylift
pip install .
pip install -r requirements.txt
cd

# Install and start the RStudio Server 1.2 preview build
wget https://s3.amazonaws.com/rstudio-ide-build/server/centos6/x86_64/rstudio-server-rhel-1.2.1335-x86_64.rpm
sudo yum install -y --nogpgcheck rstudio-server-rhel-1.2.1335-x86_64.rpm
sudo rstudio-server start


Some notes:

  • the --enable-shared option is required when building Python in order for reticulate to work

  • the LD_LIBRARY_PATH environment variable also needs to be set prior to creating the virtual environment

  • we use virtualenv to create a virtual environment called “pylift” and then ensure that all Python packages are installed to that environment only (so as not to pollute any other environments we are working with)

  • we then clone the pylift source and install pylift along with all of its requirements via pip install -r requirements.txt

  • finally, we install the RStudio Server 1.2 Preview version in order to leverage its advanced reticulate features
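To confirm the first two notes on a given machine, the interpreter itself can report how it was built. The following is a minimal sketch, not part of the original workflow: the helper name shared_library_info is made up for illustration, and it assumes the script is run with the newly built interpreter (for example $HOME/opt/python-3.6.2/bin/python3).

```python
import sysconfig


def shared_library_info():
    """Return (enabled, libdir) for the interpreter running this script.

    shared_library_info is a hypothetical helper name, not a reticulate
    or pylift function.
    """
    # Py_ENABLE_SHARED is 1 when Python was configured with --enable-shared.
    enabled = bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))
    # LIBDIR is the directory holding libpython; this is the directory that
    # LD_LIBRARY_PATH needs to include.
    libdir = sysconfig.get_config_var("LIBDIR")
    return enabled, libdir


if __name__ == "__main__":
    enabled, libdir = shared_library_info()
    print("built with --enable-shared:", enabled)
    print("library directory:", libdir)
```

If the first line prints False, the build would need to be repeated with --enable-shared before reticulate can load it.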

Using Python from within RStudio via reticulate

Switching from bash to RStudio, we load reticulate and set it up to use the virtual environment we just created. Finally, and specific to pylift, we set matplotlib parameters so that we can plot directly in R.

library(reticulate)

Sys.setenv(LD_LIBRARY_PATH = paste0(Sys.getenv("HOME"),"/opt/python-3.6.2/lib"))
Sys.getenv("LD_LIBRARY_PATH")
use_virtualenv("/home/rstevenson/pylift", required=TRUE)
py_config()

# Currently this must be run in order for R-markdown plotting to work
matplotlib <- import("matplotlib")
matplotlib$use("Agg", force = TRUE)
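Beyond reading the output of py_config(), it can be reassuring to ask Python directly which interpreter reticulate picked up. Below is a small sketch that could be run in a Python chunk; the function names are assumptions chosen for illustration, and the 3.6 minimum comes from the pylift requirement mentioned earlier.

```python
import sys


def in_virtual_environment():
    """Best-effort check that the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.base_prefix differ from sys.prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def describe_interpreter():
    """Collect the facts worth checking before importing pylift."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "prefix": sys.prefix,
        "in_virtualenv": in_virtual_environment(),
        # pylift has only been tested on Python >= 3.6.
        "meets_pylift_minimum": sys.version_info >= (3, 6),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(key, "=", value)
```

In the setup described here, the prefix would be expected to point at the pylift environment directory rather than at the system Python.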

Test that it works

The first chunk below is a quick matplotlib check. The chunk after it replicates the first part of the pylift tutorial, which uses simulated data.

import matplotlib.pyplot as plt
import numpy as np
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2*np.pi*t)
plt.plot(t,s)

When run, the above code chunk should display a sinusoidal graph below it.

import numpy as np, matplotlib as mpl, matplotlib.pyplot as plt, pandas as pd
from pylift import TransformedOutcome
from pylift.generate_data import dgp
# Generate some data.
df = dgp(N=10000, discrete_outcome=True)

# Specify your dataframe, treatment column, and outcome column.
up = TransformedOutcome(df, col_treatment='Treatment', col_outcome='Outcome', stratify=df['Treatment'])

# This function randomly shuffles your training data set and calculates net information value.
up.NIV()


The above Python chunk uses reticulate from within RStudio to interact with pylift in the context of a custom virtual environment, using a custom version of Python. This degree of customization and functionality should be useful to users who:

  • want to use a different Python version from their usual one, isolated in a virtual environment so that their usual setup is unaffected

  • want to install a Python module like pylift within a virtual environment so as not to affect any of their user- or system-level Python module installations

  • want to use reticulate from RStudio to access a custom virtual environment, Python version, and Python modules

  • want to be able to delete the virtual environment and R project and have everything go back to the way it was

  • want to be able to reproduce the environment exactly so that the workflow can be shared with others
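For the last point, one common way to capture the environment is to record the exact package versions with pip freeze. This is a sketch under assumptions rather than part of the workflow above: the function names and the output file name requirements-lock.txt are hypothetical, and it assumes pip is available in the active virtual environment.

```python
import subprocess
import sys


def freeze_command(python=sys.executable):
    """Build the command that lists installed packages with pinned versions."""
    return [python, "-m", "pip", "freeze"]


def write_requirements(path, python=sys.executable):
    """Write the output of pip freeze to path and return the number of lines."""
    result = subprocess.run(
        freeze_command(python),
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text; works on Python 3.6
    )
    with open(path, "w") as handle:
        handle.write(result.stdout)
    return len(result.stdout.splitlines())


if __name__ == "__main__":
    # requirements-lock.txt is a hypothetical file name.
    count = write_requirements("requirements-lock.txt")
    print("recorded", count, "packages")
```

A collaborator could then recreate the package set in a fresh virtual environment with pip install -r requirements-lock.txt.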

