Installing NVIDIA Apex for Python 3.8.5, compatible with PyTorch 1.9

【Posted】: 2021-11-09 05:13:24

【Question】:

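I am running code that evidently requires NVIDIA Apex (I did not know this at first and installed the wrong apex package). I am not sure how to fix the final error: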

(proxy) [jalal@goku proxynca_pp]$ CUDA_VISIBLE_DEVICES=0,1 python train.py --dataset cub  --config config/cub.json --mode train --apex --seed 0
(1024, 4096)
train.py:12: MatplotlibDeprecationWarning: The 'warn' parameter of use() is deprecated since Matplotlib 3.1 and will be removed in 3.3.  If any parameter follows 'warn', they should be pass as keyword, not positionally.
  matplotlib.use('agg', warn=False, force=True)
Traceback (most recent call last):
  File "train.py", line 70, in <module>
    from apex import amp
  File "/scratch3/venv/proxy/lib/python3.8/site-packages/apex/__init__.py", line 13, in <module>
    from pyramid.session import UnencryptedCookieSessionFactoryConfig
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)
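
The traceback suggests that the `apex` package on the import path is not NVIDIA's: its `__init__.py` imports from `pyramid`, which points to an unrelated package that shares the name. A minimal sketch for checking this before starting a training run, using only the standard library; the helper name `submodule_available` is hypothetical:

```python
import importlib.util


def submodule_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` (e.g. "apex.amp") can be located.

    find_spec imports the parent package first, so a broken or unrelated
    parent package shows up here as an ImportError rather than mid-training.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # The parent package is missing, or it failed while being imported.
        return False


if __name__ == "__main__":
    print("apex.amp importable:", submodule_available("apex.amp"))
```

If this reports that `apex.amp` is not importable while `pip` lists an `apex` package, the installed distribution is probably not the one the training script expects.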

After getting the error above, I tried this answer: https://***.com/a/67188946/2414957

(proxy) [jalal@goku proxynca_pp]$ pip uninstall apex
Found existing installation: apex 0.9.10.dev0
Uninstalling apex-0.9.10.dev0:
  Would remove:
    /scratch3/venv/proxy/lib/python3.8/site-packages/apex-0.9.10.dev0-py3.8.egg-info
    /scratch3/venv/proxy/lib/python3.8/site-packages/apex/*
Proceed (Y/n)? y
  Successfully uninstalled apex-0.9.10.dev0
(proxy) [jalal@goku proxynca_pp]$ git clone https://github.com/NVIDIA/apex
Cloning into 'apex'...
remote: Enumerating objects: 8256, done.
remote: Counting objects: 100% (343/343), done.
remote: Compressing objects: 100% (192/192), done.
remote: Total 8256 (delta 204), reused 240 (delta 139), pack-reused 7913
Receiving objects: 100% (8256/8256), 14.20 MiB | 0 bytes/s, done.
Resolving deltas: 100% (5605/5605), done.
(proxy) [jalal@goku proxynca_pp]$ cd apex
(proxy) [jalal@goku apex]$ pip install -v --disable-pip-version-check --no-cache-dir \
> --global-option="--cpp_ext" --global-option="--cuda_ext" ./
/scratch3/venv/proxy/lib/python3.8/site-packages/pip/_internal/commands/install.py:229: UserWarning: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option.
  cmdoptions.check_install_build_global(options)
Using pip 21.2.4 from /scratch3/venv/proxy/lib/python3.8/site-packages/pip (python 3.8)
Processing /scratch3/research/code/fashion/proxynca_pp/apex
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
    Running command python setup.py egg_info


    torch.__version__  = 1.9.0+cu111


    running egg_info
    creating /scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info
    writing /scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info/PKG-INFO
    writing dependency_links to /scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info/dependency_links.txt
    writing top-level names to /scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info/top_level.txt
    writing manifest file '/scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info/SOURCES.txt'
    reading manifest file '/scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info/SOURCES.txt'
    writing manifest file '/scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info/SOURCES.txt'
    /scratch/tmp/pip-req-build-fg_khhkt/setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
      warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
Skipping wheel build for apex, due to binaries being disabled for it.
Installing collected packages: apex
    Running command /scratch3/venv/proxy/bin/python3.8 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/scratch/tmp/pip-req-build-fg_khhkt/setup.py'"'"'; __file__='"'"'/scratch/tmp/pip-req-build-fg_khhkt/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /scratch/tmp/pip-record-u812zb2v/install-record.txt --single-version-externally-managed --compile --install-headers /scratch3/venv/proxy/include/site/python3.8/apex


    torch.__version__  = 1.9.0+cu111


    /scratch/tmp/pip-req-build-fg_khhkt/setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
      warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

    Compiling cuda extensions with
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:01_CDT_2018
    Cuda compilation tools, release 10.0, V10.0.130
    from /usr/local/cuda-10.0/bin

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/scratch/tmp/pip-req-build-fg_khhkt/setup.py", line 159, in <module>
        check_cuda_torch_binary_vs_bare_metal(CUDA_HOME)
      File "/scratch/tmp/pip-req-build-fg_khhkt/setup.py", line 99, in check_cuda_torch_binary_vs_bare_metal
        raise RuntimeError("Cuda extensions are being compiled with a version of Cuda that does " +
    RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.  Pytorch binaries were compiled with Cuda 11.1.
    In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  You can try commenting out this check (at your own risk).
    Running setup.py install for apex ... error
ERROR: Command errored out with exit status 1: /scratch3/venv/proxy/bin/python3.8 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/scratch/tmp/pip-req-build-fg_khhkt/setup.py'"'"'; __file__='"'"'/scratch/tmp/pip-req-build-fg_khhkt/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /scratch/tmp/pip-record-u812zb2v/install-record.txt --single-version-externally-managed --compile --install-headers /scratch3/venv/proxy/include/site/python3.8/apex Check the logs for full command output.

I have these packages installed:

(proxy) [jalal@goku apex]$ pip freeze
anykeystore==0.2
certifi==2021.5.30
charset-normalizer==2.0.4
cryptacular==1.6.2
cycler==0.10.0
defusedxml==0.7.1
greenlet==1.1.1
h5py==3.4.0
hupper==1.10.3
idna==3.2
joblib==1.0.1
kiwisolver==1.3.2
MarkupSafe==2.0.1
matplotlib==3.2.0
numpy==1.21.2
oauthlib==3.1.1
PasteDeploy==2.1.1
pbkdf2==1.3
Pillow==8.3.2
plaster==1.0
plaster-pastedeploy==0.7
pyparsing==2.4.7
pyramid==2.0
pyramid-mailer==0.15.1
python-dateutil==2.8.2
python3-openid==3.2.0
repoze.sendmail==4.4.1
requests==2.26.0
requests-oauthlib==1.3.0
scikit-learn==0.24.2
scipy==1.7.1
six==1.16.0
sklearn==0.0
SQLAlchemy==1.4.23
threadpoolctl==2.2.0
torch==1.9.0+cu111
torchaudio==0.9.0
torchvision==0.10.0+cu111
tqdm==4.62.2
transaction==3.0.1
translationstring==1.4
typing-extensions==3.10.0.2
urllib3==1.26.6
velruse==1.1.1
venusian==3.0.0
WebOb==1.8.7
WTForms==2.3.3
wtforms-recaptcha==0.3.2
zope.deprecation==4.4.0
zope.interface==5.4.0
zope.sqlalchemy==1.6

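The code here is from this GitHub repo.

Edit: I found the steps through a *** answer (linked above) that I can no longer find. I do not know how to find the correct link or an installation that is compatible with PyTorch 1.9.

FYI, the git repo has no installation instructions, so I am installing blind.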

【Comments】:

If your question is really "I don't know where to find a compatible version for PyTorch 1.9", then why doesn't the question contain all these details?

【Answer 1】:

Install CUDA 11.1, then add the following to ~/.bashrc and source ~/.bashrc; finally, create the symlink to make it work:

export CUDA_HOME=/usr/local/cuda-11.1
export PATH=/usr/local/cuda-11.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64:$LD_LIBRARY_PATH

This removes the need to uninstall CUDA 10.2, which matters if other projects need it later. Simply exporting the paths without the symlink does not work.

$ sudo ln -sfT /usr/local/cuda-11.1/ /usr/local/cuda

^ The last command assumes you have multiple CUDA versions installed on your machine. For more information, read this GitHub issue.
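
To confirm that the exports and the symlink took effect, the environment can be inspected from Python. The following is only a sketch: the function name `describe_cuda_toolkit` and the keys of the returned dictionary are hypothetical, and `/usr/local/cuda` is assumed to be the conventional default location used in this answer.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def describe_cuda_toolkit(
    env: Mapping[str, str] = os.environ,
    default_link: str = "/usr/local/cuda",
) -> dict:
    """Report which CUDA toolkit the current environment points at."""
    cuda_home: Optional[str] = env.get("CUDA_HOME")
    # nvcc is expected at $CUDA_HOME/bin/nvcc when CUDA_HOME is set correctly.
    nvcc_found = bool(cuda_home) and (Path(cuda_home) / "bin" / "nvcc").is_file()
    link = Path(default_link)
    return {
        "CUDA_HOME": cuda_home,
        "nvcc_under_CUDA_HOME": nvcc_found,
        # resolve() follows symlinks, showing which versioned directory is active.
        "default_link_target": str(link.resolve()) if link.exists() else None,
    }


if __name__ == "__main__":
    for key, value in describe_cuda_toolkit().items():
        print(f"{key}: {value}")
```

Run it in a fresh shell after sourcing ~/.bashrc; if `CUDA_HOME` or the resolved link target still names the 10.x directory, the Apex build will most likely pick up the old `nvcc` again.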

【Comments】:

【Answer 2】:

Your CUDA version appears to be v10, while your PyTorch was built against v11.1. Apex is probably complaining about that.

From the error:

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
from /usr/local/cuda-10.0/bin

RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. 
Pytorch binaries were compiled with Cuda 11.1.
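
The two versions named in this error can be extracted and compared programmatically. This is a minimal sketch, not Apex's own code: `parse_nvcc_release` and `parse_torch_cuda` are hypothetical helpers, and the sample strings are copied from the log above. On a real machine the inputs would come from `nvcc --version` and from `torch.version.cuda`.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text such as 'release 10.0, V10.0.130'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_torch_cuda(version: Optional[str]) -> Optional[Tuple[int, int]]:
    """Turn a string such as '11.1' (the format of torch.version.cuda) into (11, 1)."""
    if not version:
        return None  # CPU-only builds report no CUDA version
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


nvcc_text = "Cuda compilation tools, release 10.0, V10.0.130"
print(parse_nvcc_release(nvcc_text), parse_torch_cuda("11.1"))
# prints: (10, 0) (11, 1)
```

For the asker's machine the result is 10.0 against 11.1, so the major versions differ, not just the minor ones.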

Can you try to make sure both versions are the same?

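1. If you have CUDA 11.1 installed, export its path: export CUDA_HOME=/usr/local/cuda-11.1/
2. Otherwise, install a PyTorch build that uses CUDA 10.
3. As a last option, remove the minor version check, e.g. when you have CUDA 10.0 installed but PyTorch was built with 10.2.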

setup.py:

if (bare_metal_major != torch_binary_major):  # or (bare_metal_minor != torch_binary_minor)
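
The effect of relaxing the check can be illustrated in isolation. The sketch below is an assumption about the comparison logic rather than a copy of Apex's setup.py; `cuda_versions_compatible` and its `require_minor_match` parameter are hypothetical names.

```python
from typing import Tuple


def cuda_versions_compatible(
    bare_metal: Tuple[int, int],
    torch_binary: Tuple[int, int],
    require_minor_match: bool = True,
) -> bool:
    """Compare (major, minor) of the local toolkit and of the PyTorch build."""
    if bare_metal[0] != torch_binary[0]:
        return False  # a major mismatch is never accepted
    if require_minor_match and bare_metal[1] != torch_binary[1]:
        return False
    return True


# CUDA 10.0 toolkit vs. a PyTorch build for 10.2: passes only when relaxed
print(cuda_versions_compatible((10, 0), (10, 2)))                             # False
print(cuda_versions_compatible((10, 0), (10, 2), require_minor_match=False))  # True
# The asker's case (10.0 vs. 11.1) fails either way
print(cuda_versions_compatible((10, 0), (11, 1), require_minor_match=False))  # False
```

Dropping the minor check therefore only helps when the major versions already agree; with a 10.0 toolkit and a PyTorch build for 11.1, one of the first two options is still needed.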

【Comments】:

On Google Colab I had to follow both #2 and #3, because CUDA 10.0 and 10.1 are installed there rather than 10.2 (PyTorch was built for 10.2), together with export CUDA_HOME=/usr/local/cuda-10.0/
