Installing TensorFlow-GPU on Ubuntu 16.04

Posted 2021-01-06

System: Ubuntu 16.04.5 desktop (on the server edition, steps 1 and 2 should not be needed)
GPU: NVIDIA GeForce GTX 1080 Ti
Official documentation: https://www.tensorflow.org/install/install_linux

1 Change Ubuntu's default runlevel to 3

1.1 Check the current runlevel

~$ runlevel
N 5

1.2 Change the runlevel to 3

Edit /etc/default/grub:

~$ sudo vi /etc/default/grub
    Comment out the line GRUB_CMDLINE_LINUX_DEFAULT="quiet splash":
    # GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

    Change the line GRUB_CMDLINE_LINUX="" to:
    GRUB_CMDLINE_LINUX="text"

    Uncomment the line #GRUB_TERMINAL=console so that it reads:
    GRUB_TERMINAL=console
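
The three edits above can also be applied non-interactively. The following is a minimal sketch, assuming sed is available and that the file still contains the stock lines quoted above; the function name patch_grub_defaults is hypothetical and not part of any tool. It keeps the original file as a .bak copy.

```shell
#!/bin/sh
# Hypothetical helper: apply the three /etc/default/grub edits described above.
# Usage: patch_grub_defaults /path/to/grub-defaults-file
patch_grub_defaults() {
    file=$1
    # -i.bak edits the file in place and keeps the original as "$file.bak".
    sed -i.bak \
        -e 's/^GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"/# &/' \
        -e 's/^GRUB_CMDLINE_LINUX=""/GRUB_CMDLINE_LINUX="text"/' \
        -e 's/^#[[:space:]]*GRUB_TERMINAL=console/GRUB_TERMINAL=console/' \
        "$file"
}

# Try it on a copy first and inspect the result before touching the real file:
#   cp /etc/default/grub /tmp/grub.test
#   patch_grub_defaults /tmp/grub.test
#   diff /etc/default/grub /tmp/grub.test
```

Running the function against /etc/default/grub itself requires root, and sudo update-grub is still needed afterwards for the change to take effect.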

~$ sudo update-grub

~$ sudo systemctl set-default multi-user.target

Reboot the system:

~$ reboot

1.3 Verify

~$ runlevel
N 3

1.4 Switching between command-line mode and the graphical interface

Command line --> graphical interface:

To enter the graphical interface once (the system still boots to the command line after the next restart), run:

~$ sudo systemctl start lightdm

To make the system boot into the graphical interface by default, run:

~$ sudo systemctl set-default graphical.target

Then reboot for the change to take effect:
~$ sudo reboot

Graphical interface --> command line:

To make the system boot to the command line by default, run:
~$ sudo systemctl set-default multi-user.target

Then reboot for the change to take effect:
~$ sudo reboot
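
On a systemd-based release such as Ubuntu 16.04, the default target decides the boot mode: multi-user.target corresponds to text mode (runlevel 3) and graphical.target to the desktop (runlevel 5). The following is a small sketch for checking which one is configured; boot_mode_of is a hypothetical helper name introduced only for this example.

```shell
#!/bin/sh
# Hypothetical helper: translate a systemd default target into the
# runlevel-style description used in this article.
boot_mode_of() {
    case "$1" in
        multi-user.target) echo "text mode (runlevel 3)" ;;
        graphical.target)  echo "graphical mode (runlevel 5)" ;;
        *)                 echo "other target: $1" ;;
    esac
}

# Only query systemd when systemctl is available on this machine.
if command -v systemctl >/dev/null 2>&1; then
    boot_mode_of "$(systemctl get-default 2>/dev/null)"
fi
```

This only reports the configured default; a one-off sudo systemctl start lightdm does not change it.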

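2 Disable Ubuntu's bundled graphics driver (important)

Method 1:

GPU builds of frameworks such as MXNet and TensorFlow require an NVIDIA GPU. If you update the NVIDIA driver without first disabling the graphics driver that ships with Ubuntu, you can run into problems such as an "X server is running" error, repeated prompts to reboot, or a driver that installs successfully but cannot be connected to.

On desktop Ubuntu there is a simpler way. In Software & Updates, the Additional Drivers tab detects the NVIDIA card and offers matching drivers; select one and apply it.

Method 2:

Remove the Nouveau kernel driver (fixes the NVIDIA installation error)
Reference: https://tutorials.technology/tutorials/85-How-to-remove-Nouveau-kernel-driver-Nvidia-install-error.html
Introduction
Warning: this procedure can break your system. Back up the system before carrying out these steps.

If the Nouveau kernel driver is currently in use, installing the official NVIDIA driver fails with the error below. The following steps fix the error so that the official driver can be installed.

ERROR: The Nouveau kernel driver is currently in use by your system.  This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding.  Please consult the NVIDIA driver README and
your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.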

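2.1 Remove all nvidia packages

This step removes every nvidia-related package.

~$ sudo apt-get remove 'nvidia*' && sudo apt autoremove

The pattern is quoted so that the shell passes it to apt-get unchanged. If it is left unquoted, zsh can abort the command with the error below when no file in the current directory matches. That error comes from the shell, not from apt, and quoting the pattern avoids it:

no matches found: nvidia*

Now install the required dependencies:

~$ sudo apt-get install dkms build-essential linux-headers-generic

2.2 Blacklist the nouveau driver

Now block and disable the nouveau kernel driver:

~$ sudo vim /etc/modprobe.d/blacklist.conf
# Add the following lines: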

blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
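
If you prefer not to edit the file by hand, the same five lines can be appended from a script. This is a sketch only; add_nouveau_blacklist is a hypothetical helper name, and the function skips any line that is already present so that it can be run more than once.

```shell
#!/bin/sh
# Hypothetical helper: append the nouveau blacklist entries listed above to a
# modprobe configuration file, skipping lines that already exist in it.
add_nouveau_blacklist() {
    conf=$1
    touch "$conf"
    for line in \
        "blacklist nouveau" \
        "blacklist lbm-nouveau" \
        "options nouveau modeset=0" \
        "alias nouveau off" \
        "alias lbm-nouveau off"
    do
        # -x matches whole lines only, -F treats the text literally.
        grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
    done
}

# Example (must be run as root for the real file):
#   add_nouveau_blacklist /etc/modprobe.d/blacklist.conf
```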

2.3 Update the initramfs

Run the following command to disable kernel mode setting for nouveau:

~$ echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf

Finally, rebuild the initramfs and reboot:

~$ sudo update-initramfs -u
~$ reboot
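
After the reboot it is worth confirming that the nouveau module is really gone before running the NVIDIA installer. The sketch below reads /proc/modules, where the module name is the first field of each line; nouveau_loaded is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a module named "nouveau" is listed in a
# /proc/modules-style file (defaults to /proc/modules itself).
nouveau_loaded() {
    modules_file=${1:-/proc/modules}
    awk '$1 == "nouveau" { found = 1 } END { exit !found }' "$modules_file"
}

if [ -r /proc/modules ]; then
    if nouveau_loaded; then
        echo "nouveau is still loaded; recheck the blacklist and the initramfs"
    else
        echo "nouveau is not loaded"
    fi
fi
```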

3 Install the NVIDIA driver and CUDA 9.2

3.1 Install the dependencies libGLU.so, libX11.so, libXi.so and libXmu.so

~$ sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

3.2 Download the NVIDIA CUDA 9.2 installer

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal

3.3 Run the NVIDIA CUDA 9.2 installer

/data/tools$ sudo sh cuda_9.2.148_396.37_linux.run.37_linux
......
......
Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.37?
(y)es/(n)o/(q)uit: y

Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: 

Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: 

Install the CUDA 9.2 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-9.2 ]: 

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 9.2 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is /home/user ]: 

Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-9.2 ...
Installing the CUDA Samples in /home/user ...
Copying samples to /home/user/NVIDIA_CUDA-9.2_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-9.2
Samples:  Installed in /home/user

Please make sure that
 -   PATH includes /usr/local/cuda-9.2/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-9.2/lib64, or, add /usr/local/cuda-9.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.2/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_18869.log

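3.4 Add environment variables

~$ vim ~/.bashrc
# add cuda
export PATH=${PATH}:/usr/local/cuda-9.2/bin
# lib64 is the directory the installer summary above asks for; CUPTI is kept from the original setup
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/local/cuda-9.2/lib64:/usr/local/cuda/extras/CUPTI/lib64
~$ source ~/.bashrc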
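
Because ~/.bashrc is read again by every nested interactive shell and by each manual source, plain appends can accumulate duplicate entries. The following sketch shows one way to add the CUDA directories only when they are missing; append_path is a hypothetical helper name, and the lib64 directory is the one named in the installer summary.

```shell
#!/bin/sh
# Hypothetical helper: print a colon-separated path list with a directory
# appended, unless the directory is already one of the entries.
# $1: current value (may be empty), $2: directory to add
append_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)
            if [ -n "$1" ]; then
                printf '%s\n' "$1:$2"
            else
                printf '%s\n' "$2"
            fi
            ;;
    esac
}

PATH=$(append_path "$PATH" /usr/local/cuda-9.2/bin)
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" /usr/local/cuda-9.2/lib64)
LD_LIBRARY_PATH=$(append_path "$LD_LIBRARY_PATH" /usr/local/cuda/extras/CUPTI/lib64)
export PATH LD_LIBRARY_PATH
```

Handling the empty case separately avoids a leading colon, which the dynamic loader would otherwise treat as the current directory.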

3.5 Show GPU information

/data/tools$ nvidia-smi
Fri Sep 14 15:09:33 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   41C    P5    37W / 300W |      0MiB / 11176MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
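
The same check can be scripted. The sketch below compares the driver version reported by nvidia-smi with 396.37, the version bundled with the runfile used in this article. It assumes GNU sort (for the -V version ordering); version_ge is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 is greater than or equal to
# version $2, using the version ordering of GNU sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Driver version shipped inside cuda_9.2.148_396.37 (taken from this article).
bundled=396.37

if command -v nvidia-smi >/dev/null 2>&1; then
    installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if version_ge "$installed" "$bundled"; then
        echo "driver $installed is at least $bundled"
    else
        echo "driver $installed is older than $bundled"
    fi
fi
```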

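4 Install NVIDIA cuDNN

cuDNN provides GPU-accelerated primitives for deep learning.
Before installing cuDNN, make sure CUDA and the NVIDIA driver are installed correctly.

4.1 Download (requires signing in with an NVIDIA account)

https://developer.nvidia.com/cudnn
Choose the cuDNN build that matches your system and CUDA version.

4.2 Install cuDNN from the deb packages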

/data/tools$ ll
total 1952872
drwxr-xr-x 3 user user        269 Sep 14 13:25 ./
drwxr-xr-x 3 user user         19 Sep 14 10:21 ../
-rw-rw-r-- 1 user user 1757268179 Sep 14 10:25 cuda_9.2.148_396.37_linux.run.37_linux
-rw-rw-r-- 1 user user  123377766 Sep 14 13:25 libcudnn7_7.2.1.38-1+cuda9.2_amd64.deb
-rw-rw-r-- 1 user user  114154210 Sep 14 10:22 libcudnn7-dev_7.2.1.38-1+cuda9.2_amd64.deb
-rw-rw-r-- 1 user user    4914818 Sep 14 13:25 libcudnn7-doc_7.2.1.38-1+cuda9.2_amd64.deb

/data/tools$ sudo dpkg -i libcudnn7_7.2.1.38-1+cuda9.2_amd64.deb
Selecting previously unselected package libcudnn7.
(Reading database ... 249019 files and directories currently installed.)
Preparing to unpack libcudnn7_7.2.1.38-1+cuda9.2_amd64.deb ...
Unpacking libcudnn7 (7.2.1.38-1+cuda9.2) ...
Setting up libcudnn7 (7.2.1.38-1+cuda9.2) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...

/data/tools$ sudo dpkg -i libcudnn7-dev_7.2.1.38-1+cuda9.2_amd64.deb
(Reading database ... 249025 files and directories currently installed.)
Preparing to unpack libcudnn7-dev_7.2.1.38-1+cuda9.2_amd64.deb ...
Unpacking libcudnn7-dev (7.2.1.38-1+cuda9.2) over (7.2.1.38-1+cuda9.2) ...
Setting up libcudnn7-dev (7.2.1.38-1+cuda9.2) ...
update-alternatives: using /usr/include/x86_64-linux-gnu/cudnn_v7.h to provide /usr/include/cudnn.h (libcudnn) in auto mode

/data/tools$ sudo dpkg -i libcudnn7-doc_7.2.1.38-1+cuda9.2_amd64.deb
Selecting previously unselected package libcudnn7-doc.
(Reading database ... 249025 files and directories currently installed.)
Preparing to unpack libcudnn7-doc_7.2.1.38-1+cuda9.2_amd64.deb ...
Unpacking libcudnn7-doc (7.2.1.38-1+cuda9.2) ...
Setting up libcudnn7-doc (7.2.1.38-1+cuda9.2) ...

4.3 Verify the cuDNN installation

/data/tools$ cp -r /usr/src/cudnn_samples_v7 $HOME
/data/tools$ cd $HOME/cudnn_samples_v7/mnistCUDNN
~/cudnn_samples_v7/mnistCUDNN$ make clean && make
rm -rf *o
rm -rf mnistCUDNN
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include   -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include   -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o  -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm

~/cudnn_samples_v7/mnistCUDNN$ ./mnistCUDNN
cudnnGetVersion() : 7201 , CUDNN_VERSION from cudnn.h : 7201 (7.2.1)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 28  Capabilities 6.1, SmClock 1683.0 Mhz, MemSize (Mb) 11176, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.110592 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.110592 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.147328 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.327680 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.494592 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.104448 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.113632 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.151552 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.323584 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.495616 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

If the installation succeeded, the output ends with "Test passed!".
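
For a quicker check that does not require building the sample, the installed version can be read from the header that the libcudnn7-dev package registers as /usr/include/cudnn.h. This is a sketch under the assumption that the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL on separate #define lines, as cuDNN 7 does; cudnn_version is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "major.minor.patch" from a cudnn.h-style header
# by reading its CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines.
cudnn_version() {
    header=$1
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}

if [ -r /usr/include/cudnn.h ]; then
    cudnn_version /usr/include/cudnn.h
fi
```

On the system described here the result should agree with the 7.2.1 reported by mnistCUDNN.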

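5 [Optional] Install NVIDIA TensorRT

To optimize inference performance you can also install NVIDIA TensorRT. The minimal set of TensorRT runtime components needed with the prebuilt tensorflow-gpu package can be installed as follows. Note that the repository package downloaded below is versioned 4.0.1 (for CUDA 9.2), not 3.0: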

5.1 Download

~/cudnn_samples_v7/mnistCUDNN$ wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2_1-1_amd64.deb

5.2 Install

~/cudnn_samples_v7/mnistCUDNN$ sudo dpkg -i nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2_1-1_amd64.deb
Selecting previously unselected package nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2.
(Reading database ... 249144 files and directories currently installed.)
Preparing to unpack nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2_1-1_amd64.deb ...
Unpacking nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2 (1-1) ...
Setting up nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2 (1-1) ...

The public CUDA GPG key does not appear to be installed.
To install the key, run this command:
sudo apt-key add /var/nvinfer-runtime-trt-repo-4.0.1-ga-cuda9.2/7fa2af80.pub

If this message appears, add the key with the command it prints before running apt-get update.

/data/tools$ sudo dpkg -i nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2_1-1_amd64.deb
(Reading database ... 249154 files and directories currently installed.)
Preparing to unpack nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2_1-1_amd64.deb ...
Unpacking nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2 (1-1) over (1-1) ...
Setting up nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2 (1-1) ...

/data/tools$ sudo apt-get update

6 Choose how to install TensorFlow

tensorflow-gpu
# To activate this environment, use:
# > source activate tensorflow
or
# > source activate tensorflow-gpu
#
# To deactivate an active environment, use:
# > source deactivate

Reference

https://blog.csdn.net/Jonms/article/details/79318566