Tensorflow not running on GPU in jupyter notebook

【Posted】: 2017-11-05 17:51:39 【Question】:

I successfully installed CUDA and cuDNN for a GTX 1080 Ti on Ubuntu. Running a simple TF program in a jupyter notebook, I see no speedup in a conda environment with tensorflow-gpu==1.0 compared with one running tensorflow==1.0.

When I run nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 0000:01:00.0      On |                  N/A |
| 24%   45C    P0    62W / 250W |    537MiB / 11171MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1101    G   /usr/lib/xorg/Xorg                             310MiB |
|    0      1877    G   compiz                                         219MiB |
|    0      3184    G   /usr/lib/firefox/firefox                         5MiB |
+-----------------------------------------------------------------------------+

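I tried putting "with tf.device("/gpu:0"):" before the matrix multiplication, but it only gives me an error:

"InvalidArgumentError (see above for traceback): Cannot assign a device to node 'MatMul': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:0"](Reshape, softmax/Variable/read)]]"

I know cuDNN is installed correctly, because I get these messages when I import TensorFlow in a terminal: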

import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally

I think this has something to do with the Jupyter notebook. Is there a compatibility problem? When I run a TF session, I get the following output:

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Device mapping: no known devices.
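
To see what TensorFlow itself can reach from inside the notebook kernel, independent of nvidia-smi, one option is to list its local devices. This is a minimal sketch, assuming the device_lib module that ships with TensorFlow 1.x; the helper name list_tf_devices is made up for illustration, and it returns an empty list when tensorflow cannot be imported at all.

```python
def list_tf_devices():
    """Return (name, device_type) pairs for the devices TensorFlow can see."""
    try:
        # Imported lazily so the helper still runs (and returns [])
        # in an environment where tensorflow is not installed.
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return [(d.name, d.device_type) for d in device_lib.list_local_devices()]


devices = list_tf_devices()
for name, device_type in devices:
    print(name, device_type)
if not any(device_type == "GPU" for _, device_type in devices):
    print("No GPU device is visible to this TensorFlow build.")
```

If the same snippet reports a GPU in a terminal session but not in the notebook, the two are probably not using the same TensorFlow installation.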


【Answer 1】:

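I solved the problem. It turned out that I had installed jupyter and the regular tensorflow outside my environment, while tensorflow-gpu was installed inside it. So when I ran jupyter, it was calling the tensorflow outside the environment rather than the tensorflow-gpu installed inside the environment.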
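
One way to spot this kind of mismatch is to print which interpreter is running and where tensorflow would be loaded from, once in a terminal with the environment activated and once in a notebook cell. The sketch below uses only the standard library; the function name describe_environment is illustrative, not part of any tool mentioned here.

```python
import importlib.util
import sys


def describe_environment(package="tensorflow"):
    """Report the running interpreter and where `package` would load from."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package)
    return {
        "interpreter": sys.executable,  # the Python binary behind this process
        "prefix": sys.prefix,           # root directory of the active environment
        "package_location": spec.origin if spec is not None else None,
    }


for key, value in describe_environment().items():
    print(key, "=", value)
```

If the interpreter or package location printed in the notebook differs from the one printed in the terminal, the notebook is running outside the environment that holds tensorflow-gpu.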

【Discussion】:

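Could you elaborate on what this means? Does it refer specifically to the items in Jupyter's New drop-down, where you can pick a conda environment if one is set up? "Apparently I had installed jupyter and regular tensorflow outside my environment. However, I had installed tensorflow-gpu inside my environment. So when I ran jupyter, it was calling the tensorflow outside the environment, not the tensorflow-gpu installed in the environment." Are you saying that you created a new jupyter notebook and selected "environment A" (for example), but your code was actually running in "environment B"? Is that even possible?

@GeoffreyAnderson: In my case, I created a new environment (environment A) that could run tensorflow. However, an old tensorflow version was also installed in the conda (base) environment. Somehow, when using jupyter notebook (env A), it tried to load the outdated tensorflow from base (built against an incompatible CUDA) and raised an error. I think that is what the OP meant. I fixed it by uninstalling anaconda and reinstalling it.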
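
On whether a notebook can end up in a different environment from the one expected: a Jupyter kernel is started from a kernel spec, a kernel.json file whose argv entry names the executable to launch, so a kernel can point at a Python outside the environment jupyter was started from. The sketch below reads that entry; the function name is illustrative, and the spec directories on a given machine can be listed with "jupyter kernelspec list".

```python
import json


def kernel_interpreter(kernel_json_path):
    """Return the executable a Jupyter kernel spec launches (argv[0])."""
    with open(kernel_json_path, encoding="utf-8") as handle:
        spec = json.load(handle)
    argv = spec.get("argv") or []
    return argv[0] if argv else None
```

Calling kernel_interpreter on the kernel.json inside a listed spec directory shows which Python that kernel runs. If the result is a bare name such as python rather than an absolute path, the interpreter is resolved through PATH at launch time.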
