Tensorflow not running on GPU in Jupyter Notebook

Posted 2023-03-13


Question (posted 2017-11-05 17:51:39):

I successfully installed CUDA and cuDNN for a GTX 1080 Ti on Ubuntu. However, when I run a simple TF program in Jupyter Notebook, the conda environment with tensorflow-gpu==1.0 is no faster than the one with tensorflow==1.0.

When I run nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 0000:01:00.0 On | N/A |
| 24% 45C P0 62W / 250W | 537MiB / 11171MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1101 G /usr/lib/xorg/Xorg 310MiB |
| 0 1877 G compiz 219MiB |
| 0 3184 G /usr/lib/firefox/firefox 5MiB |
+-----------------------------------------------------------------------------+

I tried putting "with tf.device("/gpu:0"):" before the matrix multiplication, but that only gives me an error:

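InvalidArgumentError (see above for traceback): Cannot assign a device to node 'MatMul': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:0"](Reshape, softmax/Variable/read)]]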
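The error states that only /job:localhost/replica:0/task:0/cpu:0 is registered, so one useful check is to ask TensorFlow directly which devices it sees in the same process the notebook runs in. The following is a minimal sketch, assuming the module tensorflow.python.client.device_lib is importable (it ships with TF 1.x and 2.x but is not part of the public API). The helper names list_tf_devices and has_gpu are hypothetical.

```python
import importlib.util


def list_tf_devices():
    """Return (name, device_type) pairs TensorFlow registers in this process,
    or None if this interpreter cannot find or import TensorFlow."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    try:
        # Internal module (not public API), present in TF 1.x and 2.x.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [(d.name, d.device_type) for d in device_lib.list_local_devices()]


def has_gpu(devices):
    """True if any registered device has device_type "GPU"."""
    return any(device_type == "GPU" for _, device_type in devices)


if __name__ == "__main__":
    devices = list_tf_devices()
    if devices is None:
        print("TensorFlow is not importable from this interpreter")
    else:
        print(devices)
        print("GPU registered:", has_gpu(devices))
```

If this prints only a CPU device inside the notebook but lists a GPU when run from a terminal, the two are not using the same TensorFlow installation.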

I know cuDNN is installed correctly, because I get these messages when I run the import in a terminal:

import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally

I think this has something to do with Jupyter Notebook. Is there a compatibility issue? When I run a TF session, I get the following output:

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Device mapping: no known devices.

"""


Answer 1:

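I solved the problem. Apparently I had installed jupyter and the regular tensorflow package outside my environment, while tensorflow-gpu was installed inside it. So when I ran jupyter, it was importing the tensorflow from outside the environment instead of the tensorflow-gpu installed in the environment.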
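One way to detect this kind of mismatch is to run the same code once in a notebook cell and once with the environment's python in a terminal, then compare which interpreter is running and where tensorflow would be imported from. The following is a sketch that assumes Python 3.8+ (for importlib.metadata). The helper name describe_tf_environment is hypothetical.

```python
import os
import sys
import importlib.util
from importlib import metadata  # Python 3.8+


def _inside(path, prefix):
    """True if path lies under prefix (after resolving symlinks)."""
    path, prefix = os.path.realpath(path), os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def describe_tf_environment():
    """Report the running interpreter and which tensorflow it would import."""
    spec = importlib.util.find_spec("tensorflow")  # locates without importing
    origin = spec.origin if spec is not None else None
    versions = {}
    for dist_name in ("tensorflow", "tensorflow-gpu"):
        try:
            versions[dist_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            versions[dist_name] = None
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        "tensorflow_module": origin,
        # False: tensorflow comes from outside this interpreter's prefix
        # (for example ~/.local or another conda environment).
        "tf_inside_prefix": _inside(origin, sys.prefix) if origin else None,
        "distributions": versions,
    }


if __name__ == "__main__":
    for key, value in describe_tf_environment().items():
        print(f"{key}: {value}")
```

If "python" or "tensorflow_module" differ between the terminal and the notebook, the notebook kernel is not running in the environment where tensorflow-gpu was installed.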

Comments:

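Could you elaborate on what this means? Does it refer specifically to the entries in Jupyter's New drop-down, where you can choose a conda environment if one is set up? "Apparently I had installed jupyter and the regular tensorflow package outside my environment, but tensorflow-gpu inside it. So when I ran jupyter, it was importing the tensorflow from outside the environment instead of the tensorflow-gpu installed in it." Are you saying that you created a new Jupyter notebook and selected "environment A" (for example), but your code actually ran in "environment B"? Is that even possible?

@GeoffreyAnderson: In my case, I created a new environment (environment A) in which tensorflow runs. However, an older tensorflow version was also installed in the conda (base) environment. Somehow, when I used Jupyter Notebook (env A), it tried to load the outdated tensorflow from base (built against an incompatible CUDA version) and raised an error. I think that is what the OP meant. I fixed it by uninstalling Anaconda and reinstalling it.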
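Which interpreter a notebook actually runs is determined by the kernelspec chosen in the New menu. The first element of "argv" in that kernel's kernel.json is the program Jupyter launches, and "jupyter kernelspec list" prints where the specs are stored. The following sketch checks a spec against an environment prefix. The helper names, the Linux per-user path ~/.local/share/jupyter/kernels, and the example paths in the test data are assumptions for illustration.

```python
import glob
import json
import os


def _inside(path, prefix):
    """True if path lies under prefix (after resolving symlinks)."""
    path, prefix = os.path.realpath(path), os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def kernel_interpreter(kernel_json_text):
    """argv[0] of a kernelspec is the program Jupyter starts for that kernel."""
    return json.loads(kernel_json_text)["argv"][0]


def kernel_uses_env(kernel_json_text, env_prefix):
    """True/False if the kernel's interpreter is (not) inside env_prefix.
    Returns None for a bare name such as "python": which interpreter that
    becomes depends on how Jupyter itself was started."""
    interpreter = kernel_interpreter(kernel_json_text)
    if not os.path.isabs(interpreter):
        return None
    return _inside(interpreter, env_prefix)


if __name__ == "__main__":
    # Assumed per-user kernelspec directory on Linux.
    kernels_dir = os.path.expanduser("~/.local/share/jupyter/kernels")
    for path in sorted(glob.glob(os.path.join(kernels_dir, "*", "kernel.json"))):
        with open(path) as f:
            print(path, "->", kernel_interpreter(f.read()))
```

A result of None means the kernel starts a bare "python". In that case, jupyter_client typically substitutes the interpreter that is running Jupyter itself. This is one way a notebook opened "in env A" can still import packages from wherever jupyter was installed.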
