CUDNN_STATUS_ALLOC_FAILED when trying to run a tutorial CNN with TensorFlow

Posted: 2021-06-09 01:32:30

Question:

I am trying to run a simple Python script that uses a convolutional neural network (CNN). Every time I run the script, I get the following error message:

2021-03-10 19:47:03.832061: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
Traceback (most recent call last):
  File "CNN_trial.py", line 17, in <module>
    outputs = tf.nn.conv2d(images,filters,strides = 1,padding = "SAME")
  File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2158, in conv2d_v2
    return conv2d(input,  # pylint: disable=redefined-builtin
  File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2264, in conv2d
    return gen_nn_ops.conv2d(
  File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 942, in conv2d
    return conv2d_eager_fallback(
  File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 1031, in conv2d_eager_fallback
    _result = _execute.execute(b"Conv2D", 1, inputs=_inputs_flat, attrs=_attrs,
  File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D] 

My system is as follows:

Windows 10

AMD Ryzen 7 3700X

16 GB RAM

NVIDIA RTX 2060

Python 3.8.5

TensorFlow 2.4.1

My full code:

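from sklearn.datasets import load_sample_image
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Load two sample images and scale pixel values to [0, 1]
china = load_sample_image("china.jpg") / 255
flower = load_sample_image("flower.jpg") / 255
images = np.array([china, flower])
batch_size, height, width, channels = images.shape

# Build two 7x7 filters: one vertical line, one horizontal line
filters = np.zeros(shape=(7, 7, channels, 2), dtype=np.float32)
filters[:, 3, :, 0] = 1  # vertical line
filters[3, :, :, 1] = 1  # horizontal line

# Apply both filters to the batch (the traceback points at this call)
outputs = tf.nn.conv2d(images, filters, strides=1, padding="SAME")

# Plot the second feature map of the first image
plt.imshow(outputs[0, :, :, 1], cmap="gray")
plt.show()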

Comments:

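You may have another instance of your code still running. In that case the original instance is still holding the GPU, and you have to terminate it before this script can use it.

So should I close every other application that might be running code? I am using VS Code, but I don't have another IDE open that is running anything.

@BrainE A single VS Code window can have several terminals open.

I closed every VS Code window except the one I am using and still get the same error.

Answer 1: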

It seems I needed to enable memory growth. After adding the following two lines at the start of the script, I at least got it to run.

devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(devices[0],True)
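
A slightly more defensive variant of the two lines above is sketched below. By default TensorFlow reserves nearly all GPU memory when it first initializes the device, which is presumably why cuDNN could not allocate its handle here; memory growth makes TensorFlow allocate only as needed. The helper name `enable_memory_growth` is an illustration and not part of TensorFlow. The sketch assumes it runs before any tensor or op touches the GPU.

```python
import tensorflow as tf


def enable_memory_growth():
    """Request memory growth on every visible GPU and return their names.

    Hypothetical helper for illustration; not part of the TensorFlow API.
    """
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError as err:
            # Raised when the runtime was already initialized, i.e. the
            # setting was requested too late in the program.
            print(f"Could not set memory growth for {gpu.name}: {err}")
    return [gpu.name for gpu in gpus]


# Call this first, before building tensors or running ops.
print("GPUs with memory growth requested:", enable_memory_growth())
```

On a machine without a GPU the list is empty and the function does nothing, so it avoids the `IndexError` that `devices[0]` would raise there. Setting the environment variable `TF_FORCE_GPU_ALLOW_GROWTH=true` before starting Python is another way to request the same behavior without editing the script.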

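The comments above raise the possibility that another process is still holding GPU memory. One way to check is to ask `nvidia-smi` which compute processes are using the GPU. The following is a minimal sketch, assuming `nvidia-smi` is on the `PATH`; the helper names `parse_compute_apps` and `list_gpu_processes` are illustrative only.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' lines into a list of dicts."""
    apps = []
    for line in csv_text.strip().splitlines():
        if line.count(",") < 2:
            continue  # skip blank or malformed lines
        # pid is the first field and used_memory the last, so a process
        # path that itself contains commas is kept intact.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        pid = pid.strip()
        if not pid.isdigit():
            continue
        apps.append(
            {"pid": int(pid), "name": name.strip(), "used_memory": used.strip()}
        )
    return apps


def list_gpu_processes():
    """Return the compute processes on the GPU, or None if the query fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run(
        [
            exe,
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return parse_compute_apps(result.stdout)


processes = list_gpu_processes()
if processes is None:
    print("nvidia-smi is not available or the query failed")
else:
    for proc in processes:
        print(proc["pid"], proc["name"], proc["used_memory"])
```

If a leftover Python process shows up in the list, ending it frees its GPU memory. The `used_memory` field is kept as text because on some driver configurations it may not be reported as a number.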