Non-OK-status: GpuLaunchKernel(...) status: Internal: no kernel image is available for execution on the device

Posted 2023-04-15

【Title】: Non-OK-status: GpuLaunchKernel(...) status: Internal: no kernel image is available for execution on the device 【Posted】: 2020-11-25 04:58:40 【Question】:

I am running my code with TensorFlow 2.1.0 in Anaconda, using CUDA Toolkit 10.1 and cuDNN 7.6.0 on Windows 10, but it fails with this error:

F .\tensorflow/core/kernels/random_op_gpu.h:232] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: no kernel image is available for execution on the device

My GPU: GT940MX, compute capability 5.0

I ran nvcc -V and it returned:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:26_Pacific_Standard_Time_2019
Cuda compilation tools, release 10.1, V10.1.105

Here is the full output:

2020-08-05 10:05:48.368012: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-08-05 10:06:00.488544: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-08-05 10:06:48.153611: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce 940MX computeCapability: 5.0
coreClock: 0.8605GHz coreCount: 4 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 37.33GiB/s
2020-08-05 10:06:48.164731: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-08-05 10:06:48.245826: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-08-05 10:06:48.296245: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-08-05 10:06:48.338860: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-08-05 10:06:48.439393: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-08-05 10:06:48.489830: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-08-05 10:06:48.941872: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-08-05 10:06:48.946651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-08-05 10:06:48.951881: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-05 10:06:48.979077: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x23d29b660d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-05 10:06:48.985680: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-05 10:06:48.990616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce 940MX computeCapability: 5.0
coreClock: 0.8605GHz coreCount: 4 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 37.33GiB/s
2020-08-05 10:06:49.003356: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-08-05 10:06:49.009869: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-08-05 10:06:49.014858: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-08-05 10:06:49.020699: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-08-05 10:06:49.028876: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-08-05 10:06:49.033607: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-08-05 10:06:49.039192: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-08-05 10:06:49.045288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-08-05 10:06:49.218497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-05 10:06:49.223536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2020-08-05 10:06:49.226857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2020-08-05 10:06:49.230413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1460 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
2020-08-05 10:06:49.244107: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x23d301b8fa0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-05 10:06:49.250377: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce 940MX, Compute Capability 5.0
2020-08-05 10:06:49.446601: F .\tensorflow/core/kernels/random_op_gpu.h:232] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: no kernel image is available for execution on the device

What is the problem, and how can I fix it?

【Comments】:

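The TensorFlow build you have does not support your GPU.

@talonmies I have met the TensorFlow requirements (CUDA compute capability > 3.5).

I am facing exactly the same problem. @talonmies, if you can, please say which TensorFlow version would be compatible, since the TensorFlow website says version 2.3 works with CUDA 10.1 and cuDNN 7.6?

Whether a particular version of TensorFlow can in theory support your GPU does not matter. What matters is whether the people who built the binary you have installed chose to compile it with support for your GPU. That question needs to be put to whoever built the binary you have.

@talonmies Can I run TensorFlow by building it from source with bazel?

【Answer 1】: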

It looks like this is a problem with Python 3.8 and TensorFlow 2.3. I tried TensorFlow 2.3.0 with Python 3.7, but it raised an error about python38.dll (I don't remember the exact error, and I have already deleted that env). In the end I used Python 3.7 in an Anaconda env, installed TensorFlow 2.1.0 with pip, and it works.

I also posted this problem on GitHub, where it was answered: https://github.com/tensorflow/tensorflow/issues/42052
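Before comparing an environment with the combination reported in this answer, it can help to print which interpreter and which TensorFlow version are actually in use. The following is a minimal sketch, not part of the original answer; the helper name `describe_environment` is made up for illustration, and it only relies on the standard library, so it also runs on Python 3.7.

```python
import importlib
import platform


def describe_environment(package="tensorflow"):
    """Report the Python version and the version of `package`, if importable.

    `describe_environment` is a hypothetical helper, not a TensorFlow API.
    """
    info = {"python": platform.python_version(), package: None}
    try:
        module = importlib.import_module(package)
    except ImportError:
        # Package missing, or one of its native libraries failed to load.
        return info
    info[package] = getattr(module, "__version__", "unknown")
    return info


if __name__ == "__main__":
    # In the working setup described above, this would be expected to show
    # a 3.7.x interpreter and TensorFlow 2.1.0.
    print(describe_environment())
```

Running this inside the activated Anaconda env shows whether the env really uses the interpreter and package versions you think it does.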

【Comments】:

Thanks for the tip! The following combination seems to work for me... GPU: GeForce GTX 750 Ti, python 3.7.8, tf version 2.1.1, cuda-v7.6.5.32

【Answer 2】:

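According to the screenshot below, TensorFlow versions 2.1, 2.2 and 2.3 work with cuDNN version 7.4, but the cuDNN version installed on your system is 7.6.

This is most likely the cause of the error.

The solution is to downgrade your cuDNN version.

The existing version of cuDNN can be uninstalled from the Windows Control Panel using the Programs and Features widget.

A new version of cuDNN can be installed as shown in the NVIDIA Installation Guide.

Also, please refer to this GitHub issue for more information on how to downgrade the cuDNN version.

The screenshot above is taken from the TensorFlow documentation.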
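When checking library versions, it is worth seeing what the log in the question can and cannot tell you. The sketch below is an illustration rather than anything from the original answer: it pulls the library names out of the `dso_loader` lines. The helper name `loaded_libraries` is made up, and the sample lines are copied from the question's output.

```python
import re

# Matches loader lines such as
# "... Successfully opened dynamic library cudnn64_7.dll"
_DLL_PATTERN = re.compile(r"Successfully opened dynamic library (\S+)")


def loaded_libraries(log_text):
    """Return the unique library names reported by the loader, in log order."""
    seen = []
    for name in _DLL_PATTERN.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


sample_log = """\
2020-08-05 10:05:48.368012: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-08-05 10:06:48.245826: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-08-05 10:06:48.941872: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-08-05 10:06:49.039192: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
"""

if __name__ == "__main__":
    print(loaded_libraries(sample_log))
    # prints ['cudart64_101.dll', 'cublas64_10.dll', 'cudnn64_7.dll']
```

The file name `cudnn64_7.dll` carries only the cuDNN major version, so the log alone does not distinguish 7.4 from 7.6. It does show that the CUDA and cuDNN libraries were found and opened before the fatal kernel launch error appeared.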

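【Comments】:

But there is no cuDNN 7.4 build that supports CUDA 10.1. @Tensorflow Support

You can try downgrading the cuDNN version to 7.4.

I tried cuDNN 7.4 for CUDA 10.1 (because CUDA 10.1 has no cuDNN 7.4), and it returns the same problem: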

F .\tensorflow/core/kernels/random_op_gpu.h:232] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: no kernel image is available for execution on the device
(D:\Tensorflow\anaconda) PS D:\Tensorflow\TensorFlow-2.x-YOLOv3>

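"I tried cuDNN 7.4 for CUDA 10.1 (because CUDA 10.1 has no cuDNN 7.4)" ==> This sentence is a bit confusing. Could you rephrase it? Thanks!

Sorry, what I meant was: I tried cuDNN 7.4 for CUDA 10.0 (because there is no cuDNN 7.4 for CUDA 10.1), and it returns the same problem. @Tensorflow Support

【Answer 3】:

I had the same problem, and my cuDNN was 8.0.2. As you said, there is no cuDNN 7.4 for CUDA 10.1. So I tried cuDNN 7.5 for CUDA 10.1, and it worked! I hope my experience helps someone else. :)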

【Comments】:

【Answer 4】:

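It seems that certain cuDNN versions are supported only by certain specific versions of TensorFlow.

As a Windows user, this is what I did:

Check which TensorFlow and CUDA version combinations are compatible (you can select other operating systems on the left). As Rock Jefferson said, you can use cuDNN 7.5 with CUDA 10.1. It worked for me. Download here

Give it a try. I hope it works for you.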

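【Comments】:

I ran into this problem with CUDA 10.1 and cuDNN 7.6.5 (the latest version listed as compatible with 10.1). I tried downgrading to cuDNN 7.5.1 but hit the same problem. The same goes for 7.5.0.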
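The comments under the question point at a different cause from the cuDNN version: whether the installed binary was compiled with kernels for the GPU's compute capability. The sketch below illustrates that check under stated assumptions. The names `parse_arch`, `can_run` and `EXAMPLE_ARCHS` are hypothetical, and the architecture list is an illustrative example, not the verified build configuration of any TensorFlow release. The rules encoded are the general CUDA ones: binary code (`sm_XY`) runs on devices with the same major version and an equal or higher minor version, while PTX (`compute_XY`) can be JIT-compiled for any device with an equal or higher capability.

```python
def parse_arch(arch):
    """Split a string like 'sm_50' or 'compute_70' into (kind, major, minor)."""
    kind, _, digits = arch.partition("_")
    if kind not in ("sm", "compute") or len(digits) < 2 or not digits.isdigit():
        raise ValueError(f"unrecognised architecture string: {arch!r}")
    return kind, int(digits[:-1]), int(digits[-1])


def can_run(device_cc, compiled_archs):
    """Return True if a device with capability (major, minor) can run a binary
    compiled for the given architectures."""
    dev_major, dev_minor = device_cc
    for arch in compiled_archs:
        kind, major, minor = parse_arch(arch)
        if kind == "sm":
            # Binary code: same major version, device minor >= compiled minor.
            if major == dev_major and minor <= dev_minor:
                return True
        elif (major, minor) <= (dev_major, dev_minor):
            # PTX: can be JIT-compiled for any equal or newer device.
            return True
    return False


# Illustrative list only (an assumption, not a verified build configuration).
EXAMPLE_ARCHS = ["sm_35", "sm_37", "sm_52", "sm_60", "sm_61", "compute_70"]

if __name__ == "__main__":
    print(can_run((5, 0), EXAMPLE_ARCHS))  # the 940MX from the question
    print(can_run((5, 2), EXAMPLE_ARCHS))
```

With a list like this one, a compute capability 5.0 device has no matching kernel image, which would be consistent with both the error message and the reports that changing cuDNN made no difference. For a real installation, the list has to come from whoever built the binary; newer TensorFlow releases expose build details through `tf.sysconfig.get_build_info()`, though whether that includes the compute capabilities depends on the version.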
