tensorflow-gpu is not working: Blas GEMM launch failed

Posted: 2018-01-12 21:13:52

Question:

I installed tensorflow-gpu to run my TensorFlow code on my GPU, but I can't get it to run: it keeps failing with the error in the title. Here is my sample code, followed by the error stack trace:

import tensorflow as tf
import numpy as np

def check(W,X):
    return tf.matmul(W,X)


def main():
    W = tf.Variable(tf.truncated_normal([2,3], stddev=0.01))
    X = tf.placeholder(tf.float32, [3,2])
    check_handle = check(W,X)
    with tf.Session() as sess:
        tf.initialize_all_variables().run()
        # feed_dict must be a dict mapping the placeholder to its value
        num = sess.run(check_handle,
                       feed_dict={X: np.reshape(np.arange(6), (3, 2))})
        print(num)
if __name__ == '__main__':
    main()

My GPU is a fairly good GeForce GTX 1080 Ti with 11 GB of VRAM, and nothing else significant is running on it (just Chrome), as you can see from nvidia-smi:

Fri Aug  4 16:34:49 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.22                 Driver Version: 381.22                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 0000:07:00.0      On |                  N/A |
| 30%   55C    P0    79W / 250W |    711MiB / 11169MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      7650    G   /usr/lib/xorg/Xorg                             380MiB |
|    0      8233    G   compiz                                         192MiB |
|    0     24226    G   ...el-token=963C169BB38ADFD67B444D57A299CE0A   136MiB |
+-----------------------------------------------------------------------------+

Here is the error stack trace:

2017-08-04 15:44:21.585091: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585110: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585114: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585118: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585122: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.853700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:07:00.0
Total memory: 10.91GiB
Free memory: 9.89GiB
2017-08-04 15:44:21.853724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-08-04 15:44:21.853728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-08-04 15:44:21.853734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:07:00.0)
2017-08-04 15:44:24.948616: E tensorflow/stream_executor/cuda/cuda_blas.cc:365] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-08-04 15:44:24.948640: W tensorflow/stream_executor/stream.cc:1601] attempting to perform BLAS operation using StreamExecutor without BLAS support
2017-08-04 15:44:24.948805: W tensorflow/core/framework/op_kernel.cc:1158] Internal: Blas GEMM launch failed : a.shape=(1, 5), b.shape=(5, 10), m=1, n=10, k=5
     [[Node: layer1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_11, layer1/weights/read)]]
Traceback (most recent call last):
  File "test.py", line 51, in <module>
    _, loss_out, res_out = sess.run([train_op, loss, res], feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(1, 5), b.shape=(5, 10), m=1, n=10, k=5
     [[Node: layer1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_11, layer1/weights/read)]]
     [[Node: layer2/MatMul/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_158_layer2/MatMul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op u'layer1/MatMul', defined at:
  File "test.py", line 18, in <module>
    pre_activation = tf.matmul(input_ph, weights)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1816, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1217, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(1, 5), b.shape=(5, 10), m=1, n=10, k=5
     [[Node: layer1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_11, layer1/weights/read)]]
     [[Node: layer2/MatMul/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_158_layer2/MatMul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Also, the CPU-only tensorflow I had installed before ran fine. Any help is appreciated. Thanks!

Note - I have installed cuda-8.0 with cudnn-5.1, and I have added their paths to my bashrc profile.
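As a quick sanity check, TensorFlow itself can report whether it was built against CUDA and which devices it can see (a minimal sketch using only the standard tf.test and device_lib helpers):

import tensorflow as tf
from tensorflow.python.client import device_lib

# True only if this TensorFlow build was compiled with CUDA support
print(tf.test.is_built_with_cuda())

# Lists every device TensorFlow can use; a healthy setup includes a '/gpu:0' entry
print(device_lib.list_local_devices())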

Comments:

Upgrade your tensorflow version; restarting the computer can also help.
I did both. It didn't help.
From your code it looks like you are not using the latest version of tensorflow >= 1.2.
Oh, you mean because of initialize_all_variables? That also works with the current version, it just emits a warning. I do have the latest version installed.
Have you tried all the suggestions in the other similar questions: ***.com/questions/37337728/… and ***.com/questions/43990046/…?

Answer 1:

I had a very similar problem. For me it coincided with an nvidia driver update, so I assumed it was a driver issue, but changing the driver had no effect. What finally worked for me was clearing the nvidia cache:

sudo rm -rf ~/.nv/

I found this suggestion in the NVIDIA developer forums: https://devtalk.nvidia.com/default/topic/1007071/cuda-setup-and-installation/cuda-error-when-running-matrixmulcublas-sample-ubuntu-16-04/post/5169223/

My suspicion is that during the driver update some files compiled against the old version were left behind that were incompatible, or even got corrupted in the process. Speculation aside, this fixed the problem for me.

Comments:

Your answer saved me a lot of time. Thanks for sharing...
Thanks for the answer. For Windows users: I simply load a model onto my GPU, model = tf.keras.applications.ResNet50(); model.predict(np.zeros((batch_size, 224, 224, 3))), to get rid of the old cache.
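Spelled out, that warm-up trick looks roughly like this (a sketch assuming a TensorFlow build that bundles tf.keras; batch_size here is just a placeholder value):

import numpy as np
import tensorflow as tf

batch_size = 1  # any small value is enough for a warm-up pass

# Build a stock Keras model (the pretrained weights are downloaded on first use)
model = tf.keras.applications.ResNet50()

# One forward pass on dummy data forces the cuBLAS/cuDNN kernels to initialize,
# which the commenter reports is enough to flush the stale cache on Windows
model.predict(np.zeros((batch_size, 224, 224, 3)))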

Answer 2:

In my case the cause of this error was that my cuda installation, including all of its subdirectories and files, required root permissions, so TensorFlow also needed root permissions to use cuda. Uninstalling tensorflow and reinstalling it as root fixed the problem for me.

Comments:

What do you do without root permissions?
Errrrmmmm, why not just set up CUDA so that any user can read the include & lib files and execute the executables? I would expect that to be the default after installation! (Just checked, and indeed mine is all 755 or 644.)

Answer 3:

Installing the correct NVIDIA driver and CUDA version for my NVIDIA graphics card (an NVIDIA RTX 2070, in my case) is what worked for me.
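After installing a matching driver/CUDA pair, a small matrix multiplication pinned to the GPU is a quick way to confirm the Blas GEMM path works again (a minimal sketch in the question's TF 1.x style):

import tensorflow as tf

# Pin the op to the GPU so a driver/CUDA mismatch fails loudly instead of
# silently falling back to the CPU
with tf.device('/gpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))  # prints the 2x2 product if the GEMM kernel launches correctly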

