Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. when using Google Colab

【Posted】: 2022-01-02 14:22:38
【Question】:

I tried to follow https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#training-the-model in Google Colab.

Everything went smoothly: I built pycocotools, ran the setup with object_detection/packages/tf2/setup.py, ran the tests with object_detection/builders/model_builder_tf2_test.py, and created the TFRecord files, all without any problems.

But when training starts, it always fails with:

2021-11-24 04:51:47.954507: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-11-24 04:51:47.958479: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.

The full error is:

2021-11-24 04:51:47.954507: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-11-24 04:51:47.958479: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Traceback (most recent call last):
  File "model_main_tf2.py", line 115, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "model_main_tf2.py", line 112, in main
    record_summaries=FLAGS.record_summaries)
  File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 603, in train_loop
    train_input, unpad_groundtruth_tensors)
  File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 394, in load_fine_tune_checkpoint
    _ensure_model_is_built(model, input_dataset, unpad_groundtruth_tensors)
  File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 176, in _ensure_model_is_built
    labels,
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 1286, in run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 2849, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/mirrored_strategy.py", line 671, in _call_for_each_replica
    self._container_strategy(), fn, args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/mirrored_run.py", line 86, in call_for_each_replica
    return wrapped(args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 3040, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 1964, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node model/conv1_conv/Conv2D (defined at /local/lib/python3.7/dist-packages/object_detection/meta_architectures/faster_rcnn_meta_arch.py:1346) ]]
     [[Loss/RPNLoss/BalancedPositiveNegativeSampler_1/Cast_8/_588]]
  (1) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node model/conv1_conv/Conv2D (defined at /local/lib/python3.7/dist-packages/object_detection/meta_architectures/faster_rcnn_meta_arch.py:1346) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference__dummy_computation_fn_44910]
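
The rule quoted in the log message (matching major version, equal or higher minor version) can be sketched in a few lines of Python. This is only an illustration of the check as the message words it, not TensorFlow's actual implementation; the function names `parse_version` and `cudnn_runtime_is_compatible` are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as "8.0.5" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def cudnn_runtime_is_compatible(runtime, compiled):
    """Rule from the log: same major version, runtime minor >= compiled minor."""
    rt_major, rt_minor = parse_version(runtime)[:2]
    cc_major, cc_minor = parse_version(compiled)[:2]
    return rt_major == cc_major and rt_minor >= cc_minor


# The combination reported in the log above: runtime 8.0.5, compiled with 8.1.0.
print(cudnn_runtime_is_compatible("8.0.5", "8.1.0"))  # False
# A runtime at 8.1.0 would satisfy the rule.
print(cudnn_runtime_is_compatible("8.1.0", "8.1.0"))  # True
```

Under this rule the 8.0.5 runtime is rejected because its minor version (0) is lower than the minor version TensorFlow was compiled against (1).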

I have tried lower TensorFlow versions, such as 2.4.0, but the problem persists.

【Answer 1】:

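I ran into the same problem. Check the version table here: https://www.tensorflow.org/install/source#gpu. The TensorFlow Object Detection API uses TensorFlow 2.6.0, so you need cuDNN 8.1, but the Colab runtime ships 8.0.5. I solved it as follows: register at https://developer.nvidia.com/cudnn and download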

cudnn-11.2-linux-x64-v8.1.0.77.tgz

I then uploaded it to my Google Drive and ran Colab with the drive mounted. In the Colab notebook that uses object_detection, I put this in the first cell:

!tar -zvxf /content/drive/MyDrive/task/cudnn-11.2-linux-x64-v8.1.0.77.tgz

Then:

%%bash
cd cuda/include
sudo cp *.h /usr/local/cuda/include/
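
The cell above copies only the header files. The version in the error message is reported by the cuDNN shared library loaded at runtime, so the `libcudnn*` files from the archive probably need to be copied as well. A minimal Python sketch follows, assuming the archive also extracts a `cuda/lib64` directory and that `/usr/local/cuda/lib64` is where the runtime looks for the library; neither path is confirmed by the answer, and `copy_matching` is a hypothetical helper.

```python
import glob
import os
import shutil


def copy_matching(src_dir, dst_dir, pattern):
    """Copy every regular file in src_dir that matches pattern into dst_dir.

    Returns the sorted base names of the files that were copied.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
        if os.path.isfile(path):
            # shutil.copy follows symlinks, like a plain `cp` without -P.
            shutil.copy(path, dst_dir)
            copied.append(os.path.basename(path))
    return sorted(copied)


# In Colab (assumed paths; adjust to where the archive was extracted):
# copy_matching("cuda/include", "/usr/local/cuda/include", "*.h")
# copy_matching("cuda/lib64", "/usr/local/cuda/lib64", "libcudnn*")
```

Restarting the runtime after copying is likely necessary, since a process that has already loaded the older library keeps using it.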

This solved my problem.
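
To confirm which cuDNN version the installed headers declare, the version macros can be read back from the header file. The sketch below assumes the cuDNN 8 layout, in which `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` are defined in `cudnn_version.h`; the function name `parse_cudnn_header_version` and the sample text are illustrative only.

```python
import re


def parse_cudnn_header_version(header_text):
    """Extract (major, minor, patch) from the text of a cuDNN version header."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#\s*define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            raise ValueError(f"{name} not found in header text")
        values.append(int(match.group(1)))
    return tuple(values)


# Illustrative header content, not copied from a real installation.
sample = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 0
"""
print(parse_cudnn_header_version(sample))  # (8, 1, 0)
```

In Colab the input would be the contents of `/usr/local/cuda/include/cudnn_version.h` (an assumed path, based on the copy destination used above). Note that this reports the version of the headers only; the library actually loaded at runtime is what the TensorFlow log message reports.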
