tensorflow-gpu 耗时太长

Posted 2023-04-15

技术标签:

【中文标题】tensorflow-gpu 耗时太长【英文标题】：tensorflow-gpu taking too long 【发布时间】：2021-08-07 23:33:52 【问题描述】：

已解决

我最近购买了一台配备 Nvidia RTX 3080 的笔记本电脑，并安装了 tensorflow-gpu 所需的必要库。安装它们后，我正在运行以下代码进行完整性检查：

import tensorflow as tf
import time


print(f"TensorFlow version: tf.__version__")
# TensorFlow version: 2.3.0

start = time.time()
print(tf.reduce_sum(tf.random.normal([1000, 1000])))
end = time.time()

print(f"it took = end - start seconds")

"""
2021-05-18 22:43:03.963371: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-05-18 22:43:05.775204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 Laptop GPU computeCapability: 8.6
coreClock: 1.545GHz coreCount: 48 deviceMemorySize: 16.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-05-18 22:43:05.775328: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-05-18 22:43:05.780061: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-05-18 22:43:05.782762: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-05-18 22:43:05.783655: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-05-18 22:43:05.786527: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-05-18 22:43:05.788290: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-05-18 22:43:05.798942: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-05-18 22:43:05.799065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-05-18 22:43:05.799697: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-18 22:43:05.805786: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1ace28679f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-05-18 22:43:05.805863: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-05-18 22:43:05.806387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 Laptop GPU computeCapability: 8.6
coreClock: 1.545GHz coreCount: 48 deviceMemorySize: 16.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-05-18 22:43:05.806547: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-05-18 22:43:05.807051: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-05-18 22:43:05.807346: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-05-18 22:43:05.807641: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-05-18 22:43:05.807948: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-05-18 22:43:05.808240: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-05-18 22:43:05.808529: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-05-18 22:43:05.808841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-05-18 22:46:57.375562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-18 22:46:57.375695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-05-18 22:46:57.376038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-05-18 22:46:57.376271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14255 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
2021-05-18 22:46:57.378538: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1aca510dc20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-05-18 22:46:57.378605: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 3080 Laptop GPU, Compute Capability 8.6
tf.Tensor(-1331.8541, shape=(), dtype=float32)
it took = 233.85769605636597 seconds
"""

这条单线大约需要 4 分钟。这不行。某处有问题。有关已安装系统的更多信息：

sys_details = tf.sysconfig.get_build_info()

sys_details['cuda_version']
# '64_101'

sys_details['cuda_compute_capabilities']
'''
['compute_30',
 'compute_35',
 'compute_52',
 'compute_60',
 'compute_61',
 'compute_70',
 'compute_75']
'''

sys_details['cudnn_version']
# '64_7'

怎么了？

【问题讨论】：

JIT 编译。使用更新的 TF 和更新的 CUDA 版本，11.1 或更新。您的 GPU 是 compute_86，它甚至没有出现在您的列表中 --> 您使用的 TF 版本并不是真正设计用于您的 GPU。任何教程显示了这样做的正确方法？您应该使用最新版本的 TensorFlow (2.5)，因为众所周知，旧版本在最新的 RTX 卡中表现不佳。 【参考方案1】：

Nvidia RTX 3080 卡基于Ampere 架构，兼容的CUDA 版本以11.x 开头。

张量流从2.3 升级到2.4 或2.5 将解决上述问题。更多详情可以参考here。

【讨论】：

以上是关于tensorflow-gpu 耗时太长的主要内容，如果未能解决你的问题，请参考以下文章