tensorflow-gpu taking too long
【Title】tensorflow-gpu taking too long 【Posted】2021-08-07 23:33:52 【Question】(resolved):
I recently bought a laptop with an Nvidia RTX 3080 and installed the libraries required for tensorflow-gpu. After installing them, I ran the following code as a sanity check:
import tensorflow as tf
import time
print(f"TensorFlow version: {tf.__version__}")
# TensorFlow version: 2.3.0
start = time.time()
print(tf.reduce_sum(tf.random.normal([1000, 1000])))
end = time.time()
print(f"it took = {end - start} seconds")
"""
2021-05-18 22:43:03.963371: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-05-18 22:43:05.775204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 Laptop GPU computeCapability: 8.6
coreClock: 1.545GHz coreCount: 48 deviceMemorySize: 16.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-05-18 22:43:05.775328: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-05-18 22:43:05.780061: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-05-18 22:43:05.782762: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-05-18 22:43:05.783655: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-05-18 22:43:05.786527: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-05-18 22:43:05.788290: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-05-18 22:43:05.798942: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-05-18 22:43:05.799065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-05-18 22:43:05.799697: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-18 22:43:05.805786: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1ace28679f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-05-18 22:43:05.805863: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-05-18 22:43:05.806387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 Laptop GPU computeCapability: 8.6
coreClock: 1.545GHz coreCount: 48 deviceMemorySize: 16.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-05-18 22:43:05.806547: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-05-18 22:43:05.807051: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-05-18 22:43:05.807346: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-05-18 22:43:05.807641: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-05-18 22:43:05.807948: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-05-18 22:43:05.808240: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-05-18 22:43:05.808529: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-05-18 22:43:05.808841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-05-18 22:46:57.375562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-18 22:46:57.375695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-05-18 22:46:57.376038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-05-18 22:46:57.376271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14255 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
2021-05-18 22:46:57.378538: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1aca510dc20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-05-18 22:46:57.378605: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 3080 Laptop GPU, Compute Capability 8.6
tf.Tensor(-1331.8541, shape=(), dtype=float32)
it took = 233.85769605636597 seconds
"""
This single line takes about 4 minutes. That can't be right; something is wrong somewhere. More information about the installed system:
sys_details = tf.sysconfig.get_build_info()
sys_details['cuda_version']
# '64_101'
sys_details['cuda_compute_capabilities']
'''
['compute_30',
'compute_35',
'compute_52',
'compute_60',
'compute_61',
'compute_70',
'compute_75']
'''
sys_details['cudnn_version']
# '64_7'
What is going wrong?
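One way to narrow this down is to time the same call several times within one process. The timestamps in the log above jump from 22:43:05 to 22:46:57 before "Created TensorFlow device" appears, which suggests the delay is a one-off startup cost rather than the cost of the operation itself. The helper below is a minimal sketch of that measurement; `time_calls` is a hypothetical name, and the workload is a stand-in so the snippet runs without a GPU.

```python
import time

def time_calls(fn, repeats=3):
    """Call fn `repeats` times and return each call's duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations

# Stand-in workload; on the machine in question this would instead be
# something like: lambda: tf.reduce_sum(tf.random.normal([1000, 1000])).numpy()
durations = time_calls(lambda: sum(range(1000)), repeats=3)
for i, seconds in enumerate(durations, start=1):
    print(f"call {i}: {seconds:.6f} seconds")
```

If only the first call is slow and the rest finish quickly, the time is going into GPU initialization, not into `reduce_sum`.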
【Comments】:
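JIT compilation. Use a newer TF with a newer CUDA version, 11.1 or later. Your GPU is compute_86, which doesn't even appear in your list --> the TF version you are using isn't really designed for your GPU.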
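The comparison this comment makes can be written as a small check. This is only a sketch: the helper names are hypothetical, and the values are copied from the question's output (`cuda_compute_capabilities` from `tf.sysconfig.get_build_info()` and "compute capability: 8.6" from the log) so that it runs without TensorFlow installed.

```python
def parse_compute(tag):
    """Turn a tag such as 'compute_75' into the integer 75."""
    return int(tag.split("_")[1])

def listed_in_build(device_cc, build_tags):
    """Return True if the (major, minor) device capability appears in the build list."""
    major, minor = device_cc
    return major * 10 + minor in {parse_compute(tag) for tag in build_tags}

# Values reported in the question for TensorFlow 2.3.0.
build_tags = ["compute_30", "compute_35", "compute_52", "compute_60",
              "compute_61", "compute_70", "compute_75"]

# The RTX 3080 Laptop GPU reports compute capability 8.6.
print(listed_in_build((8, 6), build_tags))  # False
print(listed_in_build((7, 5), build_tags))  # True
```

A `False` result here matches the comment's diagnosis: the build does not target the GPU's architecture, so kernels have to be JIT-compiled for it at startup.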
Is there a tutorial that shows the right way to do this?
You should use the latest version of TensorFlow (2.5), since older versions are known to perform poorly on the latest RTX cards.
【Answer 1】:
The Nvidia RTX 3080 card is based on the Ampere architecture, and compatible CUDA versions start at 11.x.
Upgrading TensorFlow from 2.3 to 2.4 or 2.5 will resolve the issue above. For more details, you can refer here.
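After upgrading, a quick guard like the one below can confirm the installed version meets the 2.4 minimum this answer recommends. It is a sketch with hypothetical helper names; the version strings are passed in as plain text so it runs without TensorFlow, and in practice `tf.__version__` would be the input.

```python
def major_minor(version):
    """Return (major, minor) as integers from a string such as '2.3.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def meets_minimum(version, minimum=(2, 4)):
    """Return True if `version` is at least the given (major, minor) minimum."""
    return major_minor(version) >= minimum

print(meets_minimum("2.3.0"))  # False
print(meets_minimum("2.5.0"))  # True
```

Comparing integer tuples instead of raw strings avoids mis-ordering versions such as 2.10 and 2.4.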