Tensorflow 耗尽 GPU 内存:分配器 (GPU_0_bfc) 尝试分配内存不足
Posted
技术标签:
【中文标题】Tensorflow 耗尽 GPU 内存:分配器 (GPU_0_bfc) 尝试分配内存不足【英文标题】:Tensorflow running out of GPU memory: Allocator (GPU_0_bfc) ran out of memory trying to allocate 【发布时间】:2021-11-01 00:13:08 【问题描述】:我是 Tensorflow 的新手,我在使用 Dataset 时遇到了问题。我在 Windows 10 上工作,使用 CUDA 的 Tensorflow 版本是 2.6.0。 我有 2 个 numpy 数组,它们是 X_train 和 X_test(已经拆分)。火车是5Gb,测试是1.5Gb。 形状是:
X_train: (259018, 30, 30, 3),
Y_train: (259018, 1),
我使用以下代码创建数据集:
dataset_train = tf.data.Dataset.from_tensor_slices((X_train , Y_train)).batch(BATCH_SIZE)
而 BATCH_SIZE = 32。
但我无法创建数据集,我收到以下错误:
2021-09-02 15:26:35.429930: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-02 15:26:35.772235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3495 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
2021-09-02 15:26:36.414627: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2700000000 exceeds 10% of free system memory.
2021-09-02 15:26:47.146977: W tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 607.1KiB (rounded to 621824)requested by op _EagerConst
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2021-09-02 15:26:47.147299: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] BFCAllocator dump for GPU_0_bfc
2021-09-02 15:26:47.147383: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (256): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.147514: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.147636: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1024): Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-09-02 15:26:47.147761: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.147905: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148040: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148157: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148276: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148402: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148518: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148645: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148786: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148918: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1048576): Total Chunks: 1, Chunks in use: 1. 1.91MiB allocated for chunks. 1.91MiB in use in bin. 1.91MiB client-requested in use in bin.
2021-09-02 15:26:47.149079: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.149212: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.149342: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.149477: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.164471: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.164619: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.164765: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.164884: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (268435456): Total Chunks: 2, Chunks in use: 2. 3.41GiB allocated for chunks. 3.41GiB in use in bin. 3.30GiB client-requested in use in bin.
2021-09-02 15:26:47.164982: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Bin for 607.2KiB was 512.0KiB, Chunk State:
2021-09-02 15:26:47.165040: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Next region of size 3665166336
2021-09-02 15:26:47.165106: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0e200000 of size 2700000000 next 1
2021-09-02 15:26:47.165159: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at baf0ebb00 of size 1280 next 2
2021-09-02 15:26:47.165208: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at baf0ec000 of size 2000128 next 3
2021-09-02 15:26:47.165250: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at baf2d4500 of size 963164928 next 18446744073709551615
2021-09-02 15:26:47.165297: I tensorflow/core/common_runtime/bfc_allocator.cc:1065] Summary of in-use Chunks by size:
2021-09-02 15:26:47.165341: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 1280 totalling 1.2KiB
2021-09-02 15:26:47.165382: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 2000128 totalling 1.91MiB
2021-09-02 15:26:47.165426: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 963164928 totalling 918.54MiB
2021-09-02 15:26:47.165470: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 2700000000 totalling 2.51GiB
2021-09-02 15:26:47.165514: I tensorflow/core/common_runtime/bfc_allocator.cc:1072] Sum Total of in-use chunks: 3.41GiB
2021-09-02 15:26:47.165558: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] total_region_allocated_bytes_: 3665166336 memory_limit_: 3665166336 available bytes: 0 curr_region_allocation_bytes_: 7330332672
2021-09-02 15:26:47.165633: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] Stats:
Limit: 3665166336
InUse: 3665166336
MaxInUse: 3665166336
NumAllocs: 4
MaxAllocSize: 2700000000
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2021-09-02 15:26:47.165771: W tensorflow/core/common_runtime/bfc_allocator.cc:468] *************************************************************************************************xxx
Traceback (most recent call last):
File "C:/Users/headl/Documents/github projects/datascience/DL_model_deep_insight.py", line 100, in <module>
dataset_train, dataset_test = prepare_tf_dataset(path_to_x_train, config.y_train_combined,
File "C:/Users/headl/Documents/github projects/datascience/DL_model_deep_insight.py", line 28, in prepare_tf_dataset
dataset_test = tf.data.Dataset.from_tensor_slices((X_test , Y_test)).batch(BATCH_SIZE)
File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 685, in from_tensor_slices
return TensorSliceDataset(tensors)
File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 3844, in __init__
element = structure.normalize_element(element)
File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\data\util\structure.py", line 129, in normalize_element
ops.convert_to_tensor(t, name="component_%d" % i, dtype=dtype))
File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\profiler\trace.py", line 163, in wrapped
return func(*args, **kwargs)
File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\framework\ops.py", line 1566, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\framework\tensor_conversion_registry.py", line 52, in _default_conversion_function
return constant_op.constant(value, dtype, name=name)
File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\framework\constant_op.py", line 271, in constant
return _constant_impl(value, dtype, shape, name, verify_shape=False,
File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\framework\constant_op.py", line 283, in _constant_impl
return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\framework\constant_op.py", line 308, in _constant_eager_impl
t = convert_to_eager_tensor(value, ctx, dtype)
File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\framework\constant_op.py", line 106, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.
Process finished with exit code 1
似乎存在 GPU 内存不足的问题,事实上,当我在 Windows 任务管理器中执行此过程时,我可以在脚本终止之前看到 GPU 使用率达到峰值。 我试图只使用 X_train 的一部分。我可以创建一个高达 X_train[:240000] 的数据集。当我在那之后添加更多行时,会出现错误。 我认为 Tensorflow 数据集是一个应该处理内存问题的生成器,以及批处理?此外,减少批量大小也没有任何效果。 我也尝试过建议的“TF_GPU_ALLOCATOR=cuda_malloc_async”,但也没有用。
如何加载整个数据?
非常感谢您!
【问题讨论】:
【参考方案1】:这是按设计工作的。 from_tensor_slices 实际上只对少量数据有用。数据集专为需要从磁盘流式传输的大型数据集而设计。
最困难但理想的方法是将您的 numpy 数组数据写入 TFRecords,然后通过 TFRecordDataset 将它们作为数据集读取。这是指南。
https://www.tensorflow.org/tutorials/load_data/tfrecord
更简单但性能较差的方法是 Dataset.from_generator。这是一个最小的例子:
>>> ds = tf.data.Dataset.from_generator(lambda: np.arange(100), output_signature=tf.TensorSpec(shape=(), dtype=tf.int32))
>>> for d in ds:
... print(d)
...
tf.Tensor(0, shape=(), dtype=int32)
tf.Tensor(1, shape=(), dtype=int32)
...
【讨论】:
谢谢姚翔,对我来说最好的选择似乎是创建一个 TFRecordDataset! 那是理想的,而且不难……肯定不到一天。只是需要学习一些新的 api,但你会在你的 TF 职业生涯的其余部分使用这些学习,所以这是值得的。以上是关于Tensorflow 耗尽 GPU 内存:分配器 (GPU_0_bfc) 尝试分配内存不足的主要内容,如果未能解决你的问题,请参考以下文章
Tensorflow:将 allow_growth 设置为 true 仍然会分配我所有 GPU 的内存