Tensorflow 耗尽 GPU 内存:分配器 (GPU_0_bfc) 尝试分配内存不足

Posted

技术标签:

【中文标题】Tensorflow 耗尽 GPU 内存:分配器 (GPU_0_bfc) 尝试分配内存不足【英文标题】:Tensorflow running out of GPU memory: Allocator (GPU_0_bfc) ran out of memory trying to allocate 【发布时间】:2021-11-01 00:13:08 【问题描述】:

我是 Tensorflow 的新手,我在使用 Dataset 时遇到了问题。我在 Windows 10 上工作,使用 CUDA 的 Tensorflow 版本是 2.6.0。 我有 2 个 numpy 数组,它们是 X_train 和 X_test(已经拆分)。火车是5Gb,测试是1.5Gb。 形状是:

X_train: (259018, 30, 30, 3),

Y_train: (259018, 1),

我使用以下代码创建数据集:

dataset_train = tf.data.Dataset.from_tensor_slices((X_train , Y_train)).batch(BATCH_SIZE)

而 BATCH_SIZE = 32。

但我无法创建数据集,我收到以下错误:

2021-09-02 15:26:35.429930: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-02 15:26:35.772235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3495 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
2021-09-02 15:26:36.414627: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2700000000 exceeds 10% of free system memory.
2021-09-02 15:26:47.146977: W tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 607.1KiB (rounded to 621824)requested by op _EagerConst
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.
Current allocation summary follows.
2021-09-02 15:26:47.147299: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] BFCAllocator dump for GPU_0_bfc
2021-09-02 15:26:47.147383: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (256):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.147514: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (512):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.147636: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1024):     Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-09-02 15:26:47.147761: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2048):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.147905: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4096):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148040: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8192):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148157: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16384):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148276: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (32768):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148402: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (65536):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148518: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (131072):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148645: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (262144):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148786: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (524288):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.148918: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1048576):  Total Chunks: 1, Chunks in use: 1. 1.91MiB allocated for chunks. 1.91MiB in use in bin. 1.91MiB client-requested in use in bin.
2021-09-02 15:26:47.149079: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2097152):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.149212: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4194304):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.149342: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8388608):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.

2021-09-02 15:26:47.149477: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16777216):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.

2021-09-02 15:26:47.164471: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (33554432):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.164619: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (67108864):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.164765: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (134217728):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-02 15:26:47.164884: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (268435456):    Total Chunks: 2, Chunks in use: 2. 3.41GiB allocated for chunks. 3.41GiB in use in bin. 3.30GiB client-requested in use in bin.
2021-09-02 15:26:47.164982: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Bin for 607.2KiB was 512.0KiB, Chunk State: 
2021-09-02 15:26:47.165040: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Next region of size 3665166336
2021-09-02 15:26:47.165106: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0e200000 of size 2700000000 next 1
2021-09-02 15:26:47.165159: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at baf0ebb00 of size 1280 next 2
2021-09-02 15:26:47.165208: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at baf0ec000 of size 2000128 next 3
2021-09-02 15:26:47.165250: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at baf2d4500 of size 963164928 next 18446744073709551615
2021-09-02 15:26:47.165297: I tensorflow/core/common_runtime/bfc_allocator.cc:1065]      Summary of in-use Chunks by size: 
2021-09-02 15:26:47.165341: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 1280 totalling 1.2KiB
2021-09-02 15:26:47.165382: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 2000128 totalling 1.91MiB
2021-09-02 15:26:47.165426: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 963164928 totalling 918.54MiB
2021-09-02 15:26:47.165470: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 2700000000 totalling 2.51GiB
2021-09-02 15:26:47.165514: I tensorflow/core/common_runtime/bfc_allocator.cc:1072] Sum Total of in-use chunks: 3.41GiB
2021-09-02 15:26:47.165558: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] total_region_allocated_bytes_: 3665166336 memory_limit_: 3665166336 available bytes: 0 curr_region_allocation_bytes_: 7330332672
2021-09-02 15:26:47.165633: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] Stats: 
Limit:                      3665166336
InUse:                      3665166336
MaxInUse:                   3665166336
NumAllocs:                           4
MaxAllocSize:               2700000000
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2021-09-02 15:26:47.165771: W tensorflow/core/common_runtime/bfc_allocator.cc:468] *************************************************************************************************xxx
Traceback (most recent call last):
  File "C:/Users/headl/Documents/github projects/datascience/DL_model_deep_insight.py", line 100, in <module>
    dataset_train, dataset_test = prepare_tf_dataset(path_to_x_train, config.y_train_combined,
  File "C:/Users/headl/Documents/github projects/datascience/DL_model_deep_insight.py", line 28, in prepare_tf_dataset
    dataset_test = tf.data.Dataset.from_tensor_slices((X_test , Y_test)).batch(BATCH_SIZE)
  File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 685, in from_tensor_slices
    return TensorSliceDataset(tensors)
  File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 3844, in __init__
    element = structure.normalize_element(element)
  File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\data\util\structure.py", line 129, in normalize_element
    ops.convert_to_tensor(t, name="component_%d" % i, dtype=dtype))
  File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\profiler\trace.py", line 163, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\framework\ops.py", line 1566, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\framework\tensor_conversion_registry.py", line 52, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\framework\constant_op.py", line 271, in constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,
  File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\framework\constant_op.py", line 283, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\framework\constant_op.py", line 308, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "C:\Users\headl\Documents\virtual_env\datascience\lib\site-packages\tensorflow\python\framework\constant_op.py", line 106, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

Process finished with exit code 1

似乎存在 GPU 内存不足的问题,事实上,当我在 Windows 任务管理器中执行此过程时,我可以在脚本终止之前看到 GPU 使用率达到峰值。 我试图只使用 X_train 的一部分。我可以创建一个高达 X_train[:240000] 的数据集。当我在那之后添加更多行时,会出现错误。 我认为 Tensorflow 数据集是一个应该处理内存问题的生成器,以及批处理?此外,减少批量大小也没有任何效果。 我也尝试过建议的“TF_GPU_ALLOCATOR=cuda_malloc_async”,但也没有用。

如何加载整个数据?

非常感谢您!

【问题讨论】:

【参考方案1】:

这是按设计工作的。 from_tensor_slices 实际上只对少量数据有用。数据集专为需要从磁盘流式传输的大型数据集而设计。

最困难但理想的方法是将您的 numpy 数组数据写入 TFRecords,然后通过 TFRecordDataset 将它们作为数据集读取。这是指南。

https://www.tensorflow.org/tutorials/load_data/tfrecord

更简单但性能较差的方法是 Dataset.from_generator。这是一个最小的例子:


>>> ds = tf.data.Dataset.from_generator(lambda: np.arange(100), output_signature=tf.TensorSpec(shape=(), dtype=tf.int32))
>>> for d in ds:
...   print(d)
... 
tf.Tensor(0, shape=(), dtype=int32)
tf.Tensor(1, shape=(), dtype=int32)
...

【讨论】:

谢谢姚翔,对我来说最好的选择似乎是创建一个 TFRecordDataset! 那是理想的,而且不难……肯定不到一天。只是需要学习一些新的 api,但你会在你的 TF 职业生涯的其余部分使用这些学习,所以这是值得的。

以上是关于Tensorflow 耗尽 GPU 内存:分配器 (GPU_0_bfc) 尝试分配内存不足的主要内容,如果未能解决你的问题,请参考以下文章

Tensorflow:将 allow_growth 设置为 true 仍然会分配我所有 GPU 的内存

TensorFlow GPU内存

tensorflow-gpu 使用的常见错误

如何解决 TensorFlow 中的“gpu 内存不足”问题

模型执行后清除 TensorFlow GPU 内存

TensorFlow 内存不足