Keras & Tensorflow GPU Out of Memory on Large Image Data
Posted: 2018-10-29 23:28:14
Problem description: I am building an image classification system with Keras, the Tensorflow GPU backend, and CUDA 9.1, running on Ubuntu 18.04.
I am working with a very large image dataset containing 1.2 million images across 15k classes, 335 GB in size.
I can train my network on 90,000 images without any problems. However, when I scale up and use the entire dataset of 1.2 million images, I get the error shown below, which I believe is related to running out of memory.
I am using a GeForce GTX 1080 with 11GB of memory, and I have 128GB of RAM, a 300GB swap file, and a 16-core AMD Threadripper 1950X.
I have followed the advice given for similar problems. I am now using a smaller batch size of 10 or even smaller, and a smaller dense inner layer of 256, but I still get the same error shown below before the first training epoch starts.
[UPDATE]: I have found that the memory error happens during the VGG16 predict_generator call, even before my own network is built or trained. See the code below.
First, the warnings and errors:
2018-05-19 20:24:01.255788: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 5635855360 bytes on host: CUresult(304)
2018-05-19 20:24:01.255850: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 5635855360
Then the exception:
2018-05-19 13:56:40.472404: I tensorflow/core/common_runtime/bfc_allocator.cc:680] Stats:
Limit: 68719476736
InUse: 15548829696
MaxInUse: 15548829696
NumAllocs: 15542
MaxAllocSize: 16777216
2018-05-19 13:56:40.472563: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ****************************************************************************************************
Traceback (most recent call last):
  File "/home/welshamy/tools/anaconda/3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/welshamy/tools/anaconda/3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/welshamy/tools/anaconda/3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
  [[Node: block5_pool/MaxPool/_159 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_133_block5_pool/MaxPool", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bottleneck.py", line 37, in <module>
    bottleneck_features_train = model_vgg.predict_generator(train_generator_bottleneck)
  File "/home/welshamy/tools/anaconda/3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/welshamy/tools/anaconda/3/lib/python3.6/site-packages/keras/engine/training.py", line 2510, in predict_generator
    outs = self.predict_on_batch(x)
  File "/home/welshamy/tools/anaconda/3/lib/python3.6/site-packages/keras/engine/training.py", line 1945, in predict_on_batch
    outputs = self.predict_function(ins)
  File "/home/welshamy/tools/anaconda/3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2478, in __call__
    **self.session_kwargs)
  File "/home/welshamy/tools/anaconda/3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/welshamy/tools/anaconda/3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/welshamy/tools/anaconda/3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/welshamy/tools/anaconda/3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
  [[Node: block5_pool/MaxPool/_159 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_133_block5_pool/MaxPool", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Here is my code:
import numpy as np
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.layers import Dropout, Flatten, Dense
from keras.models import Sequential
from keras.preprocessing.image import ImageDataGenerator
from keras import applications
from keras.utils.np_utils import to_categorical
import matplotlib.pyplot as plt
# Dimensions of our images.
img_width, img_height = 224, 224
train_data_dir = './train_sample'
epochs = 100
batch_size = 10
# Data preprocessing
# Pixel values rescaling from [0, 255] to [0, 1] interval
datagen = ImageDataGenerator(rescale=1. / 255)
# Retrieve images and their classes for training set.
train_generator_bottleneck = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode=None,
    shuffle=False)
num_classes = len(train_generator_bottleneck.class_indices)
model_vgg = applications.VGG16(include_top=False, weights='imagenet')
bottleneck_features_train = model_vgg.predict_generator(train_generator_bottleneck)
np.save('../models/bottleneck_features_train.npy', bottleneck_features_train)
train_data = np.load('../models/bottleneck_features_train.npy')
train_labels = to_categorical(train_generator_bottleneck.classes, num_classes=num_classes)
model_top = Sequential()
model_top.add(Flatten(input_shape=train_data.shape[1:]))
model_top.add(Dense(256, activation='relu'))
model_top.add(Dropout(0.5))
model_top.add(Dense(num_classes, activation='softmax'))
model_top.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
# Model saving callback
checkpointer = ModelCheckpoint(filepath='../models/bottleneck_features.h5', monitor='val_acc', verbose=1,
                               save_best_only=True)
# Early stopping
early_stopping = EarlyStopping(monitor='val_acc', verbose=1, patience=5)
history = model_top.fit(
    train_data,
    train_labels,
    verbose=2,
    epochs=epochs,
    batch_size=batch_size,
    callbacks=[checkpointer, early_stopping],
    validation_split=0.3)
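For scale, the array that predict_generator must assemble here is enormous. A rough back-of-the-envelope check (assuming VGG16's block5_pool output of 7 x 7 x 512 for 224x224 inputs, the node named in the traceback above):

# Approximate host memory needed to hold all bottleneck features at once.
# Assumes VGG16 with include_top=False on 224x224 inputs, whose block5_pool
# output is 7 x 7 x 512 float32 values per image.
n_images = 1200000
features_per_image = 7 * 7 * 512               # 25088 floats
gigabytes = n_images * features_per_image * 4 / 1e9
print('~%.0f GB' % gigabytes)                  # ~120 GB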
Answer 1:
I don't think the problem here is batch_size, since you mention it is already quite low. Furthermore, since you say it works for 90k images, the issue is probably that train_data cannot fit in GPU memory, which is needed at the start of every fit epoch. To alleviate this, you need to fit model_top with a generator, just as you got the features from predict_generator. One way would be to wrap a generator class around train_data, but I would just connect the two models (note that I could not test this, but I think it is right):
from keras.models import Model

# Give VGG16 a fixed input shape so Flatten sees fully defined dimensions,
# and apply the new layers to model_vgg.output (a tensor), not the model itself.
model_vgg = applications.VGG16(include_top=False, weights='imagenet',
                               input_shape=(img_width, img_height, 3))
model_top = Flatten()(model_vgg.output)
model_top = Dense(256, activation='relu')(model_top)
model_top = Dropout(0.3)(model_top)
model_top = Dense(num_classes, activation='softmax')(model_top)

model = Model(inputs=model_vgg.input, outputs=model_top)

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
# Model saving callback
checkpointer = ModelCheckpoint(filepath='../models/bottleneck_features.h5', monitor='val_acc', verbose=1,
                               save_best_only=True)
# Early stopping
early_stopping = EarlyStopping(monitor='val_acc', verbose=1, patience=5)
# fit_generator expects a generator yielding (images, labels) batches, so
# build a labeled iterator (class_mode='sparse' gives the integer labels
# that sparse_categorical_crossentropy needs) instead of passing arrays.
train_generator = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='sparse',
    shuffle=True)

history = model.fit_generator(
    train_generator,
    verbose=2,
    steps_per_epoch=steps_per_epoch,
    callbacks=[checkpointer, early_stopping],
    ...)
I changed categorical_crossentropy to sparse_categorical_crossentropy so that only indices are sent as labels; otherwise it is the same. You need to provide steps_per_epoch as the length of the training data divided by the batch size, or just plug in any number for testing (see the sketch below). I also used the Keras functional API to make this work.
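A minimal sketch of that computation, assuming the train_generator defined in the snippet above (Keras 2 directory iterators expose a samples attribute):

# One full pass over the data: total samples divided by batch size,
# rounded up so the last partial batch is still used.
steps_per_epoch = int(np.ceil(train_generator.samples / batch_size))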
This will also allow the weights at the top of VGG to change, which should help you classify better. If for some reason that is not what you want, you can freeze the base by iterating over all the VGG layers and setting trainable to False, as sketched below.
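A minimal sketch of that freezing loop, assuming the model_vgg and model objects built in the snippet above:

# Freeze the convolutional base so only the new top layers are trained.
for layer in model_vgg.layers:
    layer.trainable = False

# Recompile so the changed trainable flags take effect.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])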
Let me know if it works.
Comments:
Thanks @modesitt. I ran your code with steps_per_epoch set as you suggested and got the same error. I also tried your code while limiting GPU memory usage by setting the Tensorflow configuration with config.gpu_options.allow_growth = True and config.gpu_options.per_process_gpu_memory_fraction = 0.7 and passing them to set_session(tf.Session(config=config)), with no luck. As a compromise, I may train on a sample of the data.
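For reference, a minimal sketch of that session configuration, using the TF1-era APIs the comment names:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

# Grow GPU memory on demand and cap this process at 70% of the card.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.7
set_session(tf.Session(config=config))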
hm @Wesam, that is strange. Let me try it.