GPU out of memory when initializing model

【Title】: GPU out of memory when initializing model 【Posted】: 2020-11-12 21:57:35 【Question】:

I am trying to build a Siamese neural network with a triplet loss function using TensorFlow. This is what it looks like:

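from tensorflow.keras import backend as K
from tensorflow.keras.layers import Conv2D, Dense, Flatten, Lambda, MaxPooling2D
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2

def build_network(input_shape, embeddingsize):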
    network = Sequential()
    network.add(Conv2D(128, (7,7), activation='relu',
                     input_shape=input_shape,
                     kernel_initializer='he_uniform',
                     kernel_regularizer=l2(2e-4)))
    network.add(MaxPooling2D())
    network.add(Conv2D(128, (3,3), activation='relu', kernel_initializer='he_uniform',
                     kernel_regularizer=l2(2e-4)))
    network.add(MaxPooling2D())
    network.add(Conv2D(256, (3,3), activation='relu', kernel_initializer='he_uniform',
                     kernel_regularizer=l2(2e-4)))
    network.add(Flatten())
    network.add(Dense(4096, activation='relu',
                   kernel_regularizer=l2(1e-3),
                   kernel_initializer='he_uniform'))
    
    
    network.add(Dense(embeddingsize, activation=None,
                   kernel_regularizer=l2(1e-3),
                   kernel_initializer='he_uniform'))
    
    # Force the encoding to live on the d-dimensional unit hypersphere
    network.add(Lambda(lambda x: K.l2_normalize(x,axis=-1)))
    
    return network

When I try to initialize the model with this code:

emb_dim = 64
embedding_model = build_network(X_train[1].shape, emb_dim)

embedding_model.summary()

this error is shown:

ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-22-9a90ee998c2d> in <module>
      1 emb_dim = 64
      2 
----> 3 embedding_model = build_network(X_train[1].shape, emb_dim)
      4 
      5 # embedding_model = Sequential([

<ipython-input-19-f51afd4ad3e5> in build_network(input_shape, embeddingsize)
     21     network.add(Dense(4096, activation='relu',
     22                    kernel_regularizer=l2(1e-3),
---> 23                    kernel_initializer='he_uniform'))
     24 
     25 

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\training\tracking\base.py in _method_wrapper(self, *args, **kwargs)
    455     self._self_setattr_tracking = False  # pylint: disable=protected-access
    456     try:
--> 457       result = method(self, *args, **kwargs)
    458     finally:
    459       self._self_setattr_tracking = previous_value  # pylint: disable=protected-access

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\keras\engine\sequential.py in add(self, layer)
    201       # If the model is being built continuously on top of an input layer:
    202       # refresh its output.
--> 203       output_tensor = layer(self.outputs[0])
    204       if len(nest.flatten(output_tensor)) != 1:
    205         raise TypeError('All layers in a Sequential model '

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py in __call__(self, inputs, *args, **kwargs)
    746           # Build layer if applicable (if the `build` method has been
    747           # overridden).
--> 748           self._maybe_build(inputs)
    749           cast_inputs = self._maybe_cast_inputs(inputs)
    750 

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py in _maybe_build(self, inputs)
   2114         # operations.
   2115         with tf_utils.maybe_init_scope(self):
-> 2116           self.build(input_shapes)
   2117       # We must set self.built since user defined build functions are not
   2118       # constrained to set self.built.

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\keras\layers\core.py in build(self, input_shape)
   1111         constraint=self.kernel_constraint,
   1112         dtype=self.dtype,
-> 1113         trainable=True)
   1114     if self.use_bias:
   1115       self.bias = self.add_weight(

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py in add_weight(self, name, shape, dtype, initializer, regularizer, trainable, constraint, partitioner, use_resource, synchronization, aggregation, **kwargs)
    444         synchronization=synchronization,
    445         aggregation=aggregation,
--> 446         caching_device=caching_device)
    447     backend.track_variable(variable)
    448 

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\training\tracking\base.py in _add_variable_with_custom_getter(self, name, shape, dtype, initializer, getter, overwrite, **kwargs_for_getter)
    742         dtype=dtype,
    743         initializer=initializer,
--> 744         **kwargs_for_getter)
    745 
    746     # If we set an initializer and the variable processed it, tracking will not

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer_utils.py in make_variable(name, shape, dtype, initializer, trainable, caching_device, validate_shape, constraint, use_resource, collections, synchronization, aggregation, partitioner)
    140       synchronization=synchronization,
    141       aggregation=aggregation,
--> 142       shape=variable_shape if variable_shape else None)
    143 
    144 

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\ops\variables.py in __call__(cls, *args, **kwargs)
    256   def __call__(cls, *args, **kwargs):
    257     if cls is VariableV1:
--> 258       return cls._variable_v1_call(*args, **kwargs)
    259     elif cls is Variable:
    260       return cls._variable_v2_call(*args, **kwargs)

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\ops\variables.py in _variable_v1_call(cls, initial_value, trainable, collections, validate_shape, caching_device, name, variable_def, dtype, expected_shape, import_scope, constraint, use_resource, synchronization, aggregation, shape)
    217         synchronization=synchronization,
    218         aggregation=aggregation,
--> 219         shape=shape)
    220 
    221   def _variable_v2_call(cls,

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\ops\variables.py in <lambda>(**kwargs)
    195                         shape=None):
    196     """Call on Variable class. Useful to force the signature."""
--> 197     previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
    198     for _, getter in ops.get_default_graph()._variable_creator_stack:  # pylint: disable=protected-access
    199       previous_getter = _make_getter(getter, previous_getter)

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\ops\variable_scope.py in default_variable_creator(next_creator, **kwargs)
   2594         synchronization=synchronization,
   2595         aggregation=aggregation,
-> 2596         shape=shape)
   2597   else:
   2598     return variables.RefVariable(

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\ops\variables.py in __call__(cls, *args, **kwargs)
    260       return cls._variable_v2_call(*args, **kwargs)
    261     else:
--> 262       return super(VariableMetaclass, cls).__call__(*args, **kwargs)
    263 
    264 

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py in __init__(self, initial_value, trainable, collections, validate_shape, caching_device, name, dtype, variable_def, import_scope, constraint, distribute_strategy, synchronization, aggregation, shape)
   1409           aggregation=aggregation,
   1410           shape=shape,
-> 1411           distribute_strategy=distribute_strategy)
   1412 
   1413   def _init_from_args(self,

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py in _init_from_args(self, initial_value, trainable, collections, caching_device, name, dtype, constraint, synchronization, aggregation, distribute_strategy, shape)
   1540           with ops.name_scope("Initializer"), device_context_manager(None):
   1541             initial_value = ops.convert_to_tensor(
-> 1542                 initial_value() if init_from_fn else initial_value,
   1543                 name="initial_value", dtype=dtype)
   1544           if shape is not None:

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer_utils.py in <lambda>()
    120           (type(init_ops.Initializer), type(init_ops_v2.Initializer))):
    121         initializer = initializer()
--> 122       init_val = lambda: initializer(shape, dtype=dtype)
    123       variable_dtype = dtype.base_dtype
    124   if use_resource is None:

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\ops\init_ops_v2.py in __call__(self, shape, dtype)
    423     else:
    424       limit = math.sqrt(3.0 * scale)
--> 425       return self._random_generator.random_uniform(shape, -limit, limit, dtype)
    426 
    427   def get_config(self):

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\ops\init_ops_v2.py in random_uniform(self, shape, minval, maxval, dtype)
    786       op = random_ops.random_uniform
    787     return op(
--> 788         shape=shape, minval=minval, maxval=maxval, dtype=dtype, seed=self.seed)
    789 
    790   def truncated_normal(self, shape, mean, stddev, dtype):

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\ops\random_ops.py in random_uniform(shape, minval, maxval, dtype, seed, name)
    271     else:
    272       rnd = gen_random_ops.random_uniform(shape, dtype, seed=seed1, seed2=seed2)
--> 273       result = math_ops.add(rnd * (maxval - minval), minval, name=name)
    274     # TODO(b/132092188): C++ shape inference inside functional ops does not
    275     # cross FuncGraph boundaries since that information is only available in

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py in add(x, y, name)
    341         raise
    342     except _core._NotOkStatusException as e:
--> 343       _ops.raise_from_not_ok_status(e, name)
    344   # Add nodes to the TensorFlow graph.
    345   try:

~\.conda\envs\py36\lib\site-packages\tensorflow_core\python\framework\ops.py in raise_from_not_ok_status(e, name)
   6604   message = e.message + (" name: " + name if name is not None else "")
   6605   # pylint: disable=protected-access
-> 6606   six.raise_from(core._status_to_exception(e.code, message), None)
   6607   # pylint: enable=protected-access
   6608 

~\.conda\envs\py36\lib\site-packages\six.py in raise_from(value, from_value)

ResourceExhaustedError: OOM when allocating tensor with shape[278784,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Add] name: dense/kernel/Initializer/random_uniform/

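I am using a Microsoft Azure virtual machine with an NVIDIA K80 GPU. There is one core available, with 12 GB of memory. I checked nvidia-smi, and it seems the model takes up all of the memory: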

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 426.00       Driver Version: 426.00       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           TCC  | 00000001:00:00.0 Off |                    0 |
| N/A   54C    P0    55W / 149W |  10889MiB / 11448MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      6620      C   ...cbbivmadmin\.conda\envs\py36\python.exe 10766MiB |
+-----------------------------------------------------------------------------+

When I load the same model on another, CPU-only machine, it works:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 144, 144, 128)     18944     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 72, 72, 128)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 70, 70, 128)       147584    
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 35, 35, 128)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 33, 33, 256)       295168    
_________________________________________________________________
flatten (Flatten)            (None, 278784)            0         
_________________________________________________________________
dense (Dense)                (None, 4096)              1141903360
_________________________________________________________________
dense_1 (Dense)              (None, 64)                262208    
_________________________________________________________________
lambda (Lambda)              (None, 64)                0         
=================================================================
Total params: 1,142,627,264
Trainable params: 1,142,627,264
Non-trainable params: 0

Also, I am not sure why it loads into GPU memory by default and takes up all of it.


【Answer 1】:

It looks like you specified a wrong dimension somewhere:

OOM when allocating tensor with shape[278784,4096] and type float
                                      ^^^^^^

Make sure you use the correct dimensions when defining the layers in your model.
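To see why a tensor of that shape is a problem, its size can be estimated with a few lines of plain Python. This is only a rough sketch: the shape comes from the error message and the GPU total from the nvidia-smi output in the question, while the idea that up to three kernel-sized buffers exist at once is an assumption based on the `rnd * (maxval - minval) + minval` line in the traceback, not a measurement.

```python
# Shape reported in the OOM message: the kernel of the Dense(4096) layer.
rows, cols = 278784, 4096
bytes_per_float32 = 4

kernel_bytes = rows * cols * bytes_per_float32
kernel_gib = kernel_bytes / 2**30

# Assumption: while the initializer evaluates rnd * (maxval - minval) + minval,
# up to three kernel-sized tensors may exist at once (rnd, the product, the sum).
peak_gib = 3 * kernel_gib

# Total GPU memory reported by nvidia-smi in the question (11448 MiB).
gpu_total_gib = 11448 / 1024

print(f"one kernel copy : {kernel_gib:.2f} GiB")
print(f"possible peak   : {peak_gib:.2f} GiB")
print(f"GPU total       : {gpu_total_gib:.2f} GiB")
```

Under these assumptions a single copy of the kernel fits on the card, but the temporaries created during initialization may not, which would be consistent with the failure being reported at `Op:Add`.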


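Update:

"I checked nvidia-smi, and it seems the model takes up all of the memory"

Unless told otherwise, TensorFlow preallocates almost all of the GPU memory and runs its own allocation strategy inside it, so from nvidia-smi the GPU's memory always appears to be almost fully used.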
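One way to "tell it otherwise" is the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable described in the TensorFlow 2.x GPU guide, which asks TensorFlow to grow its allocation on demand instead of reserving the card up front. The sketch below assumes the variable is honored by the installed TensorFlow version and that it is set before TensorFlow is first imported. TensorFlow 2.x also exposes the same switch in code through `tf.config.experimental.set_memory_growth`.

```python
import os

# Assumption: this runs before the first `import tensorflow` in the process,
# because TensorFlow reads the variable when it initializes the GPU.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

This only changes how memory is reserved, so nvidia-smi should then track actual usage more closely. It does not make a model that is too large fit on the GPU.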

【Comments】:

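It works when I use another, CPU-only machine. I am not sure why the model is loaded into GPU memory by default, or why it takes up all of the memory.

TensorFlow uses the GPU by default if one is available, but a CPU usually has more RAM than a GPU, so the model can run on the CPU yet hit OOM on the GPU. You can disable the GPU by setting os.environ['CUDA_VISIBLE_DEVICES']='-1'.

I want to train the model on the GPU. Do you know why it takes up almost all of the memory? ResNet, for example, takes 90 MB, so why does my model take more than 10 GB?

See the edit. You cannot see a model's requirements through nvidia-smi.

I added an extra conv2d layer and reduced the filters. It seems to be working now. I think the problem was the model's total number of parameters. It was 1,142,627,264 and is now 128,351,616. Could that have caused this problem?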
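The parameter count mentioned in the last comment can be reproduced by hand, which also shows where almost all of it comes from. The helper `count_params` below is hypothetical and written only for this illustration. It assumes 'valid' padding, stride 1, 2x2 pooling, and a 150x150x3 input, which is inferred from the model summary in the question rather than stated there. The second call is a made-up variation (one extra pooling layer before Flatten), not the architecture the commenter ended up with.

```python
def count_params(size, channels, layers, dense_units, embedding_size):
    """Return (flatten size, first Dense params, total params).

    `layers` is a list of ("conv", kernel, filters) or ("pool",) entries,
    assuming 'valid' padding, stride 1 and 2x2 pooling.
    """
    total = 0
    for layer in layers:
        if layer[0] == "conv":
            _, kernel, filters = layer
            # Weights plus one bias per filter.
            total += kernel * kernel * channels * filters + filters
            size = size - kernel + 1
            channels = filters
        else:
            # 2x2 max pooling halves each spatial dimension (rounding down).
            size //= 2
    flat = size * size * channels
    dense = flat * dense_units + dense_units
    total += dense + dense_units * embedding_size + embedding_size
    return flat, dense, total


# Assumption: a 150x150x3 input, inferred from the first row of the summary
# (144 = 150 - 7 + 1 and 18944 = 7*7*3*128 + 128).
original = [("conv", 7, 128), ("pool",), ("conv", 3, 128), ("pool",), ("conv", 3, 256)]
flat, dense, total = count_params(150, 3, original, 4096, 64)
print(flat, dense, total)  # 278784 1141903360 1142627264

# Hypothetical variation, not the commenter's actual change:
# one more pooling layer before Flatten.
pooled = original + [("pool",)]
flat_p, dense_p, total_p = count_params(150, 3, pooled, 4096, 64)
print(flat_p, dense_p, total_p)
```

The first Dense layer accounts for nearly all of the parameters, and its size is proportional to the Flatten output, so shrinking the feature map or the channel count before Flatten is what brings the total down.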
