TensorFlow memory consumption keeps increasing


【Title】tensorflow memory consumption keeps increasing 【Posted】2021-03-10 12:20:17 【Question】

I am currently optimizing the hyperparameters of a CNN in tensorflow.keras: I iteratively create models, train them, record the results, and discard them. This works for several hours and lets me train over 30 models without failure. However, if I let it run long enough, each iteration consumes more and more memory, which eventually causes a crash. Is there a way to mitigate this?

Example snippet:

import os
import datetime
import time

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv1D, MaxPooling1D

verbose, epochs, batch_size = 1, 15, 32

CONV_QUANTS = [2,4,6]
DENSE_QUANTS = [0,1,2]
DENSE_SIZES = [16,32,64]
KERNAL_SIZES = [3,9,15]
FILT_QUANTS = [16,32,64]
POOL_SIZES = [2,4,6]

testName = 'test_{}'.format(round(time.time()))

for convQuant in CONV_QUANTS:
    for denseQuant in DENSE_QUANTS:
        for denseSize in DENSE_SIZES:
            for kernalSize in KERNAL_SIZES:
                for filtQuant in FILT_QUANTS:
                    for poolSize in POOL_SIZES:
                        
                        #defining name
name = '{}conv_{}dense_{}dSize_{}kSize_{}filtQuant_{}pSize_{}dt'.format(convQuant,
                                                                                                denseQuant,
                                                                                                denseSize,
                                                                                                kernalSize,
                                                                                                filtQuant,
                                                                                                poolSize,
                                                                                                datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
                        print(name)

                        #defining log
                        logdir = os.path.join("logs",testName,name)
                        tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

                        #initializing model
                        model = Sequential()
                        
                        #input convolutional layer
                        model.add(Conv1D(filters=filtQuant, kernel_size=kernalSize, activation='relu', input_shape = trainX[0].shape))
                        model.add(Dropout(0.1))
                        model.add(MaxPooling1D(pool_size=poolSize))
                        
                        #additional convolutional layers
                        for _ in range(convQuant-1):
                            model.add(Conv1D(filters=filtQuant, kernel_size=kernalSize, activation='relu'))
                            model.add(Dropout(0.1))
                            model.add(MaxPooling1D(pool_size=poolSize))
                        
                        #dense layers
                        model.add(Flatten())
                        
                        for _ in range(denseQuant):
                            model.add(Dense(denseSize, activation='relu'))
                            model.add(Dropout(0.5))
                            
                        #output
                        model.add(Dense(2, activation='softmax'))
                        
                        #training
                        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
                        model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose, validation_data=(testX, testy), callbacks=[tensorboard_callback])
                        
                        #calculating accuracy
                        _, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
                        accuracy = accuracy * 100.0
                        print('accuracy: {}'.format(accuracy))

【Comments】:

【Answer 1】:

Keras manages a global state, and if you create many models in a loop, this global state consumes more and more memory over time, so you may want to clear it. Calling clear_session() releases the global state: this helps avoid clutter from old models and layers, especially when memory is limited.

for _ in range(100):
  # Without `clear_session()`, each iteration of this loop will
  # slightly increase the size of the global state managed by Keras
  model = tf.keras.Sequential([tf.keras.layers.Dense(10) for _ in range(10)])

for _ in range(100):
  # With `clear_session()` called at the beginning,
  # Keras starts with a blank state at each iteration
  # and memory consumption is constant over time.
  tf.keras.backend.clear_session()
  model = tf.keras.Sequential([tf.keras.layers.Dense(10) for _ in range(10)])

More details on this can be found here
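The reason this works is that the framework keeps hidden references to every model built, so even after your own variable goes out of scope, the objects stay reachable and cannot be garbage-collected. The sketch below illustrates that mechanism with plain-Python stand-ins (`Model`, `_global_graph`, `build_model`, and `clear_session` are illustrative names, not the real Keras internals), using `weakref` to observe when the objects are actually freed:

```python
import gc
import weakref


class Model:
    """Stand-in for a Keras model."""


_global_graph = []  # stand-in for the framework's hidden global state


def build_model():
    m = Model()
    _global_graph.append(m)  # the framework keeps a reference behind your back
    return m


def clear_session():
    _global_graph.clear()  # stand-in for tf.keras.backend.clear_session()


# Without clearing: every model built stays reachable via the global state,
# so none of them can be collected, and memory grows with each iteration.
refs = []
for _ in range(3):
    m = build_model()
    refs.append(weakref.ref(m))
del m
gc.collect()
leaked = sum(r() is not None for r in refs)

# Clearing the global state each iteration drops the hidden references,
# so old models become unreachable and get collected.
refs2 = []
for _ in range(3):
    clear_session()
    m = build_model()
    refs2.append(weakref.ref(m))
clear_session()
del m
gc.collect()
freed = sum(r() is None for r in refs2)

print(leaked, freed)  # -> 3 3
```

For the loop in the question, this suggests calling `tf.keras.backend.clear_session()` at the top of the innermost loop body, before building each model; dropping your own reference (`del model`) and calling `gc.collect()` after evaluation can help as well.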

【Comments】:
