Why is accuracy from fit_generator different to that from evaluate_generator in Keras?

Posted: 2019-08-29 08:27:11

Problem description:

What I do:
I am training a pre-trained CNN with Keras' fit_generator(). This produces evaluation metrics (loss, acc, val_loss, val_acc) after each epoch. After training the model, I produce evaluation metrics (loss, acc) with evaluate_generator().
What I expect:

If I train the model for one epoch, I expect the metrics obtained with fit_generator() and evaluate_generator() to be the same. Both should derive their metrics from the entire dataset.
What I observe:

Both loss and acc differ between fit_generator() and evaluate_generator().

What I don't understand:

Why is the accuracy from fit_generator() different from that of evaluate_generator()?

My code:
def generate_data(path, imagesize, nBatches):
    datagen = ImageDataGenerator(rescale=1./255)
    generator = datagen.flow_from_directory(directory=path,                       # path to the target directory
                                            target_size=(imagesize, imagesize),   # dimensions to which all images found will be resized
                                            color_mode='rgb',                     # whether the images will be converted to have 1, 3, or 4 channels
                                            classes=None,                         # optional list of class subdirectories
                                            class_mode='categorical',             # type of label arrays that are returned
                                            batch_size=nBatches,                  # size of the batches of data
                                            shuffle=True)                         # whether to shuffle the data
    return generator
[...]
def train_model(model, nBatches, nEpochs, trainGenerator, valGenerator, resultPath):
    history = model.fit_generator(generator=trainGenerator,
                                  steps_per_epoch=trainGenerator.samples//nBatches,  # total number of steps (batches of samples)
                                  epochs=nEpochs,                                    # number of epochs to train the model
                                  verbose=2,                                         # verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch
                                  callbacks=None,                                    # keras.callbacks.Callback instances to apply during training
                                  validation_data=valGenerator,                      # generator or tuple on which to evaluate the loss and any model metrics at the end of each epoch
                                  validation_steps=valGenerator.samples//nBatches,   # number of steps (batches of samples) to yield from validation_data generator before stopping at the end of every epoch
                                  class_weight=None,                                 # optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function
                                  max_queue_size=10,                                 # maximum size for the generator queue
                                  workers=32,                                        # maximum number of processes to spin up when using process-based threading
                                  use_multiprocessing=True,                          # whether to use process-based threading
                                  shuffle=False,                                     # whether to shuffle the order of the batches at the beginning of each epoch
                                  initial_epoch=0)                                   # epoch at which to start training
    print("%s: Model trained." % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))

    # Save model
    modelPath = os.path.join(resultPath, datetime.now().strftime('%Y-%m-%d_%H-%M-%S') + '_modelArchitecture.h5')
    weightsPath = os.path.join(resultPath, datetime.now().strftime('%Y-%m-%d_%H-%M-%S') + '_modelWeights.h5')
    model.save(modelPath)
    model.save_weights(weightsPath)
    print("%s: Model saved." % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))

    return history, model
[...]
def evaluate_model(model, generator):
    score = model.evaluate_generator(generator=generator,                # generator yielding tuples
                                     steps=generator.samples//nBatches)  # number of steps (batches of samples) to yield from generator before stopping
    print("%s: Model evaluated:"
          "\n\t\t\t\t\t\t Loss: %.3f"
          "\n\t\t\t\t\t\t Accuracy: %.3f" %
          (datetime.now().strftime('%Y-%m-%d_%H-%M-%S'),
           score[0], score[1]))
[...]
def main():
    # Create model
    modelUntrained = create_model(imagesize, nBands, nClasses)

    # Prepare training and validation data
    trainGenerator = generate_data(imagePathTraining, imagesize, nBatches)
    valGenerator = generate_data(imagePathValidation, imagesize, nBatches)

    # Train and save model
    history, modelTrained = train_model(modelUntrained, nBatches, nEpochs, trainGenerator, valGenerator, resultPath)

    # Evaluate on validation data
    print("%s: Model evaluation (valX, valY):" % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    evaluate_model(modelTrained, valGenerator)

    # Evaluate on training data
    print("%s: Model evaluation (trainX, trainY):" % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    evaluate_model(modelTrained, trainGenerator)
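For reference, the comparison at issue boils down to something like the following (my own sketch, not part of the original script, reusing history, modelTrained, valGenerator and nBatches defined above):

val_acc_from_fit = history.history['val_acc'][-1]          # validation accuracy of the last epoch logged by fit_generator (the key may be 'val_accuracy' in newer Keras versions)
score = modelTrained.evaluate_generator(generator=valGenerator,
                                        steps=valGenerator.samples//nBatches)
print(val_acc_from_fit, score[1])                           # these two numbers are expected to match, but they do not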
Update

I found some websites that report on this issue:
The Batch Normalization layer of Keras is broken
Strange behaviour of the loss function in keras model, with pretrained convolutional base
model.evaluate() gives a different loss on training data from the one in training process
Got different accuracy between history and evaluate
ResNet: 100% accuracy during training, but 33% prediction accuracy with the same data

So far, I have tried to follow some of the solutions they suggest, without success. acc and loss still differ between fit_generator() and evaluate_generator(), even when training and validating with exactly the same data produced by the same generator. This is what I have tried:
K.set_learning_phase(0) # testing
K.set_learning_phase(1) # training
Unfreezing all batch normalization layers of the pre-trained model

for i in range(len(model.layers)):
    if str.startswith(model.layers[i].name, 'bn'):
        model.layers[i].trainable = True
Not adding dropout or batch normalization as untrained layers

# Create pre-trained base model
basemodel = ResNet50(include_top=False,          # exclude final pooling and fully connected layer in the original model
                     weights='imagenet',         # pre-training on ImageNet
                     input_tensor=None,          # optional tensor to use as image input for the model
                     input_shape=(imagesize,     # shape tuple
                                  imagesize,
                                  nBands),
                     pooling=None,               # output of the model will be the 4D tensor output of the last convolutional layer
                     classes=nClasses)           # number of classes to classify images into

# Create new untrained layers
x = basemodel.output
x = GlobalAveragePooling2D()(x)                  # global spatial average pooling layer
x = Dense(1024, activation='relu')(x)            # fully-connected layer
y = Dense(nClasses, activation='softmax')(x)     # logistic layer making sure that probabilities sum up to 1

# Create model combining pre-trained base model and new untrained layers
model = Model(inputs=basemodel.input,
              outputs=y)

# Freeze weights on pre-trained layers
for layer in basemodel.layers:
    layer.trainable = False

# Define learning optimizer
learningRate = 0.01
optimizerSGD = optimizers.SGD(lr=learningRate,               # learning rate
                              momentum=0.9,                  # parameter that accelerates SGD in the relevant direction and dampens oscillations
                              decay=learningRate/nEpochs,    # learning rate decay over each update
                              nesterov=True)                 # whether to apply Nesterov momentum

# Compile model
model.compile(optimizer=optimizerSGD,            # stochastic gradient descent optimizer
              loss='categorical_crossentropy',   # objective function
              metrics=['accuracy'],              # metrics to be evaluated by the model during training and testing
              loss_weights=None,                 # scalar coefficients to weight the loss contributions of different model outputs
              sample_weight_mode=None,           # sample-wise weights
              weighted_metrics=None,             # metrics to be evaluated and weighted by sample_weight or class_weight during training and testing
              target_tensors=None)               # tensors the model's targets will be fed with during training
Using different pre-trained CNNs as base model (VGG19, InceptionV3, InceptionResNetV2, Xception)

from keras.applications.vgg19 import VGG19

basemodel = VGG19(include_top=False,          # exclude final pooling and fully connected layer in the original model
                  weights='imagenet',         # pre-training on ImageNet
                  input_tensor=None,          # optional tensor to use as image input for the model
                  input_shape=(imagesize,     # shape tuple
                               imagesize,
                               nBands),
                  pooling=None,               # output of the model will be the 4D tensor output of the last convolutional layer
                  classes=nClasses)           # number of classes to classify images into
Please let me know if there are other solutions I am missing.

Comments:

Try creating two instances of the validation generator, pass one to model.fit and the other to evaluate_generator, and see whether they produce the same result. In many cases the batch size does not divide the number of samples exactly, so the integer division used to determine the number of steps may skip one batch, which is then consumed by the evaluate generator, producing slightly different metrics.
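A quick numeric sketch of this point (the numbers are invented purely for illustration and do not come from the question):

samples = 1000
batch_size = 32                           # what the question calls nBatches

steps = samples // batch_size             # 31 -> steps_per_epoch passed to fit_generator
seen_per_epoch = steps * batch_size       # 992 samples counted in the epoch metrics
left_over = samples - seen_per_epoch      # 8 samples in the last partial batch are never counted

print(steps, seen_per_epoch, left_over)   # 31 992 8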
Answer 1:

Setting use_multiprocessing=False at the fit_generator level fixed the problem, but at the cost of significantly slowing down training. A better, though still imperfect, workaround is to set use_multiprocessing=False only for the validation generator, as in the code below, adapted from Keras' fit_generator function.
...
try:
    if do_validation:
        if val_gen and workers > 0:
            # Create an Enqueuer that can be reused
            val_data = validation_data
            if isinstance(val_data, Sequence):
                val_enqueuer = OrderedEnqueuer(val_data,
                                               use_multiprocessing=False)   # <-- hard-coded to False here
                validation_steps = len(val_data)
            else:
                val_enqueuer = GeneratorEnqueuer(val_data,
                                                 use_multiprocessing=False)  # <-- hard-coded to False here
            val_enqueuer.start(workers=workers,
                               max_queue_size=max_queue_size)
            val_enqueuer_gen = val_enqueuer.get()
...
Answer 2:

I now managed to get the same evaluation metrics. I changed the following:

I set a seed in flow_from_directory() as suggested by @Anakin:
def generate_data(path, imagesize, nBatches):
    datagen = ImageDataGenerator(rescale=1./255)
    generator = datagen.flow_from_directory(directory=path,                       # path to the target directory
                                            target_size=(imagesize, imagesize),   # dimensions to which all images found will be resized
                                            color_mode='rgb',                     # whether the images will be converted to have 1, 3, or 4 channels
                                            classes=None,                         # optional list of class subdirectories
                                            class_mode='categorical',             # type of label arrays that are returned
                                            batch_size=nBatches,                  # size of the batches of data
                                            shuffle=True,                         # whether to shuffle the data
                                            seed=42)                              # random seed for shuffling and transformations
    return generator
I set use_multiprocessing=False in fit_generator() based on the warning use_multiprocessing=True and multiple workers may duplicate your data:
history = model.fit_generator(generator=trainGenerator,
                              steps_per_epoch=trainGenerator.samples//nBatches,  # total number of steps (batches of samples)
                              epochs=nEpochs,                                    # number of epochs to train the model
                              verbose=2,                                         # verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch
                              callbacks=callback,                                # keras.callbacks.Callback instances to apply during training
                              validation_data=valGenerator,                      # generator or tuple on which to evaluate the loss and any model metrics at the end of each epoch
                              validation_steps=valGenerator.samples//nBatches,   # number of steps (batches of samples) to yield from validation_data generator before stopping at the end of every epoch
                              class_weight=None,                                 # optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function
                              max_queue_size=10,                                 # maximum size for the generator queue
                              workers=1,                                         # maximum number of processes to spin up when using process-based threading
                              use_multiprocessing=False,                         # whether to use process-based threading
                              shuffle=False,                                     # whether to shuffle the order of the batches at the beginning of each epoch
                              initial_epoch=0)                                   # epoch at which to start training
I unified my Python setup as suggested in the keras documentation on how to obtain reproducible results using Keras during development:

import numpy as np
import tensorflow as tf
import random as rn
from keras import backend as K

np.random.seed(42)
rn.seed(12345)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
Instead of rescaling the input images with datagen = ImageDataGenerator(rescale=1./255), I now generate my data with:
from keras.applications.resnet50 import preprocess_input
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
With this, I managed to get similar accuracy and loss from fit_generator() and evaluate_generator(). Also, using the same data for training and testing now yields similar metrics. The reasons for the remaining differences are explained in the keras documentation.
Comments:

Please refer to this link keras.io/getting-started/faq/…. The training loss is the average of the losses over each batch of training data, and because the model changes with every batch as it trains, this loss varies from batch to batch.
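A toy illustration of that remark (invented numbers, just to show the arithmetic): the epoch loss reported by fit_generator() is a running average computed while the weights keep changing, whereas evaluate_generator() measures the finished model once.

batch_losses_during_epoch = [1.20, 0.95, 0.80, 0.70]   # hypothetical per-batch losses while the model improves
epoch_loss_from_fit = sum(batch_losses_during_epoch) / len(batch_losses_during_epoch)

loss_of_final_weights = 0.65                            # hypothetical loss of the finished model on the same data

print(epoch_loss_from_fit)      # 0.9125 -> what fit_generator logs for the epoch
print(loss_of_final_weights)    # 0.65   -> what evaluate_generator would report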
Answer 3:

Training for one epoch might not be informative enough in this case. Also, your training and test data may not be exactly the same, since you do not set a random seed for the flow_from_directory method. Have a look here.

Maybe you can set a seed, remove augmentations (if any), and save the trained model weights so you can load them later to check.
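A rough sketch of that last suggestion (my own example, reusing names from the question; the file name is a placeholder): save the trained weights, rebuild the same architecture, reload the weights, and evaluate again to check whether the metrics are reproducible.

modelTrained.save_weights('weights_for_checking.h5')        # save weights of the trained model

modelCheck = create_model(imagesize, nBands, nClasses)      # same architecture as in the question
modelCheck.load_weights('weights_for_checking.h5')          # load the saved weights into the fresh model
modelCheck.compile(optimizer=optimizerSGD,
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])

loss, acc = modelCheck.evaluate_generator(generator=valGenerator,
                                          steps=valGenerator.samples//nBatches)
print(loss, acc)                                            # should match a repeated evaluation of modelTrained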