Accessing validation data within a custom callback
[Posted] 2018-05-20 10:47:53
[Question] I am fitting a train_generator, and I want to compute custom metrics on my validation_generator by means of a custom callback.
How can I access the parameters validation_steps and validation_data within a custom callback? They are not in self.params, and I cannot find them in self.model either. Here is what I would like to do. Any different approach is welcome.
model.fit_generator(generator=train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=validation_steps,
                    callbacks=[CustomMetrics()])
class CustomMetrics(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs={}):
        for i in range(validation_steps):
            # features, labels = next(validation_data)
            # compute custom metric: f(features, labels)
            pass
        return
Keras 2.1.1
Update
I managed to pass my validation data to the constructor of a custom callback. However, this leads to an annoying "The kernel appears to have died. It will restart automatically." message. I doubt this is the right way to do it. Any suggestions?
class CustomMetrics(keras.callbacks.Callback):
    def __init__(self, validation_generator, validation_steps):
        self.validation_generator = validation_generator
        self.validation_steps = validation_steps

    def on_epoch_end(self, epoch, logs={}):
        self.scores = {
            'recall_score': [],
            'precision_score': [],
            'f1_score': []
        }
        for batch_index in range(self.validation_steps):
            features, y_true = next(self.validation_generator)
            y_pred = np.asarray(self.model.predict(features))
            y_pred = y_pred.round().astype(int)
            self.scores['recall_score'].append(recall_score(y_true[:, 0], y_pred[:, 0]))
            self.scores['precision_score'].append(precision_score(y_true[:, 0], y_pred[:, 0]))
            self.scores['f1_score'].append(f1_score(y_true[:, 0], y_pred[:, 0]))
        return
metrics = CustomMetrics(validation_generator, validation_steps)

model.fit_generator(generator=train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=validation_steps,
                    shuffle=True,
                    callbacks=[metrics],
                    verbose=1)
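One caveat with the constructor approach above, independent of the kernel crash: it stores one score per batch, and the mean of per-batch scores is generally not the same as the score computed over the pooled validation set. A self-contained sketch (pure NumPy, with a hand-rolled binary F1 standing in for sklearn's f1_score, on two made-up batches) illustrates the gap:

```python
import numpy as np

def f1(y_true, y_pred):
    # Plain-NumPy binary F1, mirroring sklearn.metrics.f1_score
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = tp + 0.5 * (fp + fn)
    return tp / denom if denom else 0.0

# Two synthetic validation batches with different class balance
batches = [
    (np.array([1, 1, 1, 0]), np.array([1, 0, 1, 0])),  # (y_true, y_pred)
    (np.array([0, 0, 0, 1]), np.array([0, 0, 1, 1])),
]

# What the callback above collects: one score per batch, averaged later
per_batch = [f1(t, p) for t, p in batches]

# Alternative: pool all predictions first, then score once
pooled = f1(np.concatenate([t for t, _ in batches]),
            np.concatenate([p for _, p in batches]))

print(np.mean(per_batch), pooled)  # 0.733... vs 0.75
```

For precision/recall-type metrics it is usually safer to pool the predictions and targets first and score once, which is what the vstack-based answers below do.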
[Comments]
- I don't think there is a good option. If you look at the code of _fit_loop in Keras, it doesn't really pass validation_steps and validation_data to the callback.
- What about using next(validation_generator) on batch begin — would that be better than your way? I mean, in that case I don't know whether next(val_generator) takes the next batch of the iteration, or whether it always restarts at a random point and never covers all of the validation data.
- If you look at the Keras TensorBoard callback, there seems to be a way to get the validation data from the model, but I cannot find where in the code that happens: github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/python/…
- I give a possible answer here: ***.com/a/59697739/880783
- Does this answer your question? Create keras callback to save model predictions and targets for each batch during training

[Answer 1] I was looking for a solution to the same problem, and then I found your solution and another one in the accepted answer here. If the second solution works, I think it is better than iterating over all the validation data again at "on epoch end".
The idea is to save the target and prediction placeholders in variables, and to update those variables via a custom callback at "on batch end".
[Answer 2] Here is how:
from sklearn.metrics import r2_score

class MetricsCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        if epoch:
            print(self.validation_data[0])
            x_test = self.validation_data[0]
            y_test = self.validation_data[1]
            predictions = self.model.predict(x_test)
            print('r2:', r2_score(y_test, predictions).round(2))

model.fit( ..., callbacks=[MetricsCallback()])
Reference
Keras 2.2.4
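As a side note, sklearn's r2_score expects (y_true, y_pred) in that order, and the computation itself reduces to a few NumPy operations. A minimal stand-in, useful for checking what the callback prints (the sample values are made up for illustration):

```python
import numpy as np

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot,
    # matching sklearn.metrics.r2_score for 1-D targets
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_test = np.array([3.0, -0.5, 2.0, 7.0])
predictions = np.array([2.5, 0.0, 2.0, 8.0])
print('r2:', round(r2(y_test, predictions), 2))  # r2: 0.95
```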
[Discussion]
- According to your reference on GitHub, self.validation_data is None; this problem is still unsolved.
- @VadymB. - That is because "Unfortunately, since moving from fit to flow_from_directory and fit_generator, this has erred because self.validation_data is None." I am using fit.

[Answer 3] You can iterate directly over self.validation_data to aggregate all the validation data at the end of each epoch. If you want to compute precision, recall and F1 over the complete validation dataset:
# Validation metrics callback: validation precision, recall and F1
# Some of the code was adapted from https://medium.com/@thongonary/how-to-compute-f1-score-for-each-epoch-in-keras-a1acd17715a2
class Metrics(callbacks.Callback):

    def on_train_begin(self, logs={}):
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs):
        # 5.4.1 For each validation batch
        for batch_index in range(0, len(self.validation_data)):
            # 5.4.1.1 Get the batch target values
            temp_targ = self.validation_data[batch_index][1]
            # 5.4.1.2 Get the batch prediction values
            temp_predict = (np.asarray(self.model.predict(
                self.validation_data[batch_index][0]))).round()
            # 5.4.1.3 Append them to the corresponding output objects
            if batch_index == 0:
                val_targ = temp_targ
                val_predict = temp_predict
            else:
                val_targ = np.vstack((val_targ, temp_targ))
                val_predict = np.vstack((val_predict, temp_predict))

        val_f1 = round(f1_score(val_targ, val_predict), 4)
        val_recall = round(recall_score(val_targ, val_predict), 4)
        val_precis = round(precision_score(val_targ, val_predict), 4)

        self.val_f1s.append(val_f1)
        self.val_recalls.append(val_recall)
        self.val_precisions.append(val_precis)

        # Add custom metrics to the logs, so that we can use them with
        # EarlyStop and csvLogger callbacks
        logs["val_f1"] = val_f1
        logs["val_recall"] = val_recall
        logs["val_precis"] = val_precis

        print("— val_f1: {} — val_precis: {} — val_recall {}".format(
            val_f1, val_precis, val_recall))
        return

valid_metrics = Metrics()
Then you can add valid_metrics to the callbacks argument:
your_model.fit_generator(..., callbacks = [valid_metrics])
Be sure to put it at the beginning of the callbacks list, in case you want other callbacks to use these measures.
[Discussion]
- Is there a way to use the prediction results on the validation data instead of recomputing them?
- What are the prerequisites for accessing self.validation_data in def on_epoch_end(self, batch, logs)? I always get AttributeError: 'Metrics' object has no attribute 'validation_data'.
- @vanessaxenia You need to pass validation_data into the Metrics class as a parameter.
- Your batch_index is actually a direct index into the data, so it yields one training example at a time. You need to slice to get a full batch. Also, more critically, self.validation_data is just a list with 4 elements, so this answer does not work at all.

[Answer 4] Verdant89 made a few mistakes and did not implement all of the functions. The code below should work.
import sys
import numpy as np
from keras import callbacks

class Metrics(callbacks.Callback):

    def on_train_begin(self, logs={}):
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs):
        # 5.4.1 For each validation batch
        for batch_index in range(0, len(self.validation_data[0])):
            # 5.4.1.1 Get the batch target values
            temp_target = self.validation_data[1][batch_index]
            # 5.4.1.2 Get the batch prediction values
            temp_predict = (np.asarray(self.model.predict(np.expand_dims(
                self.validation_data[0][batch_index], axis=0)))).round()
            # 5.4.1.3 Append them to the corresponding output objects
            if batch_index == 0:
                val_target = temp_target
                val_predict = temp_predict
            else:
                val_target = np.vstack((val_target, temp_target))
                val_predict = np.vstack((val_predict, temp_predict))

        tp, tn, fp, fn = self.compute_tptnfpfn(val_target, val_predict)

        val_f1 = round(self.compute_f1(tp, tn, fp, fn), 4)
        val_recall = round(self.compute_recall(tp, tn, fp, fn), 4)
        val_precis = round(self.compute_precision(tp, tn, fp, fn), 4)

        self.val_f1s.append(val_f1)
        self.val_recalls.append(val_recall)
        self.val_precisions.append(val_precis)

        # Add custom metrics to the logs, so that we can use them with
        # EarlyStop and csvLogger callbacks
        logs["val_f1"] = val_f1
        logs["val_recall"] = val_recall
        logs["val_precis"] = val_precis

        print("— val_f1: {} — val_precis: {} — val_recall {}".format(
            val_f1, val_precis, val_recall))
        return

    def compute_tptnfpfn(self, val_target, val_predict):
        # cast to boolean
        val_target = val_target.astype('bool')
        val_predict = val_predict.astype('bool')

        tp = np.count_nonzero(val_target * val_predict)
        tn = np.count_nonzero(~val_target * ~val_predict)
        fp = np.count_nonzero(~val_target * val_predict)
        fn = np.count_nonzero(val_target * ~val_predict)
        return tp, tn, fp, fn

    def compute_f1(self, tp, tn, fp, fn):
        f1 = tp * 1. / (tp + 0.5 * (fp + fn) + sys.float_info.epsilon)
        return f1

    def compute_recall(self, tp, tn, fp, fn):
        recall = tp * 1. / (tp + fn + sys.float_info.epsilon)
        return recall

    def compute_precision(self, tp, tn, fp, fn):
        precision = tp * 1. / (tp + fp + sys.float_info.epsilon)
        return precision
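The counting and metric formulas in the class above can be sanity-checked in isolation. A quick run with standalone functions mirroring the methods, on toy labels, with no Keras required:

```python
import sys
import numpy as np

def confusion_counts(val_target, val_predict):
    # Mirrors Metrics.compute_tptnfpfn above, as a free function
    val_target = val_target.astype('bool')
    val_predict = val_predict.astype('bool')
    tp = np.count_nonzero(val_target & val_predict)
    tn = np.count_nonzero(~val_target & ~val_predict)
    fp = np.count_nonzero(~val_target & val_predict)
    fn = np.count_nonzero(val_target & ~val_predict)
    return tp, tn, fp, fn

eps = sys.float_info.epsilon
y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp + eps)
recall = tp / (tp + fn + eps)
f1 = tp / (tp + 0.5 * (fp + fn) + eps)

print(tp, tn, fp, fn)  # 2 2 1 1
```

Here precision, recall and F1 all come out to 2/3, consistent with F1 being the harmonic mean of precision and recall.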