Scoring metrics from Keras scikit-learn wrapper in cross validation with one-hot encoded labels
【Posted】: 2020-08-12 19:34:08
【Question】: I am implementing a neural network and want to evaluate its performance with cross-validation. Here is my current code:
# Imports are assumed; the original post does not show them.
from keras import backend as K
from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score

def recall_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def precision_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def f1_m(y_true, y_pred):
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))

def build_model():
    hiddenLayers = 1
    neurons = 100
    #hidden_neurons = int(train_x.shape[0]/(3*(neurons+1)))
    hidden_neurons = 500
    opt = optimizers.Adam(learning_rate=0.00005, amsgrad=False)
    model = Sequential()
    model.add(Dense(units=neurons, activation="relu", input_shape=(15,)))
    model.add(Dense(units=2*hidden_neurons, activation="relu", input_shape=(18632,)))
    model.add(Dense(units=4, activation="softmax"))
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['acc',f1_m,precision_m, recall_m])
    return model

x = df[['start-sin', 'start-cos', 'start-sin-lag', 'start-cos-lag', 'prev-close-sin', 'prev-close-cos', 'prev-length', 'state-lag', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']]
y = df[['wait-categ-none', 'wait-categ-short', 'wait-categ-medium', 'wait-categ-long']]
print(y)
#enforce, this is gone wrong somewhere
y = y.replace(False, 0)
y = y.replace(True, 1)
ep = 1
#fit = model.fit(train_x, train_y, epochs=ep, verbose=1)
#pred = model.predict(test_x)
#loss, accuracy, f1_score, precision, recall = model.evaluate(test_x, test_y, verbose=0)
classifier = KerasClassifier(build_fn=build_model, batch_size=10, epochs=ep)
accuracies = cross_val_score(estimator=classifier, X=x, y=y, cv=10, scoring="f1_macro", verbose=5)
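I am using cross_val_score and trying to pass it a metric other than accuracy, but I get the error
ValueError: Classification metrics can't handle a mix of multilabel-indicator and binary targets
From reading confusion matrix error "Classification metrics can't handle a mix of multilabel-indicator and multiclass targets", I understand that I need to convert the one-hot encoded outputs back to single class labels before scoring, but I could not find any way to do that with this function.
Is there a better way to get multiple scores than writing the whole procedure myself? As you can see, I have already implemented the metrics and they work as expected during training, but with cross_val_score I cannot seem to extract that information.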
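The error described above can be reproduced without Keras. The following is a minimal sketch with made-up arrays (not the asker's data): the true labels are one-hot encoded, which scikit-learn classifies as a multilabel-indicator target, while the predictions are plain integer classes. Converting the true labels with argmax makes both sides the same target type.

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up example data: 5 samples, 4 classes.
y_true_onehot = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
])
y_pred_labels = np.array([0, 1, 2, 3, 0])  # integer class predictions

# Mixing one-hot truth with integer predictions raises ValueError.
raised = False
message = ""
try:
    f1_score(y_true_onehot, y_pred_labels, average="macro")
except ValueError as err:
    raised = True
    message = str(err)

# Decoding the one-hot rows to class indices resolves the mismatch.
y_true_labels = y_true_onehot.argmax(axis=1)
macro_f1 = f1_score(y_true_labels, y_pred_labels, average="macro")
```

The reported target types depend on the data: with four predicted classes the message mentions multiclass targets, and it mentions binary targets when the predictions happen to contain only two distinct values.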
Edit:
I ran only a single iteration, with the following code:
train, test = train_test_split(df, test_size=0.1, shuffle=True)
train_x = train[['start-sin', 'start-cos', 'start-sin-lag', 'start-cos-lag', 'prev-close-sin', 'prev-close-cos', 'prev-length', 'state-lag', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']]
train_y = train[['wait-categ-none', 'wait-categ-short', 'wait-categ-medium', 'wait-categ-long']]
test_x = test[['start-sin', 'start-cos', 'start-sin-lag', 'start-cos-lag', 'prev-close-sin', 'prev-close-cos', 'prev-length', 'state-lag', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']]
test_y = test[['wait-categ-none', 'wait-categ-short', 'wait-categ-medium', 'wait-categ-long']]
test_y = test_y.replace(False, 0).replace(True,1)
train_y = train_y.replace(False, 0).replace(True,1)
ep = 500
model = build_model()
print("Train y")
print(train_y)
print("Test y")
print(test_y)
model.fit(train_x, train_y, epochs=1, verbose=1)
pred = model.predict(test_x)
print(pred)
loss, accuracy, f1_score, precision, recall = model.evaluate(test_x, test_y, verbose=0)
This produces the following output:
Train y
       wait-categ-none  wait-categ-short  wait-categ-medium  wait-categ-long
4629                 1                 0                  0                0
7643                 0                 1                  0                0
4425                 0                 1                  0                0
10548                1                 0                  0                0
14180                1                 0                  0                0
...                ...               ...                ...              ...
13661                1                 0                  0                0
10546                1                 0                  0                0
1966                 1                 0                  0                0
5506                 0                 1                  0                0
10793                1                 0                  0                0
[15632 rows x 4 columns]
Test y
       wait-categ-none  wait-categ-short  wait-categ-medium  wait-categ-long
10394                0                 1                  0                0
3804                 0                 1                  0                0
15136                0                 1                  0                0
7050                 1                 0                  0                0
30                   0                 1                  0                0
...                ...               ...                ...              ...
12040                0                 1                  0                0
4184                 0                 1                  0                0
12345                1                 0                  0                0
12629                0                 1                  0                0
664                  1                 0                  0                0
[1737 rows x 4 columns]
Prediction
[[2.63620764e-01 5.09552181e-01 1.72765702e-01 5.40613122e-02]
[5.40941073e-07 9.99827385e-01 1.72021420e-04 5.32279255e-11]
[5.91083081e-05 9.97556090e-01 2.38463446e-03 1.01058276e-07]
...
[2.69533932e-01 3.99731129e-01 2.22193986e-01 1.08540975e-01]
[5.87045122e-03 9.67754781e-01 2.62637101e-02 1.11028130e-04]
[2.32783407e-01 4.53738511e-01 2.31750652e-01 8.17274228e-02]]
I have copied the output as-is.
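【Answer 1】: cross_val_score is not the right tool here; you should take manual control of your CV procedure. Here is how to do it, combining aspects of my answers in the SO thread you linked and in Cross-validation metrics in scikit-learn for each data split, and using accuracy as an example metric: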
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
import numpy as np

n_splits = 10
kf = KFold(n_splits=n_splits, shuffle=True)
cv_acc = []

# positional indexing below needs NumPy arrays, not pandas DataFrames
x_arr = np.asarray(x)
y_arr = np.asarray(y)

# prepare a single-digit copy of your 1-hot encoded true labels:
y_single = np.argmax(y_arr, axis=1)

for train_index, val_index in kf.split(x_arr):
    # fit & predict
    model = KerasClassifier(build_fn=build_model, batch_size=10, epochs=ep)
    model.fit(x_arr[train_index], y_arr[train_index])
    pred = model.predict(x_arr[val_index])  # the wrapper's predict() returns single-digit classes
    # get fold accuracy & append
    fold_acc = accuracy_score(y_single[val_index], pred)
    cv_acc.append(fold_acc)

acc = np.mean(cv_acc)
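After the loop finishes, the list cv_acc holds the accuracy of each fold, and taking its mean gives you the average.
This way, you do not need custom definitions for precision, recall, and F1; you can simply use the corresponding functions from scikit-learn. You can add as many different metrics as you like inside the loop (something cross_val_score cannot do), as long as you import them from scikit-learn appropriately, as done here with accuracy_score.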
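As an illustration of collecting several metrics per fold, here is a minimal sketch of the same loop. It makes two assumptions that are not part of the answer: the data is synthetic, and a scikit-learn LogisticRegression stands in for the Keras model so that the example runs without TensorFlow. Macro averaging is chosen only as an example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import KFold

# Synthetic 4-class data (assumption, stands in for the real features and labels).
rng = np.random.default_rng(42)
labels = rng.integers(0, 4, size=200)
X = rng.normal(size=(200, 5)) + labels[:, None]
y_onehot = np.eye(4, dtype=int)[labels]

# Single-digit copy of the one-hot labels, used for all scikit-learn metrics.
y_single = np.argmax(y_onehot, axis=1)

scores = {"accuracy": [], "f1": [], "precision": [], "recall": []}
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_index, val_index in kf.split(X):
    # Stand-in classifier; with Keras, build and fit a fresh model here instead.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X[train_index], y_single[train_index])
    pred = clf.predict(X[val_index])
    truth = y_single[val_index]
    scores["accuracy"].append(accuracy_score(truth, pred))
    scores["f1"].append(f1_score(truth, pred, average="macro", zero_division=0))
    scores["precision"].append(precision_score(truth, pred, average="macro", zero_division=0))
    scores["recall"].append(recall_score(truth, pred, average="macro", zero_division=0))

# Average every metric over the folds.
mean_scores = {name: float(np.mean(values)) for name, values in scores.items()}
```

Creating the model inside the loop matters: each fold should start from freshly initialized weights, otherwise later folds are evaluated on data the model has already seen.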
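【Answer 2】: I have been trying @desertnaut's answer, but since I have a multi-class problem, the trouble I ran into was not the loop itself but the np.argmax() line. After googling, I found no easy way to fix it, so I ended up implementing CV manually (as that user suggested). It is a bit complicated because I am using a pandas DataFrame (the code can certainly be cleaned up further), but here is the working code: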
import numpy as np
import pandas as pd

feature_cols = ['start-sin', 'start-cos', 'start-sin-lag', 'start-cos-lag', 'prev-close-sin', 'prev-close-cos', 'prev-length', 'state-lag', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
label_cols = ['wait-categ-none', 'wait-categ-short', 'wait-categ-medium', 'wait-categ-long']

ep = 120
df_split = np.array_split(df, 10)
test_part = 0
acc = []
f1 = []
prec = []
recalls = []
while test_part < 10:
    model = build_model()
    train_x = []
    train_y = []
    test_x = []
    test_y = []
    print("CV Fold, with test partition i = " , test_part)
    for i in range(10):
        #on first iter that isnt a test part then set the train set to this
        if len(train_x) == 0 and not i == test_part:
            train_x = df_split[i][feature_cols]
            train_y = df_split[i][label_cols]
            #terminate immediately
            continue
        #if current is not a test partition then concat with previous version
        if not i == test_part:
            train_x = pd.concat([train_x, df_split[i][feature_cols]], axis=0)
            train_y = pd.concat([train_y, df_split[i][label_cols]], axis=0)
        #set this to test partition
        else:
            test_x = df_split[i][feature_cols]
            test_y = df_split[i][label_cols]
    #enforce
    train_y = train_y.replace(False, 0)
    train_y = train_y.replace(True, 1)
    test_y = test_y.replace(False, 0)
    test_y = test_y.replace(True, 1)
    #fit
    model.fit(train_x, train_y, epochs=ep, verbose=1)
    pred = model.predict(test_x)
    #score
    loss, accuracy, f1_score, precision, recall = model.evaluate(test_x, test_y, verbose=0)
    #save
    acc.append(accuracy)
    f1.append(f1_score)
    prec.append(precision)
    recalls.append(recall)
    test_part += 1
print("CV finished.\n")
print("Mean Accuracy")
print(sum(acc)/len(acc))
print("Mean F1 score")
print(sum(f1)/len(f1))
print("Mean Precision")
print(sum(prec)/len(prec))
print("Mean Recall rate")
print(sum(recalls)/len(recalls))
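The manual partition bookkeeping above can likely be shortened by letting KFold produce positional indices and selecting DataFrame rows with .iloc. The following is a minimal sketch under assumptions: the DataFrame and its column names (feat_a, feat_b, cls_0, cls_1, cls_2) are hypothetical placeholders, and the model fitting step is left as a comment.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical column names and synthetic data, standing in for the real DataFrame.
feature_cols = ["feat_a", "feat_b"]
label_cols = ["cls_0", "cls_1", "cls_2"]
rng = np.random.default_rng(0)
features = rng.normal(size=(30, 2))
onehot = np.eye(3, dtype=int)[rng.integers(0, 3, size=30)]
df = pd.DataFrame(np.hstack([features, onehot]), columns=feature_cols + label_cols)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_sizes = []
for train_idx, test_idx in kf.split(df):
    # KFold yields positional indices, so use .iloc rather than [] or .loc.
    train, test = df.iloc[train_idx], df.iloc[test_idx]
    train_x, train_y = train[feature_cols], train[label_cols]
    test_x, test_y = test[feature_cols], test[label_cols]
    # Converting to NumPy first makes argmax return plain class indices.
    test_y_single = test_y.to_numpy().argmax(axis=1)
    # A fresh model would be built, fitted on train_x/train_y and scored here.
    fold_sizes.append((len(train_x), len(test_x)))
```

Converting the label frame with to_numpy() before calling argmax sidesteps differences in how NumPy functions behave on DataFrames, which may be what caused the np.argmax() problem mentioned above.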
【Answer 3】: For anyone who still wants to use cross_validate with one-hot encoded labels, here is a more scikit-oriented way.
# Imports are assumed; the original post does not show them.
import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score, f1_score, make_scorer, precision_score, recall_score
from sklearn.model_selection import cross_validate
from sklearn.preprocessing import LabelEncoder

X, y = get_data()
# in my application I have words as labels, so y is a np.array with strings
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)

# build a version of the scoring metrics for multi-class and one-hot encoding predictions
labels = sorted(set(np.unique(y_encoded)) - set(encoder.transform(['nan'])))

# these functions compare y (one-hot encoded) to y_pred (integer encoded)
# by making y integer encoded as well
def f1_categorical(y, y_pred, **kwargs):
    return f1_score(y.argmax(1), y_pred, **kwargs)

def precision_categorical(y, y_pred, **kwargs):
    return precision_score(y.argmax(1), y_pred, **kwargs)

def recall_categorical(y, y_pred, **kwargs):
    return recall_score(y.argmax(1), y_pred, **kwargs)

def accuracy_categorical(y, y_pred, **kwargs):
    return accuracy_score(y.argmax(1), y_pred, **kwargs)

# Wrap the functions above with `make_scorer`
# (here I chose the micro average because it worked for my multi-class application)
our_f1 = make_scorer(f1_categorical, labels=labels, average="micro")
our_precision = make_scorer(precision_categorical, labels=labels, average="micro")
our_recall = make_scorer(recall_categorical, labels=labels, average="micro")
our_accuracy = make_scorer(accuracy_categorical)
scoring = {
    'accuracy':our_accuracy,
    'f1':our_f1,
    'precision':our_precision,
    'recall':our_recall
}

# one-hot encoding
y_categorical = tf.keras.utils.to_categorical(y_encoded)

# keras wrapper
estimator = tf.keras.wrappers.scikit_learn.KerasClassifier(
    build_fn=model_with_one_hot_encoded_output,
    epochs=1,
    batch_size=32,
    verbose=1)

# cross validate as usual
# (X_scaled is the scaled version of X; the scaling step is not shown in the original)
results = cross_validate(estimator,
                         X_scaled, y_categorical,
                         scoring=scoring)
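The scorer approach can be tried out without Keras. Below is a minimal, self-contained sketch under stated assumptions: OneHotTargetClassifier is a hypothetical stand-in for the Keras wrapper (it is trained on one-hot targets and predicts integer labels, here via LogisticRegression), the data is synthetic, and macro averaging is used only as an example.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, make_scorer
from sklearn.model_selection import KFold, cross_validate


class OneHotTargetClassifier(BaseEstimator):
    """Hypothetical stand-in: fits on one-hot targets, predicts integer labels."""

    def fit(self, X, y_onehot):
        self.model_ = LogisticRegression(max_iter=1000)
        self.model_.fit(X, np.argmax(y_onehot, axis=1))
        return self

    def predict(self, X):
        return self.model_.predict(X)


# Metric wrappers that decode the one-hot truth before scoring.
def f1_categorical(y_onehot, y_pred, **kwargs):
    return f1_score(np.argmax(y_onehot, axis=1), y_pred, **kwargs)


def accuracy_categorical(y_onehot, y_pred, **kwargs):
    return accuracy_score(np.argmax(y_onehot, axis=1), y_pred, **kwargs)


scoring = {
    "accuracy": make_scorer(accuracy_categorical),
    "f1": make_scorer(f1_categorical, average="macro"),
}

# Synthetic 3-class data (assumption, not from the original post).
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=120)
X = rng.normal(size=(120, 4)) + labels[:, None]
y_onehot = np.eye(3, dtype=int)[labels]

results = cross_validate(
    OneHotTargetClassifier(),
    X,
    y_onehot,
    scoring=scoring,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
mean_f1 = results["test_f1"].mean()
```

cross_validate returns one array per scorer under the keys test_accuracy and test_f1, with one entry per fold, so several metrics come out of a single run.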