Poor GridSearchCV results with keras on iris
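【Title】: poor GridSearchCV results with keras on iris 【Posted】: 2018-04-21 16:29:56 【Question】: I am trying to explore GridSearchCV with keras on the classification of the Iris data. The grid search is over batch_size and epochs. However, I am surprised by how low the resulting accuracy is, and I cannot find the cause. Any help is much appreciated!
The code and its output are attached below.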
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
from keras.wrappers.scikit_learn import KerasClassifier
import numpy
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import GridSearchCV
# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
dataframe = pd.read_csv("iris.csv", header=None)
dataset = dataframe.values
X = dataset[:,0:4].astype(float)
Y = dataset[:,4]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
# create model
model = KerasClassifier(build_fn=create_model, verbose=0)
# define the grid search parameters
batch_size = [5, 10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100, 200]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, dummy_y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
Using TensorFlow backend.
Best: 0.666667 using {'batch_size': 100, 'epochs': 10}
0.000000 (0.000000) with: {'batch_size': 5, 'epochs': 10}
0.000000 (0.000000) with: {'batch_size': 5, 'epochs': 50}
0.000000 (0.000000) with: {'batch_size': 5, 'epochs': 100}
0.000000 (0.000000) with: {'batch_size': 5, 'epochs': 200}
0.000000 (0.000000) with: {'batch_size': 10, 'epochs': 10}
0.000000 (0.000000) with: {'batch_size': 10, 'epochs': 50}
0.000000 (0.000000) with: {'batch_size': 10, 'epochs': 100}
0.000000 (0.000000) with: {'batch_size': 10, 'epochs': 200}
0.006667 (0.009428) with: {'batch_size': 20, 'epochs': 10}
0.000000 (0.000000) with: {'batch_size': 20, 'epochs': 50}
0.000000 (0.000000) with: {'batch_size': 20, 'epochs': 100}
0.000000 (0.000000) with: {'batch_size': 20, 'epochs': 200}
0.333333 (0.471405) with: {'batch_size': 40, 'epochs': 10}
0.000000 (0.000000) with: {'batch_size': 40, 'epochs': 50}
0.000000 (0.000000) with: {'batch_size': 40, 'epochs': 100}
0.000000 (0.000000) with: {'batch_size': 40, 'epochs': 200}
0.006667 (0.009428) with: {'batch_size': 60, 'epochs': 10}
0.013333 (0.018856) with: {'batch_size': 60, 'epochs': 50}
0.000000 (0.000000) with: {'batch_size': 60, 'epochs': 100}
0.000000 (0.000000) with: {'batch_size': 60, 'epochs': 200}
0.000000 (0.000000) with: {'batch_size': 80, 'epochs': 10}
0.000000 (0.000000) with: {'batch_size': 80, 'epochs': 50}
0.000000 (0.000000) with: {'batch_size': 80, 'epochs': 100}
0.000000 (0.000000) with: {'batch_size': 80, 'epochs': 200}
0.666667 (0.471405) with: {'batch_size': 100, 'epochs': 10}
0.000000 (0.000000) with: {'batch_size': 100, 'epochs': 50}
0.040000 (0.056569) with: {'batch_size': 100, 'epochs': 100}
0.000000 (0.000000) with: {'batch_size': 100, 'epochs': 200}
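【Answer 1】: Try adding the following lines right after X and Y are extracted from the dataset and before Y is encoded: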
from sklearn.utils import shuffle
X, Y = shuffle(X, Y)
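The reason for this strange behavior is that your data is not shuffled. The rows of iris.csv are ordered by class, so in each round of the 3-fold cross-validation the data is split in such a way that the training set contains only two classes and the third class appears only in the test fold. The model is therefore always evaluated on a class it has never seen, which explains the near-zero accuracy. Read here for a more detailed explanation.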
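To make the split problem concrete, here is a minimal, library-free sketch. It assumes the usual iris.csv layout (150 rows, 50 per class, sorted by class) and mimics an unshuffled 3-fold split with contiguous folds. The helpers `contiguous_folds` and `classes_per_fold` are hypothetical names made up for this illustration, not part of scikit-learn, and the fold logic is simplified to the case where the row count divides evenly.

```python
import random

# Assumed layout: 50 rows per class, sorted by class (as in the usual iris.csv).
labels = [0] * 50 + [1] * 50 + [2] * 50


def contiguous_folds(n_samples, n_splits):
    """Yield (train_idx, test_idx) for unshuffled, contiguous folds.

    Simplification: assumes n_samples is divisible by n_splits.
    """
    fold_size = n_samples // n_splits
    for k in range(n_splits):
        test_idx = list(range(k * fold_size, (k + 1) * fold_size))
        test_set = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in test_set]
        yield train_idx, test_idx


def classes_per_fold(y, n_splits=3):
    """Return [(classes seen in train, classes seen in test), ...] per fold."""
    return [
        (sorted({y[i] for i in train}), sorted({y[i] for i in test}))
        for train, test in contiguous_folds(len(y), n_splits)
    ]


# Sorted labels: every test fold holds a class that is missing from training.
sorted_report = classes_per_fold(labels)

# Shuffled labels (in real code, shuffle X and Y together so rows stay aligned).
rng = random.Random(7)
shuffled = labels[:]
rng.shuffle(shuffled)
shuffled_report = classes_per_fold(shuffled)

for name, report in [("sorted", sorted_report), ("shuffled", shuffled_report)]:
    for train_classes, test_classes in report:
        print(name, "train classes:", train_classes, "test classes:", test_classes)
```

With the sorted labels, each fold trains on two classes and tests on the third. After shuffling, all three classes show up in every training fold, which is what the shuffle call in the answer achieves for the real data.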
【Comments】:
Thanks! It works like a charm. Note that shuffle is imported from sklearn.utils.