GridSearchCV - ValueError: Found input variables with inconsistent numbers of samples: [33, 1]
Posted: 2017-09-27 21:25:20
Question:
I am trying to use GridSearchCV on my Keras model, but I am running into an error that I do not know how to interpret.
Traceback (most recent call last):
  File "keras_cnn_phoneme_generator_fit.py", line 229, in <module>
    grid_results=grid.fit(train_input,train_output)
  File "/home/c/.local/lib/python2.7/site-packages/sklearn/model_selection/_search.py", line 940, in fit
    return self._fit(X, y, groups, ParameterGrid(self.param_grid))
  File "/home/c/.local/lib/python2.7/site-packages/sklearn/model_selection/_search.py", line 541, in _fit
    X, y, groups = indexable(X, y, groups)
  File "/home/c/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 206, in indexable
    check_consistent_length(*result)
  File "/home/c/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 181, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [33, 1]
Here is the model and how I apply it.
def model3(kernel_number = 200, kernel_shape = (window_height,3)):
    #stride = 1
    #dim = 40
    #window_height = 8
    #splits = ((40-8)+1)/1 = 33
    #next(test_generator())
    #next(train_generator(batch_size))
    #kernel_number = 200
    list_of_input = [Input(shape = (window_height,total_frames_with_deltas,3)) for i in range(splits)]
    list_of_conv_output = []
    list_of_max_out = []
    for i in range(splits):
        if splits == 1:
            list_of_conv_output.append(Conv2D(filters = kernel_number , kernel_size = kernel_shape, activation = 'relu')(list_of_input[i]))
            list_of_max_out.append((MaxPooling2D(pool_size=((1,11)))(list_of_conv_output[i])))
        else:
            list_of_conv_output.append(Conv2D(filters = 200 , kernel_size = (window_height,3) , activation = 'relu')(list_of_input[i]))
            list_of_max_out.append((MaxPooling2D(pool_size=((1,11)))(list_of_conv_output[i])))
    merge = keras.layers.concatenate(list_of_max_out)
    print merge.shape
    reshape = Reshape((total_frames/total_frames,-1))(merge)
    dense1 = Dense(units = 1000, activation = 'relu', name = "dense_1")(reshape)
    dense2 = Dense(units = 1000, activation = 'relu', name = "dense_2")(dense1)
    dense3 = Dense(units = 145 , activation = 'softmax', name = "dense_3")(dense2)
    model = Model(inputs = list_of_input , outputs = dense3)
    model.compile(loss="categorical_crossentropy", optimizer="SGD" , metrics = [metrics.categorical_accuracy])
    reduce_lr=ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1, mode='auto', epsilon=0.001, cooldown=0)
    stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=1, mode='auto')
    log=csv_logger = CSVLogger('/home/c/kaldi-trunk/dnn/training_'+str(total_frames)+"_"+str(dim)+"_"+str(window_height)+"_"+str(batch_size)+".csv")
    checkpoint = ModelCheckpoint(filepath="/media/c/E2302E68302E443F/Timit-dataset/timit/fbank/nn/"+str(total_frames)+"_"+str(dim)+"_"+str(window_height)+"_"+str(batch_size)+".hdf5",save_best_only=True)
    if len(sys.argv) == 7:
        model.load_weights(weights)
    print model.summary()
    #raw_input("okay?")
    #hist_current = model.fit_generator(train_generator(batch_size),
    #                    steps_per_epoch=10,
    #                    epochs = 100000,
    #                    verbose = 1,
    #                    validation_data = test_generator(),
    #                    validation_steps=1,
    #                    pickle_safe = True,
    #                    workers = 4,
    #                    callbacks = [log,checkpoint])
    return model
#model3()
model = KerasClassifier(build_fn=model3,epochs = 10,batch_size = 1,verbose=1)
kernel_number = [10,50,100,150,200,250]
kernel_shape = [(window_height,3),(window_height,5),(window_height,8)]
param_grid = dict(kernel_number = kernel_number , kernel_shape=kernel_shape)
grid = GridSearchCV(estimator=model, param_grid=param_grid)
train_input,train_output = next(train_generator(1))
grid_results=grid.fit(train_input,train_output)
print("Best: %f using %s" % (grid_results.best_score_, grid_results.best_params_))
means = grid_results.cv_results_['mean_test_score']
stds = grid_results.cv_results_['std_test_score']
params = grid_results.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
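The model has multiple inputs, 33 in total. These inputs are supplied by a data generator, which outputs a list of length 33 containing numpy arrays of shape (batch_size, 1, 40,8,3). Could the problem be that it cannot handle lists? Or why else am I getting this error?
For batch_size = 100: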
print len(train_input)
print train_input[0].shape
print train_output.shape
33
(100, 8, 45, 3)
(100, 1, 145)
Comments:
What are the shapes of train_input, train_output that you pass to the grid search fit()?
train_input is a list of length 33 holding numpy.ndarray objects of shape (batch_size,1,8,45,3); train_output is a numpy.ndarray of shape (1,145). Why a list, you ask? My model has 33 inputs, and this is the only way I can feed them.
Can you print the actual shapes of train_input, train_output just before the line grid_results=grid.fit(train_input,train_output)?
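Why is the shape of train_output (1, 145), i.e. the first dimension is 1 (meaning 1 row), while train_input has 33 rows? Most (possibly all) scikit-learn estimators only support a 2-D array of shape [n_samples, n_features] for X (train_input).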
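To illustrate the comment above: the traceback ends in check_consistent_length, which compares the number of samples it sees in each argument. For a plain Python list that count is the list length, so a list of 33 arrays counts as 33 samples no matter what the batch size is. The sketch below reproduces the [33, 1] pair with NumPy only. `sample_counts` is a hypothetical helper that imitates the first-axis check, not scikit-learn's own function, and the array shapes are taken from the printed output in the question with batch_size = 1.

```python
import numpy as np

def sample_counts(*arrays):
    """Imitate the first-axis length check (hypothetical helper, not sklearn API)."""
    return [len(a) for a in arrays]

batch_size = 1  # same batch size as next(train_generator(1)) in the question

# 33 separate inputs, one array per model input, as produced by the generator.
train_input = [np.zeros((batch_size, 8, 45, 3)) for _ in range(33)]
train_output = np.zeros((batch_size, 1, 145))

counts = sample_counts(train_input, train_output)
print(counts)  # [33, 1]
```

The list is read as 33 samples and the target as 1 sample, which is the mismatch reported in the ValueError.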
@VivekKumar Added the info. One label is predicted out of 145 possible labels.
Answer 1:
The documentation states:
You can use Sequential Keras models (single-input only) as part of your Scikit-Learn workflow via the wrappers found at keras.wrappers.scikit_learn.py.
So, this is not possible.
I guess a different solution has to be found.
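One possible "different solution", offered only as a sketch, is to skip the scikit-learn wrapper and loop over the parameter grid by hand, so the multi-input model is fitted directly with its list of 33 arrays. The code below shows just the search loop. `manual_grid_search` and `dummy_score` are hypothetical names, and `dummy_score` is a stand-in that returns a made-up number so the example runs without Keras; it is not a real training result.

```python
import itertools

def manual_grid_search(param_grid, score_fn):
    """Try every parameter combination and return the best-scoring one."""
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(**params), params))
    best_score, best_params = max(results, key=lambda item: item[0])
    return best_score, best_params, results

def dummy_score(kernel_number, kernel_shape):
    # Placeholder scorer (assumption): a real one would build
    # model3(kernel_number, kernel_shape), fit it, and return a validation metric.
    return -abs(kernel_number - 150) - kernel_shape[1]

window_height = 8  # value from the commented-out settings in the question
param_grid = dict(
    kernel_number=[10, 50, 100, 150, 200, 250],
    kernel_shape=[(window_height, 3), (window_height, 5), (window_height, 8)],
)

best_score, best_params, results = manual_grid_search(param_grid, dummy_score)
print("Best: %f using %s" % (best_score, best_params))
```

Unlike GridSearchCV, this loop does no cross-validation on its own; any train/validation splitting would have to happen inside the scoring function.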
Comments:
Here is a solution for passing multiple inputs in GridSearchCV: ***.com/questions/56824968/…
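One way to get past the sample-count check, sketched here under assumptions and not confirmed to be what the linked answer does, is to stack the 33 per-input arrays into a single array whose first axis is the sample axis, then split it back into 33 pieces before they reach the model. The NumPy-only sketch below shows the packing and unpacking. `pack_inputs` and `unpack_inputs` are hypothetical helper names, and the shapes follow the printed output in the question with a small batch size.

```python
import numpy as np

def pack_inputs(list_of_arrays):
    """Stack arrays of shape (n_samples, ...) along a new axis 1."""
    return np.stack(list_of_arrays, axis=1)

def unpack_inputs(stacked):
    """Split (n_samples, n_inputs, ...) back into a list of n_inputs arrays."""
    return [stacked[:, i] for i in range(stacked.shape[1])]

batch_size = 4  # small illustrative batch (assumption)
train_input = [np.random.rand(batch_size, 8, 45, 3) for _ in range(33)]
train_output = np.zeros((batch_size, 1, 145))

X = pack_inputs(train_input)               # shape (4, 33, 8, 45, 3)
y = train_output.reshape(batch_size, -1)   # shape (4, 145)
restored = unpack_inputs(X)

# X and y now agree on the number of samples along the first axis.
print(X.shape, y.shape, len(X) == len(y))
```

In a Keras model the split would have to be done on a single input of shape (33, 8, 45, 3), for example by slicing it inside the model; that part is not shown here and would need checking against the Keras version in use.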