GridSearchCV - ValueError: Found input variables with inconsistent numbers of samples: [33, 1]


Posted: 2017-09-27 21:25:20

Question:

I am trying to use GridSearchCV on my Keras model, but I seem to run into an error that I don't know how to interpret.

Traceback (most recent call last):
  File "keras_cnn_phoneme_generator_fit.py", line 229, in <module>
    grid_results=grid.fit(train_input,train_output)
  File "/home/c/.local/lib/python2.7/site-packages/sklearn/model_selection/_search.py", line 940, in fit
    return self._fit(X, y, groups, ParameterGrid(self.param_grid))
  File "/home/c/.local/lib/python2.7/site-packages/sklearn/model_selection/_search.py", line 541, in _fit
    X, y, groups = indexable(X, y, groups)
  File "/home/c/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 206, in indexable
    check_consistent_length(*result)
  File "/home/c/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 181, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [33, 1]

Here is the model and how I apply it.

def model3(kernel_number = 200, kernel_shape = (window_height,3)):
    #stride = 1
    #dim = 40
    #window_height = 8
    #splits = ((40-8)+1)/1 = 33
    #next(test_generator())
    #next(train_generator(batch_size))

    #kernel_number = 200
    list_of_input = [Input(shape = (window_height,total_frames_with_deltas,3)) for i in range(splits)]
    list_of_conv_output = []
    list_of_max_out = []
    for i in range(splits):
        if splits == 1:
            list_of_conv_output.append(Conv2D(filters = kernel_number , kernel_size = kernel_shape, activation = 'relu')(list_of_input[i]))
            list_of_max_out.append((MaxPooling2D(pool_size=((1,11)))(list_of_conv_output[i])))
        else:
            list_of_conv_output.append(Conv2D(filters = 200 , kernel_size = (window_height,3) , activation = 'relu')(list_of_input[i]))
            list_of_max_out.append((MaxPooling2D(pool_size=((1,11)))(list_of_conv_output[i])))

    merge = keras.layers.concatenate(list_of_max_out)
    print merge.shape
    reshape = Reshape((total_frames/total_frames,-1))(merge)

    dense1 = Dense(units = 1000, activation = 'relu',    name = "dense_1")(reshape)
    dense2 = Dense(units = 1000, activation = 'relu',    name = "dense_2")(dense1)
    dense3 = Dense(units = 145 , activation = 'softmax', name = "dense_3")(dense2)


    model = Model(inputs = list_of_input , outputs = dense3)
    model.compile(loss="categorical_crossentropy", optimizer="SGD" , metrics = [metrics.categorical_accuracy])

    reduce_lr=ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1, mode='auto', epsilon=0.001, cooldown=0)
    stop  = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=1, mode='auto')
    log = CSVLogger('/home/c/kaldi-trunk/dnn/training_'+str(total_frames)+"_"+str(dim)+"_"+str(window_height)+"_"+str(batch_size)+".csv")
    checkpoint = ModelCheckpoint(filepath="/media/c/E2302E68302E443F/Timit-dataset/timit/fbank/nn/"+str(total_frames)+"_"+str(dim)+"_"+str(window_height)+"_"+str(batch_size)+".hdf5",save_best_only=True)

    if len(sys.argv) == 7:
        model.load_weights(weights)

    print model.summary()

    #raw_input("okay?")
    #hist_current = model.fit_generator(train_generator(batch_size),
    #                    steps_per_epoch=10,
    #                    epochs = 100000,
    #                    verbose = 1,
    #                    validation_data = test_generator(),
    #                    validation_steps=1,
    #                    pickle_safe = True,
    #                    workers = 4,
    #                    callbacks = [log,checkpoint])
    return model


#model3()

model = KerasClassifier(build_fn=model3,epochs = 10,batch_size = 1,verbose=1)
kernel_number = [10,50,100,150,200,250]
kernel_shape = [(window_height,3),(window_height,5),(window_height,8)]
param_grid = dict(kernel_number = kernel_number , kernel_shape=kernel_shape)
grid = GridSearchCV(estimator=model, param_grid=param_grid)
train_input,train_output = next(train_generator(1))
grid_results=grid.fit(train_input,train_output)

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

The model has multiple inputs, 33 in total. These inputs come from a data generator which, for a shape of (batch_size, 1, 40, 8, 3), outputs a list of length 33 containing numpy arrays. Could the problem be that it cannot handle a list? Or why else am I getting this error?

For batch_size = 100:

print len(train_input)
print train_input[0].shape
print train_output.shape

33
(100, 8, 45, 3)
(100, 1, 145)
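
For reference, the [33, 1] pair in the traceback can be reproduced directly with the check_consistent_length helper that appears in it: scikit-learn counts a plain Python list by its length, so the 33-element input list is treated as 33 samples, while the output array contributes its first dimension. A minimal sketch (shapes assumed from the batch_size = 1 call actually passed to grid.fit):

import numpy as np
from sklearn.utils.validation import check_consistent_length

# X as grid.fit receives it: a list of 33 per-split arrays, one sample each
X = [np.zeros((1, 8, 45, 3)) for _ in range(33)]
# y with a single sample along the first axis
y = np.zeros((1, 1, 145))

# A list has no .shape, so scikit-learn counts len(X) = 33 "samples",
# while the array contributes y.shape[0] = 1, hence:
# ValueError: Found input variables with inconsistent numbers of samples: [33, 1]
check_consistent_length(X, y)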

Comments:

What are the shapes of train_input and train_output that you pass to the grid search fit()?

train_input is a list of length 33 whose elements are numpy.ndarrays of shape (batch_size,1,8,45,3); train_output is a numpy.ndarray of shape (1,145). Why do you ask about the list?.. My model has 33 inputs, and feeding them this way is the only option I have.

Can you print the actual shapes of train_input and train_output just before the line grid_results=grid.fit(train_input,train_output)?

Why is train_output of shape (1, 145), i.e. with a first dimension of 1 (meaning 1 row), while train_input has 33 rows? Most (probably all) scikit estimators only support a two-dimensional X (train_input) of shape [n_samples, n_features].

@VivekKumar I added the information. A single label is predicted out of 145 possible labels.

Answer 1:

The documentation states:

You can use Sequential Keras models (single-input only) as part of your Scikit-Learn workflow via the wrappers found at keras.wrappers.scikit_learn.py.

So, this is not possible.

I guess a different solution will have to be found.
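
One workaround that is sometimes suggested for this limitation (a sketch under my own assumptions, not part of the answer above): stack the 33 arrays along a new axis so the wrapper sees a single input of shape (n_samples, 33, 8, 45, 3), and slice that tensor back into the 33 branches inside build_fn with Lambda layers:

import numpy as np
import keras
from keras.models import Model
from keras.layers import Input, Lambda, Conv2D, MaxPooling2D, Flatten, Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

splits, window_height, total_frames_with_deltas = 33, 8, 45

def build_stacked_model(kernel_number=200, kernel_shape=(window_height, 3)):
    # One stacked input instead of 33 separate Input tensors
    stacked = Input(shape=(splits, window_height, total_frames_with_deltas, 3))
    branches = []
    for i in range(splits):
        # Slice split i back out of the stacked tensor (i=i binds the loop index)
        x = Lambda(lambda t, i=i: t[:, i])(stacked)
        x = Conv2D(filters=kernel_number, kernel_size=kernel_shape,
                   activation='relu')(x)
        x = MaxPooling2D(pool_size=(1, 11))(x)
        branches.append(x)
    merged = keras.layers.concatenate(branches)
    out = Dense(145, activation='softmax')(Flatten()(merged))
    model = Model(inputs=stacked, outputs=out)
    model.compile(loss='categorical_crossentropy', optimizer='SGD',
                  metrics=['accuracy'])
    return model

# Usage sketch with the question's variables (names assumed):
#   X = np.stack(train_input, axis=1)    # (n_samples, 33, 8, 45, 3)
#   y = train_output.reshape(-1, 145)    # (n_samples, 145)
#   grid = GridSearchCV(KerasClassifier(build_fn=build_stacked_model,
#                                       epochs=10, batch_size=1, verbose=1),
#                       param_grid={'kernel_number': [10, 50, 100]})
#   grid.fit(X, y)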

Comments:

Here is a solution for passing multiple inputs in GridSearchCV: ***.com/questions/56824968/…
