Keras Wrappers for Scikit Learn - AUC scorer is not working
Posted: 2016-09-28 04:16:46

Question:
I'm trying to use the Keras Scikit Learn Wrapper to make random search over parameters easier. Here is the example code I wrote:
Artificial dataset:
I generate an artificial dataset using moons from scikit-learn:
```python
from sklearn.datasets import make_moons

dataset = make_moons(1000)  # a tuple (X, y)
```
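`make_moons` returns a tuple `(X, y)`: an `(n_samples, 2)` feature array and a vector of 0/1 labels, which is why the search is later fitted with `dataset[0]` and `dataset[1]`. A quick check, where the unpacked names `X` and `y` are just for illustration:

```python
import numpy as np
from sklearn.datasets import make_moons

# make_moons returns (X, y); n_samples is the first positional argument
X, y = make_moons(1000)

print(X.shape)       # (1000, 2): two features per sample
print(np.unique(y))  # [0 1]: a binary target, so ROC AUC applies
```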
Model builder definition:
I define the function to pass as build_fn:
```python
# Keras 1.x API (W_regularizer, activity_l2), as used in the question
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization
from keras.regularizers import l2, activity_l2
from keras.wrappers.scikit_learn import KerasClassifier


def build_fn(nr_of_layers=2,
             first_layer_size=10,
             layers_slope_coeff=0.8,
             dropout=0.5,
             activation="relu",
             weight_l2=0.01,
             act_l2=0.01,
             input_dim=2):
    result_model = Sequential()
    result_model.add(Dense(first_layer_size,
                           input_dim=input_dim,
                           activation=activation,
                           W_regularizer=l2(weight_l2),
                           activity_regularizer=activity_l2(act_l2)))

    # Each further hidden layer shrinks by layers_slope_coeff
    current_layer_size = int(first_layer_size * layers_slope_coeff) + 1
    for index_of_layer in range(nr_of_layers - 1):
        result_model.add(BatchNormalization())
        result_model.add(Dropout(dropout))
        result_model.add(Dense(current_layer_size,
                               W_regularizer=l2(weight_l2),
                               activation=activation,
                               activity_regularizer=activity_l2(act_l2)))
        current_layer_size = int(current_layer_size * layers_slope_coeff) + 1

    # A single sigmoid output unit: the model predicts P(class 1) only
    result_model.add(Dense(1,
                           activation="sigmoid",
                           W_regularizer=l2(weight_l2)))
    result_model.compile(optimizer="rmsprop", metrics=["accuracy"], loss="binary_crossentropy")
    return result_model


NeuralNet = KerasClassifier(build_fn)
```
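The hidden-layer widths built by `build_fn` start at `first_layer_size` and then follow `int(previous * layers_slope_coeff) + 1`, so the `+ 1` keeps every layer at least one unit wide; the final one-unit sigmoid layer comes on top. The helper below, `hidden_layer_sizes`, is a hypothetical name introduced only to show which widths a parameter combination produces, without building a Keras model:

```python
def hidden_layer_sizes(nr_of_layers, first_layer_size, layers_slope_coeff):
    """Hypothetical helper mirroring the width arithmetic in build_fn."""
    sizes = [first_layer_size]
    current = int(first_layer_size * layers_slope_coeff) + 1
    # build_fn adds nr_of_layers - 1 further hidden layers
    for _ in range(nr_of_layers - 1):
        sizes.append(current)
        current = int(current * layers_slope_coeff) + 1
    return sizes


print(hidden_layer_sizes(3, 10, 0.8))  # [10, 9, 8]
print(hidden_layer_sizes(4, 15, 0.4))  # [15, 7, 3, 2]
```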
Parameter grid definition:
Then I define a parameter grid:
```python
param_grid = {
    "nr_of_layers": [2, 3, 4, 5],
    "first_layer_size": [5, 10, 15],
    "layers_slope_coeff": [0.4, 0.6, 0.8],
    "dropout": [0.3, 0.5, 0.8],
    "weight_l2": [0.01, 0.001, 0.0001],
    "verbose": [0],
    "batch_size": [1],
    "nb_epoch": [30],
}
```
RandomizedSearchCV stage:
I define a RandomizedSearchCV object and fit it on the artificial dataset:
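```python
# sklearn.grid_search matches the traceback below; from scikit-learn 0.18
# on, RandomizedSearchCV is in sklearn.model_selection instead
from sklearn.grid_search import RandomizedSearchCV

random_search = RandomizedSearchCV(NeuralNet,
                                   param_distributions=param_grid,
                                   verbose=2, n_iter=1, scoring="roc_auc")
random_search.fit(dataset[0], dataset[1])
```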
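Because every value in `param_grid` is a list, `RandomizedSearchCV` samples configurations from the Cartesian product of those lists, and with `n_iter=1` it fits and scores exactly one of them. The sketch below counts the combinations and draws one sample with `ParameterGrid` and `ParameterSampler`; it assumes scikit-learn 0.18 or later for the `sklearn.model_selection` import, and `random_state=0` is an arbitrary choice:

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_grid = {
    "nr_of_layers": [2, 3, 4, 5],
    "first_layer_size": [5, 10, 15],
    "layers_slope_coeff": [0.4, 0.6, 0.8],
    "dropout": [0.3, 0.5, 0.8],
    "weight_l2": [0.01, 0.001, 0.0001],
    "verbose": [0],
    "batch_size": [1],
    "nb_epoch": [30],
}

# 4 * 3 * 3 * 3 * 3 * 1 * 1 * 1 = 324 possible configurations
print(len(ParameterGrid(param_grid)))

# n_iter=1, as in the question: a single randomly drawn configuration
sampled = list(ParameterSampler(param_grid, n_iter=1, random_state=0))
print(sampled[0])
```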
After running this code in the console, I get:
```
Traceback (most recent call last):
  File "C:\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-c5bdbc2770b7>", line 2, in <module>
    random_search.fit(dataset[0], dataset[1])
  File "C:\Anaconda2\lib\site-packages\sklearn\grid_search.py", line 996, in fit
    return self._fit(X, y, sampled_params)
  File "C:\Anaconda2\lib\site-packages\sklearn\grid_search.py", line 553, in _fit
    for parameters in parameter_iterable
  File "C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 800, in __call__
    while self.dispatch_one_batch(iterator):
  File "C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 658, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 566, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 180, in __init__
    self.results = batch()
  File "C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Anaconda2\lib\site-packages\sklearn\cross_validation.py", line 1550, in _fit_and_score
    test_score = _score(estimator, X_test, y_test, scorer)
  File "C:\Anaconda2\lib\site-packages\sklearn\cross_validation.py", line 1606, in _score
    score = scorer(estimator, X_test, y_test)
  File "C:\Anaconda2\lib\site-packages\sklearn\metrics\scorer.py", line 175, in __call__
    y_pred = y_pred[:, 1]
IndexError: index 1 is out of bounds for axis 1 with size 1
```
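The last frame of the traceback shows the cause: for a binary target the `"roc_auc"` scorer takes column 1 of `predict_proba`, while a network ending in a single sigmoid unit yields an `(n_samples, 1)` array. The NumPy sketch below reproduces the same `IndexError` on made-up probabilities and shows that prepending `1 - p` (the probability of class 0) gives the one-column-per-class layout the scorer expects:

```python
import numpy as np

# Made-up sigmoid outputs: P(class 1) for four samples, shape (4, 1)
probs = np.array([[0.1], [0.4], [0.35], [0.8]])

try:
    probs[:, 1]  # what the scorer does for a binary target
except IndexError as err:
    print("IndexError:", err)

# One column per class: [P(class 0), P(class 1)]
two_col = np.hstack([1 - probs, probs])
print(two_col.shape)  # (4, 2)
print(two_col[:, 1])  # column 1 is the original P(class 1)
```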
The code works fine when I use the accuracy metric instead of scoring="roc_auc". Can anyone explain what is going wrong? Has anyone had a similar problem?
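For comparison, the same kind of search with `scoring="roc_auc"` runs without error on a scikit-learn classifier whose `predict_proba` returns one column per class. This is a control case, not the asker's model: it swaps in `KNeighborsClassifier`, which, like `KerasClassifier`, has no `decision_function`, so the `"roc_auc"` scorer falls back to `predict_proba`. The `n_neighbors` values, `cv=3` and `random_state=0` are arbitrary choices:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(1000, random_state=0)

knn = KNeighborsClassifier()
# predict_proba has one column per class
print(knn.fit(X, y).predict_proba(X[:3]).shape)  # (3, 2)

search = RandomizedSearchCV(knn,
                            param_distributions={"n_neighbors": [3, 5, 7, 9]},
                            n_iter=2, scoring="roc_auc", cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```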
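Answer 1:
There is a bug in KerasClassifier that causes this issue. I opened an issue for it on the repo: https://github.com/fchollet/keras/issues/2864
The fix is posted there as well. In the meantime, you can define your own KerasClassifier subclass as a temporary workaround: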
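```python
import numpy as np
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier


class FixedKerasClassifier(KerasClassifier):
    def predict_proba(self, X, **kwargs):
        kwargs = self.filter_sk_params(Sequential.predict_proba, kwargs)
        probs = self.model.predict_proba(X, **kwargs)
        # A single sigmoid output gives shape (n_samples, 1); the "roc_auc"
        # scorer expects one column per class, so prepend P(class 0).
        if probs.shape[1] == 1:
            probs = np.hstack([1 - probs, probs])
        return probs
```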
Comments:
Thank you very much, your solution works well. Cheers :)
Excellent solution!