ValueError: For multi-metric scoring, the parameter refit must be set to a scorer key or a callable(
Posted Data+Science+Insight
ValueError: For multi-metric scoring, the parameter refit must be set to a scorer key or a callable to refit an estimator with the best parameter setting on the whole data and make the best_* attributes available for that metric. If this is not needed, refit should be set to False explicitly. True was passed.
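Problem:
When several evaluation metrics are passed to scoring, GridSearchCV cannot tell which one it should use to pick the best parameters for the final refit, so the metric has to be specified explicitly through the refit parameter. The call below leaves refit at its default (True) and triggers the error: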
clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring)
import numpy as np
from sklearn import linear_model, datasets
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import log_loss, make_scorer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
iris = datasets.load_iris()
X = iris.data
# Reduce the original three classes (0, 1, 2) to a binary problem (0, 1)
y = np.where(iris.target == 0, 0, 1)
# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=True, stratify=y)
# Parameter grid for the search
alphas = np.logspace(1, 10, 100, base=10)
parameters = {'C': [1, 10], 'solver': ('liblinear', 'saga')}
# parameters = {'C': alphas}
# Build a logistic regression model with L1 regularization
# (max_iter is given as an int; recent scikit-learn versions reject a float such as 1e5)
log_lr = linear_model.LogisticRegression(penalty='l1', max_iter=100000, solver='liblinear')
# Build a log-loss scorer
# (scikit-learn 1.4+ replaces needs_proba=True with response_method='predict_proba')
LogLoss = make_scorer(log_loss, greater_is_better=False, needs_proba=True)
# GridSearchCV with two metrics
scoring = {'AUC': 'roc_auc', 'LogLoss': LogLoss}
# clf = GridSearchCV(log_lr, parameters, cv=5, scoring=LogLoss)
# clf = GridSearchCV(log_lr, parameters, cv=5)
# Two metrics but no refit key: fit() raises the ValueError
clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring)
# clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring, refit='AUC')
# Fit the model
clf.fit(X_train, y_train)
print(clf.best_score_, clf.best_estimator_)
iris_model = clf.best_estimator_
# Show the classification report
print('---------------classification report-------------------')
y_pred = iris_model.predict(X_test)
print(classification_report(y_test, y_pred))
Solution:
Set refit to one of the keys of the scoring dict, here 'AUC':
clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring,refit='AUC')
import numpy as np
from sklearn import linear_model, datasets
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import log_loss, make_scorer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
iris = datasets.load_iris()
X = iris.data
# Reduce the original three classes (0, 1, 2) to a binary problem (0, 1)
y = np.where(iris.target == 0, 0, 1)
# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=True, stratify=y)
# Parameter grid for the search
alphas = np.logspace(1, 10, 100, base=10)
parameters = {'C': [1, 10], 'solver': ('liblinear', 'saga')}
# parameters = {'C': alphas}
# Build a logistic regression model with L1 regularization
# (max_iter is given as an int; recent scikit-learn versions reject a float such as 1e5)
log_lr = linear_model.LogisticRegression(penalty='l1', max_iter=100000, solver='liblinear')
# Build a log-loss scorer
# (scikit-learn 1.4+ replaces needs_proba=True with response_method='predict_proba')
LogLoss = make_scorer(log_loss, greater_is_better=False, needs_proba=True)
# GridSearchCV with two metrics
scoring = {'AUC': 'roc_auc', 'LogLoss': LogLoss}
# clf = GridSearchCV(log_lr, parameters, cv=5, scoring=LogLoss)
# clf = GridSearchCV(log_lr, parameters, cv=5)
# clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring)
# refit='AUC' selects the best parameters by AUC and refits on the whole training set
clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring, refit='AUC')
# Fit the model
clf.fit(X_train, y_train)
print(clf.best_score_, clf.best_estimator_)
iris_model = clf.best_estimator_
# Show the classification report
print('---------------classification report-------------------')
y_pred = iris_model.predict(X_test)
print(classification_report(y_test, y_pred))
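With refit='AUC', only the best_* attributes are tied to AUC; the scores for every metric are still stored in cv_results_. The following is a minimal sketch of how they could be inspected. It is not the post's exact script: it assumes the built-in 'neg_log_loss' scorer string in place of the custom make_scorer object, a default-solver LogisticRegression, and a small illustrative grid over C.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
y = np.where(y == 0, 0, 1)  # binary target, as in the post

# Two metrics; 'neg_log_loss' is the built-in scorer string (an assumption,
# used here instead of the post's make_scorer object)
scoring = {'AUC': 'roc_auc', 'LogLoss': 'neg_log_loss'}
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {'C': [0.1, 1, 10]},   # illustrative grid
    cv=5,
    scoring=scoring,
    refit='AUC',           # best_* attributes are computed with respect to AUC
)
search.fit(X, y)

# Each metric gets its own mean_test_<key> column in cv_results_
for name in scoring:
    print(name, search.cv_results_['mean_test_' + name])

# best_score_ is the mean AUC of the selected candidate
print(search.best_params_, search.best_score_)
```

The keys of the scoring dict determine the column names, so 'AUC' and 'LogLoss' appear as mean_test_AUC and mean_test_LogLoss.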
Full error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-70-e7a5d74dc020> in <module>
27 clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring)
28 # Fit the model
---> 29 clf.fit(X_train, y_train)
30 print(clf.best_score_, clf.best_estimator_)
31 iris_model = clf.best_estimator_
D:\anaconda\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
D:\anaconda\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
754 else:
755 scorers = _check_multimetric_scoring(self.estimator, self.scoring)
--> 756 self._check_refit_for_multimetric(scorers)
757 refit_metric = self.refit
758
D:\anaconda\lib\site-packages\sklearn\model_selection\_search.py in _check_refit_for_multimetric(self, scores)
719 if (self.refit is not False and not valid_refit_dict
720 and not callable(self.refit)):
--> 721 raise ValueError(multimetric_refit_msg)
722
723 @_deprecate_positional_args
ValueError: For multi-metric scoring, the parameter refit must be set to a scorer key or a callable to refit an estimator with the best parameter setting on the whole data and make the best_* attributes available for that metric. If this is not needed, refit should be set to False explicitly. True was passed.
The GridSearchCV documentation defines refit as:
refit : boolean, string, or callable, default=True
Refit an estimator using the best found parameters on the whole dataset. For multiple metric evaluation, this needs to be a string denoting the scorer that would be used to find the best parameters for refitting the estimator at the end. Where there are considerations other than maximum score in choosing a best estimator, refit can be set to a function which returns the selected best_index_ given cv_results_. The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly on this GridSearchCV instance. Also for multiple metric evaluation, the attributes best_index_, best_score_ and best_params_ will only be available if refit is set and all of them will be determined w.r.t this specific scorer. best_score_ is not returned if refit is callable. See scoring parameter to know more about multiple metric evaluation.
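The callable form described above can be sketched as follows. This is an illustration under assumptions, not code from the post: pick_by_log_loss is a hypothetical helper name, the built-in 'neg_log_loss' scorer string stands in for a custom scorer, and the grid over C is arbitrary. The function receives cv_results_ and returns the integer index of the candidate to refit.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
y = np.where(y == 0, 0, 1)  # binary target

def pick_by_log_loss(cv_results):
    """Hypothetical selection rule: index of the candidate with the best mean log loss."""
    # 'neg_log_loss' is negated, so a larger value means a lower loss
    return int(np.argmax(cv_results['mean_test_LogLoss']))

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {'C': [0.1, 1, 10]},                              # illustrative grid
    cv=5,
    scoring={'AUC': 'roc_auc', 'LogLoss': 'neg_log_loss'},
    refit=pick_by_log_loss,                           # callable instead of a scorer key
)
search.fit(X, y)

# best_index_ and best_params_ come from the callable;
# per the documentation quoted above, best_score_ is not provided in this case
print(search.best_index_, search.best_params_)
```

A callable is mainly useful when the selection rule is more than "highest score on one metric", for example trading a small loss in one metric for a simpler model.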
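If you don't want to refit the estimator, set refit=False (a boolean). To refit the estimator with one of the scorers, pass that scorer's key from the scoring dict as a string, for example refit='precision_score' when scoring contains a 'precision_score' key, or refit='AUC' for the scoring dict used in this post.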
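For the refit=False case, a rough sketch of what remains available is shown below. It assumes the built-in 'neg_log_loss' scorer string, an arbitrary grid over C, and illustrative variable names (best_params, final_model). No estimator is refitted and the best_* attributes are not set, so the preferred candidate has to be read from cv_results_ and fitted by hand.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
y = np.where(y == 0, 0, 1)  # binary target

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {'C': [0.1, 1, 10]},                              # illustrative grid
    cv=5,
    scoring={'AUC': 'roc_auc', 'LogLoss': 'neg_log_loss'},
    refit=False,                                      # evaluate only, no final refit
)
search.fit(X, y)

# No refit was done, so there is no best_estimator_
print(hasattr(search, 'best_estimator_'))

# Choose a candidate manually, here by mean AUC, and fit it on the full data
best = int(np.argmax(search.cv_results_['mean_test_AUC']))
best_params = search.cv_results_['params'][best]
final_model = LogisticRegression(max_iter=1000, **best_params).fit(X, y)
print(best_params)
```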
Reference: How to fix the error “For multi-metric scoring” for OneClassSVM and GridSearchCV
Reference: GridSearchCV