GridSearchCV - FitFailedWarning: Estimator fit failed

【Title】GridSearchCV - FitFailedWarning: Estimator fit failed 【Posted】2020-06-25 12:08:51 【Question】:

I am running this:

# Hyperparameter tuning - Random Forest #
import numpy as np

# Hyperparameters' grid
parameters = {'n_estimators': list(range(100, 250, 25)), 'criterion': ['gini', 'entropy'],
              'max_depth': list(range(2, 11, 2)), 'max_features': [0.1, 0.2, 0.3, 0.4, 0.5],
              # One class-weight dict per candidate weight for class 1
              'class_weight': [{0: 1, 1: i} for i in np.arange(1, 4, 0.2).tolist()],
              'min_samples_split': list(range(2, 7))}


# Instantiate random forest
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(random_state=0)


# Execute grid search and retrieve the best classifier
from sklearn.model_selection import GridSearchCV
classifiers_grid = GridSearchCV(estimator=classifier, param_grid=parameters, scoring='balanced_accuracy',
                                   cv=5, refit=True, n_jobs=-1)
classifiers_grid.fit(X, y)

I get this warning:

.../anaconda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py:536: 
FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
TypeError: '<' not supported between instances of 'str' and 'int'

Why does this happen, and how can I fix it?

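【Comments】:

I got a similar error when using a random forest for feature selection. I changed the type of my int variable to string (I had one int variable and all the others were strings), and the error went away.

【Answer 1】: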

I had a similar FitFailedWarning, with different details. After many runs I found that the parameter values were being passed incorrectly. Try:

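parameters = {'n_estimators': [100,125,150,175,200,225,250],
              'criterion': ['gini', 'entropy'],
              'max_depth': [2,4,6,8,10],
              'max_features': [0.1, 0.2, 0.3, 0.4, 0.5],
              # class_weight must be a dict, 'balanced', 'balanced_subsample' or None;
              # bare floats such as 0.2 are not accepted by RandomForestClassifier
              'class_weight': [{0: 1, 1: w} for w in [1.0, 2.0, 3.0]],
              'min_samples_split': [2,3,4,5,6,7]}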

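This should pass. In my case it happened with XGBClassifier, where the data types of the values had somehow been mixed up.

Another cause is a value that is out of range. For example, the XGBClassifier 'subsample' parameter has a maximum of 1.0; setting it to 1.1 produces a FitFailedWarning.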
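To find out which value is at fault, it can help to let the underlying exception propagate instead of having it folded into a warning. The sketch below is only an illustration: it uses synthetic data from make_classification, and it swaps in an out-of-range max_features for RandomForestClassifier in place of the XGBClassifier 'subsample' case so that it runs without xgboost. The names X_demo, y_demo and bad_grid are made up for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic dataset (assumption: stands in for the real X, y)
X_demo, y_demo = make_classification(n_samples=60, n_features=5, random_state=0)

# Hypothetical grid: a float max_features must lie in (0, 1], so 1.5 is invalid
bad_grid = {"n_estimators": [10], "max_features": [0.5, 1.5]}

# error_score="raise" re-raises the first fit error instead of scoring it as nan
search = GridSearchCV(RandomForestClassifier(random_state=0), bad_grid,
                      cv=3, error_score="raise")

try:
    search.fit(X_demo, y_demo)
    failure = None
except Exception as exc:  # the exact exception class depends on the scikit-learn version
    failure = exc
    print(type(exc).__name__, exc)
```

Once the offending value has been identified and removed, error_score can be left at its default again.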

【Discussion】:

【Answer 2】:

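For me this gave the same error, but it ran correctly after I removed one invalid value from max_depth. The failing grid contained the string 'None' rather than the Python value None: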

param_grid={'n_estimators':[100,200,300,400,500],
            'criterion':['gini', 'entropy'],
            'max_depth':['None',5,10,20,30,40,50,60,70],  # 'None' is a string, so this entry fails
            'min_samples_split':[5,10,20,25,30,40,50],
            'max_features':[ 'sqrt', 'log2'],
            'max_leaf_nodes':[5,10,20,25,30,40,50],
            'min_samples_leaf':[1,100,200,300,400,500]}
            

Code that runs correctly:

param_grid={'n_estimators':[100,200,300,400,500],
            'criterion':['gini', 'entropy'],
            'max_depth':[5,10,20,30,40,50,60,70],
            'min_samples_split':[5,10,20,25,30,40,50],
            'max_features':[ 'sqrt', 'log2'],
            'max_leaf_nodes':[5,10,20,25,30,40,50],
            'min_samples_leaf':[1,100,200,300,400,500]}
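The difference between the two grids is the quoted 'None'. As a rough check, assuming synthetic data and a helper named try_fit that exists only for this illustration, a single estimator can be fitted with each candidate value to see which one is rejected. The unquoted None (no depth limit) is a valid value and can stay in the grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset (assumption: stands in for the real X, y)
X_demo, y_demo = make_classification(n_samples=60, n_features=5, random_state=0)

def try_fit(max_depth):
    """Return None if the fit succeeds, otherwise the exception that was raised."""
    try:
        RandomForestClassifier(n_estimators=10, max_depth=max_depth,
                               random_state=0).fit(X_demo, y_demo)
        return None
    except Exception as exc:  # exception class differs between scikit-learn versions
        return exc

for candidate in [None, 5, "None"]:
    error = try_fit(candidate)
    status = "ok" if error is None else "failed: " + type(error).__name__
    print(repr(candidate), "->", status)
```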
            

【Discussion】:

【Answer 3】:

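I ran into the same error. When I passed the hyperparameters the way MachineLearningMastery does, I got the output without warnings...

If anyone runs into a similar problem, try it this way...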

# grid search logistic regression model on the sonar dataset
from pandas import read_csv
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
dataframe = read_csv(url, header=None)
# split into input and output elements
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = LogisticRegression()
# define evaluation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define search space
space = dict()
space['solver'] = ['newton-cg', 'lbfgs', 'liblinear']
space['penalty'] = ['none', 'l1', 'l2', 'elasticnet']
space['C'] = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100]
# define search
search = GridSearchCV(model, space, scoring='accuracy', n_jobs=-1, cv=cv)
# execute search
result = search.fit(X, y)
# summarize result
print('Best Score: %s' % result.best_score_)
print('Best Hyperparameters: %s' % result.best_params_)
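A grid that mixes solvers and penalties can contain combinations that the estimator rejects, and with the default error_score those fits are recorded as nan rather than stopping the search. The following is a minimal sketch, on synthetic data instead of the sonar dataset and with a deliberately small grid, of listing the candidates whose mean score is nan. The names X_demo, y_demo, small_grid and failed are assumptions made for this example.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Small synthetic dataset (assumption: stands in for the sonar data)
X_demo, y_demo = make_classification(n_samples=80, n_features=6, random_state=1)

# lbfgs does not support the l1 penalty, so one of the four candidates fails
small_grid = {"solver": ["lbfgs", "liblinear"], "penalty": ["l1", "l2"]}

# refit=False: only the per-candidate scores are of interest here
search = GridSearchCV(LogisticRegression(max_iter=1000), small_grid,
                      scoring="accuracy", cv=3, refit=False)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warnings for this demo
    search.fit(X_demo, y_demo)

# Failed candidates are the ones whose mean test score is nan
results = search.cv_results_
failed = [params for params, score
          in zip(results["params"], results["mean_test_score"])
          if np.isnan(score)]
print(failed)
```

Removing the listed combinations from the grid, or passing a list of separate grids per solver, avoids the failed fits.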

【Discussion】:

【Answer 4】:

Make sure the y variable is int, not bool or str.

Change the last line of the code so that the y series contains 0 or 1, for example:

classifiers_grid.fit(X, list(map(int, y)))
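One way labels can trigger a comparison error between 'str' and 'int' is when y mixes both types, since such values cannot be sorted against each other. The snippet below is a small pure-Python illustration with made-up labels (y_mixed is hypothetical); it shows the failed sort and the effect of the conversion suggested above.

```python
# Hypothetical labels with mixed types, as can happen after loading a CSV
y_mixed = [0, 1, "1", 0, "0", 1]

try:
    sorted(y_mixed)  # comparing str with int raises TypeError
    mixed_sort_failed = False
except TypeError as exc:
    mixed_sort_failed = True
    print("sorting failed:", exc)

# Same conversion as in the answer: every label becomes a plain int
y_int = list(map(int, y_mixed))
print(sorted(set(y_int)))
```

Note that int() only works for strings that hold digits; labels such as 'yes' and 'no' would need an explicit mapping instead.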

【Discussion】:
