How do I fix: "FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan"?

Posted 2023-03-12


Asked: 2021-10-30 10:36:13

Question:

import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, KFold

# Try every possible number of selected features, from 1 to all encoded columns.
param_grid = {'select__k': np.arange(1, data_x_numeric.shape[1] + 1)}
cv = KFold(n_splits=3, random_state=1, shuffle=True)
gcv = GridSearchCV(pipe, param_grid, return_train_score=True, cv=cv)
gcv.fit(data_x, data_y)

results = pd.DataFrame(gcv.cv_results_).sort_values(by='mean_test_score', ascending=False)
results.loc[:, ~results.columns.str.endswith("_time")]

After running the code above, I get a warning saying that the estimator fit failed.

FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
line 598, in _fit_and_score estimator.fit(X_train, y_train, **fit_params)
"pipeline.py," line 341, in fit Xt = self._fit(X, y, **fit_params_steps)
"pipeline.py," line 303, in _fit X, fitted_transformer = fit_transform_one_cached(
"memory.py," line 352, in __call__ return self.func(*args, **kwargs)
"pipeline.py," line 754, in _fit_transform_one res = transformer.fit_transform(X, y, **fit_params)
"base.py," line 702, in fit_transform return self.fit(X, y, **fit_params).transform(X)
univariate_selection.py, line 353, in fit score_func_ret = self.score_func(X, y)
"<ipython-input-413-f8e48283bbee>," line 7, in fit_and_score_features
    m.fit(Xj, y)
"coxph.py" line 426, in fit delta = solve(optimizer.hessian, optimizer.gradient,
"basic.py," line 214, in solve _solve_check(n, info)
"basic.py," line 29, in _solve_check raise LinAlgError('Matrix is singular.')
numpy.linalg.LinAlgError: Matrix is singular.

  warnings.warn("Estimator fit failed. The score on this train-test"
"categorical.py:2630": FutureWarning: The `inplace` parameter in pandas.Categorical.set_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
  res = method(*args, **kwargs)

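I get this warning many times, and the code keeps running for more than 30 minutes. I removed most of the file paths from the warnings, which is why the output looks different from the usual format.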
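One way to see the underlying exception right away, rather than waiting through repeated warnings, is the `error_score="raise"` option of `GridSearchCV` (also suggested in the comments below). The following is a minimal sketch on toy data, not the pipeline from this question: `fragile_score`, `toy_pipe`, and the random data are hypothetical stand-ins, and `LinearRegression` replaces the survival model so that the example needs only scikit-learn.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline


def fragile_score(X, y):
    """Hypothetical score function that fails on constant columns."""
    if np.any(X.std(axis=0) == 0):
        raise np.linalg.LinAlgError("Matrix is singular.")
    # Absolute correlation of each feature with the target.
    return np.abs(np.corrcoef(X.T, y)[-1, :-1])


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
X[:, 2] = 1.0  # a constant column triggers the failure
y = X[:, 0] + rng.normal(scale=0.1, size=30)

toy_pipe = Pipeline([("select", SelectKBest(fragile_score, k=1)),
                     ("model", LinearRegression())])
search = GridSearchCV(toy_pipe, {"select__k": [1, 2]},
                      cv=KFold(n_splits=3, shuffle=True, random_state=1),
                      error_score="raise")  # re-raise instead of scoring nan

try:
    search.fit(X, y)
    raised = None
except np.linalg.LinAlgError as exc:
    raised = exc
    print("fit stopped with:", exc)
```

With the default `error_score=np.nan`, the same search would run to the end and only emit a FitFailedWarning for each failed fit.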

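I am following the scikit-survival documentation and am stuck at this point. Some additional code is provided below in case it helps; I am not sure what is causing the error.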

data_x is a pandas DataFrame with the following dtypes:

data_x.dtypes.astype(str)


f1   category
f2   category
f3   category
f4   float64
f5   category
f6   category
f7   category
f8   category
f9   category
f10  category
f11  category
f12  category
f13  int64
f14  category
f15  category
f16  category
f17  category
f18  category
f19  category
f20  category
f21  int64
dtype: object

data_y is a NumPy structured array:

data_y

array([( True, 481.), ( True, 424.), ( True, 519.), ..., ( True,  13.),
       ( True,  96.), ( True,   6.)],
      dtype=[('event', '?'), ('duration', '<f8')])

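data_x_numeric is a new DataFrame, one-hot encoded separately for prediction.

from sksurv.preprocessing import OneHotEncoder  # assumed: the scikit-survival encoder, which returns a DataFrame

data_x_numeric = OneHotEncoder().fit_transform(data_x)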
data_x_numeric.head()

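I also computed a separate c-index score for each feature.

from sksurv.linear_model import CoxPHSurvivalAnalysis

def fit_and_score_features(X, y):
    # Fit a one-feature Cox model per column and record its concordance index.
    n_features = X.shape[1]
    scores = np.empty(n_features)
    m = CoxPHSurvivalAnalysis()
    for j in range(n_features):
        Xj = X[:, j:j+1]
        m.fit(Xj, y)
        scores[j] = m.score(Xj, y)
    return scores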

scores = fit_and_score_features(data_x_numeric.values, data_y)
pd.Series(scores, index=data_x_numeric.columns).sort_values(ascending=False)

f1   0.631355
f2   0.564762
f3   0.564288
f4   0.554376
f5   0.549956
...   
f94  0.498701
f95  0.498413
f96  0.483840
f97  0.460941
f98  0.460898
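Every column fits on the full data here, but inside cross-validation each model sees only a training split. One possible cause of "Matrix is singular" (an assumption, not confirmed in this thread) is a rare one-hot column that takes a single value in some training split, which leaves a one-feature Cox model with nothing to estimate. The sketch below checks for such columns; `constant_columns_per_fold` and the toy frame are hypothetical names introduced for illustration.

```python
import pandas as pd
from sklearn.model_selection import KFold


def constant_columns_per_fold(X, cv):
    """Return {fold index: columns with one unique value in that training split}."""
    report = {}
    for i, (train_idx, _) in enumerate(cv.split(X)):
        nunique = X.iloc[train_idx].nunique()
        constant = nunique[nunique <= 1].index.tolist()
        if constant:
            report[i] = constant
    return report


# Toy stand-in for the one-hot encoded frame: "rare" is 1 in a single row only.
toy = pd.DataFrame({"common": [0, 1] * 6, "rare": [0] * 11 + [1]})
cv = KFold(n_splits=3, shuffle=True, random_state=1)
report = constant_columns_per_fold(toy, cv)
print(report)
```

Running the same function on data_x_numeric with the KFold object from the grid search would show whether any encoded column degenerates in a fold.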

Then I created a pipeline.

# Create the pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline

pipe = Pipeline([('encode', OneHotEncoder()),
                 ('select', SelectKBest(fit_and_score_features, k=3)),
                 ('model', CoxPHSurvivalAnalysis())])

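This is where I apply the code from the beginning of the post, in order to select the best features and maximize the overall c-index score. I am not sure what is going wrong, and any help would be appreciated.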

Comments:

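Try fitting just the pipeline, or set the grid search's error_score="raise", so you can see what actually went wrong. Edit the full traceback into the question.

Thanks for the comment, @BenReiniger. I have included the entire warning message, minus all the file paths. I am a beginner, so any details you can give are appreciated.

"Matrix is singular" seems to be the underlying problem, but I am not familiar enough with CoxPHSurvivalAnalysis to diagnose it right away. What shape is your data?

data_y.shape (17339,) / data_x.shape (17339, 21)

Answer 1: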

Check for missing data. I had the same error. After removing the rows that contained empty cells, the program ran fine.
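A minimal sketch of that check, assuming the features are in a pandas DataFrame and the outcome is a structured NumPy array as in the question. The names `toy_x`, `toy_y`, `clean_x`, and `clean_y` and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for data_x and data_y; the values are made up.
toy_x = pd.DataFrame({"f4": [1.5, np.nan, 3.0, 4.2],
                      "f13": [10, 20, 30, 40]})
toy_y = np.array([(True, 481.0), (True, 424.0), (False, 519.0), (True, 13.0)],
                 dtype=[("event", "?"), ("duration", "<f8")])

# Count missing values per column to see where the gaps are.
print(toy_x.isna().sum())

# Keep only rows without any missing cell, in both the features and the outcome.
keep = toy_x.notna().all(axis=1).to_numpy()
clean_x = toy_x.loc[keep].reset_index(drop=True)
clean_y = toy_y[keep]
```

Filtering the outcome array with the same boolean mask keeps the rows of the features and the outcome aligned.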
