GridSearchCV: Invalid parameter gamma for estimator LogisticRegression

Posted: 2020-02-22 01:22:30

Question:

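I need to run a grid search over the parameters listed below for a Logistic Regression classifier, using recall for scoring and 3-fold cross-validation.

The data is in a csv file (11.1 MB); the download link is: https://drive.google.com/file/d/1cQFp7HteaaL37CefsbMNuHqPzkINCVzs/view?usp=sharing

I have grid_values = {'gamma': [0.01, 0.1, 1, 10, 100]} and I need to apply both the L1 and L2 penalties in the logistic regression.

I cannot verify whether the scoring works, because I get the following error: Invalid parameter gamma for estimator LogisticRegression. Check the list of available parameters with estimator.get_params().keys().

Here is my code: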

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('fraud_data.csv')

X = df.iloc[:,:-1]
y = df.iloc[:,-1]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)




def LogisticR_penalty():    
    from sklearn.model_selection import GridSearchCV
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    grid_values = {'gamma': [0.01, 0.1, 1, 10, 100]}


    # train the model with many parameters for "C" and penalty='l1'
    lr_l1 = LogisticRegression(penalty='l1')
    grid_lr_l1 = GridSearchCV(lr_l1, param_grid = grid_values, cv=3, scoring = 'recall')
    grid_lr_l1.fit(X_train, y_train)
    y_decision_fn_scores_recall = grid_lr_l1.decision_function(X_test)


    lr_l2 = LogisticRegression(penalty='l2')
    grid_lr_l2 = GridSearchCV(lr_l2, param_grid = grid_values, cv=3 , scoring = 'recall')
    grid_lr_l2.fit(X_train, y_train)
    y_decision_fn_scores_recall = grid_lr_l2.decision_function(X_test)



    #The precision, recall, and accuracy scores for every combination 
    #of the parameters in param_grid are stored in cv_results_
    results = pd.DataFrame()

    results['l1_results'] = pd.DataFrame(grid_lr_l1.cv_results_)
    results['l1_results'] = results['l2_results'].sort_values(by='mean_test_precision_score', ascending=False)

    results['l2_results'] = pd.DataFrame(grid_lr_l2.cv_results_)
    results['l2_results'] = results['l2_results'].sort_values(by='mean_test_precision_score', ascending=False)


    return results
LogisticR_penalty()

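I expected .cv_results_ to give me the mean test score for each parameter combination. I thought I would find it under mean_test_precision_score, but I am not sure.

The output is: ValueError: Invalid parameter gamma for estimator LogisticRegression. Check the list of available parameters with estimator.get_params().keys().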


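Answer 1:

According to scikit-learn's documentation, LogisticRegression has no gamma parameter, but it does have a parameter C that controls regularization (C is the inverse of the regularization strength, so smaller values regularize more).

If you change grid_values = {'gamma': [0.01, 0.1, 1, 10, 100]} to grid_values = {'C': [0.01, 0.1, 1, 10, 100]}, your code should work.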
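A minimal sketch of that change is shown here. It assumes a small synthetic dataset from make_classification in place of fraud_data.csv; the dataset, its size, the class weights and max_iter=1000 are illustrative choices, not part of the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for fraud_data.csv (assumption: numeric features, binary target).
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 'C' is a real LogisticRegression parameter, so GridSearchCV accepts this grid.
grid_values = {'C': [0.01, 0.1, 1, 10, 100]}

grid_lr = GridSearchCV(LogisticRegression(max_iter=1000),
                       param_grid=grid_values, cv=3, scoring='recall')
grid_lr.fit(X_train, y_train)

best_C = grid_lr.best_params_['C']
# One mean recall value per candidate C, averaged over the 3 folds.
mean_recall_per_C = grid_lr.cv_results_['mean_test_score']
print(best_C, mean_recall_per_C)
```

With a single scoring string such as 'recall', the averaged scores are stored under the key mean_test_score.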

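Comments:

In addition to changing gamma to C, I also had to include the penalties I needed: grid_values = {'penalty': ['l1', 'l2'], 'C': [0.01, 0.1, 1, 10, 100]}

Answer 2:

The error message contains the answer to your question. You can call estimator.get_params().keys() to see all the parameters available for an estimator: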

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()

print(lr.get_params().keys())

Output:

dict_keys(['C', 'class_weight', 'dual', 'fit_intercept', 'intercept_scaling', 'l1_ratio', 'max_iter', 'multi_class', 'n_jobs', 'penalty', 'random_state', 'solver', 'tol', 'verbose', 'warm_start'])
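The same call can be used to check a parameter grid before starting the search. The sketch below uses a hypothetical helper, unknown_grid_keys, which is not part of scikit-learn; it only compares the grid's keys with what get_params() reports.

```python
from sklearn.linear_model import LogisticRegression


def unknown_grid_keys(estimator, param_grid):
    """Return the param_grid keys that the estimator does not accept.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    valid = estimator.get_params().keys()
    return sorted(key for key in param_grid if key not in valid)


lr = LogisticRegression()
bad = unknown_grid_keys(lr, {'gamma': [0.01, 0.1, 1, 10, 100]})
good = unknown_grid_keys(lr, {'C': [0.01, 0.1, 1, 10, 100]})
print(bad)   # ['gamma']
print(good)  # []
```

An empty list means every key in the grid is a parameter the estimator knows about.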

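Comments:

Thanks, that helped me clear some things up. I really didn't know you could list all the parameters available for an estimator.

Answer 3:

My code contained several mistakes; the main one was using param_grid incorrectly. I had to apply the L1 and L2 penalties over the values 0.01, 0.1, 1, 10, 100, which belong to C rather than gamma. The correct grid is:

grid_values = {'penalty': ['l1', 'l2'], 'C': [0.01, 0.1, 1, 10, 100]}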
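To see which combinations this grid produces and in what order, it can be expanded with ParameterGrid. This is only an illustrative sketch; the variable name combos is an assumption. ParameterGrid sorts the keys, so C varies slowest and penalty fastest, which is why a flat array of 10 scores can later be reshaped to 5 rows (C values) by 2 columns (penalties).

```python
from sklearn.model_selection import ParameterGrid

grid_values = {'penalty': ['l1', 'l2'], 'C': [0.01, 0.1, 1, 10, 100]}

# Expand the grid into the list of candidate settings.
combos = list(ParameterGrid(grid_values))

print(len(combos))  # 10
for params in combos:
    print(params)
```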

I then had to correct how I trained the logistic regression, and how I retrieved the scores from cv_results_ and averaged them. Here is my code:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('fraud_data.csv')

X = df.iloc[:,:-1]
y = df.iloc[:,-1]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def LogisticR_penalty():    
    from sklearn.model_selection import GridSearchCV
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    grid_values = {'penalty': ['l1', 'l2'], 'C': [0.01, 0.1, 1, 10, 100]}


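    # Train the model over every combination of C and penalty.
    # solver='liblinear' supports both 'l1' and 'l2'; the default 'lbfgs'
    # solver in scikit-learn 0.22+ does not support 'l1'.
    lr = LogisticRegression(solver='liblinear')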
    # We use GridSearchCV to find the value of the range that optimizes a given measurement metric.
    grid_lr_recall = GridSearchCV(lr, param_grid = grid_values, cv=3, scoring = 'recall')
    grid_lr_recall.fit(X_train, y_train)
    y_decision_fn_scores_recall = grid_lr_recall.decision_function(X_test)

    # The recall score for every parameter combination in param_grid
    # is stored in cv_results_
    CVresults = pd.DataFrame(grid_lr_recall.cv_results_)

    # Per-split test scores and their mean over the 3 folds.
    split_test_scores = np.vstack((CVresults['split0_test_score'], CVresults['split1_test_score'], CVresults['split2_test_score']))
    # Rows are the 5 values of C, columns are the penalties (l1, l2).
    mean_scores = split_test_scores.mean(axis=0).reshape(5, 2)

    return mean_scores
LogisticR_penalty()
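As an alternative to stacking the per-split columns by hand, the averaged scores can be read from the mean_test_score column of cv_results_. The sketch below is illustrative only: it assumes a synthetic dataset instead of fraud_data.csv, solver='liblinear' so that both penalties can be fitted, and a scikit-learn version that still accepts the penalty parameter.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for fraud_data.csv (assumption).
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.8, 0.2], random_state=0)

grid_values = {'penalty': ['l1', 'l2'], 'C': [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(LogisticRegression(solver='liblinear'),
                      param_grid=grid_values, cv=3, scoring='recall')
search.fit(X, y)

results = pd.DataFrame(search.cv_results_)

# One row per C, one column per penalty, cells hold the mean recall over the folds.
table = results.pivot(index='param_C', columns='param_penalty',
                      values='mean_test_score')
print(table)
```

With a single scoring string the column is named mean_test_score, so there is no mean_test_precision_score key. Names of the form mean_test_<name> only appear when scoring is given as a dict or list of several metrics.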
