How to print a simple list of feature importance when using logistic regression?

Posted 2023-03-12


Asked: 2021-12-22 14:12:36

Question:

I'm using the dataset found here: https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset

My code:

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

log_reg_model = LogisticRegression(max_iter=1000, solver="newton-cg")
# use RFE to keep the 45 highest-ranked features (recent scikit-learn requires the keyword)
log_reg_model = RFE(log_reg_model, n_features_to_select=45)
log_reg_model.fit(X_train_SMOTE, y_train_SMOTE)  # fitting data
y_pred = log_reg_model.predict(X_test)
# the original string had no {} placeholder, so the score was never printed
print("Model accuracy score: {:.4f}".format(accuracy_score(y_test, y_pred)))
print(classification_report(y_test, y_pred))

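I'm trying to print the most important features in order, the way the feature_importances_ attribute does for a random forest classifier.

Is this possible with LR as set up above? I've seen similar questions on Stack Overflow, but none of the answers show the feature names alongside their importance.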
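For a linear model, the closest analogue of feature_importances_ is the magnitude of the fitted coefficients, and those magnitudes are only comparable when the features share a scale (here they are standardized). The sketch below is an illustration, not the poster's setup: it assumes synthetic data from make_classification in place of the IBM HR attrition dataset, hypothetical column names feat_0..feat_9, and 5 features kept instead of 45. It relies on attributes the RFE wrapper already exposes: ranking_, support_ and the refit estimator_.

```python
# Sketch: list feature importance for a LogisticRegression wrapped in RFE.
# Assumptions: synthetic data stands in for the real dataset; the column
# names feat_0..feat_9 are hypothetical.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=0)
# Standardize so coefficient magnitudes are on a comparable scale
X = pd.DataFrame(StandardScaler().fit_transform(X), columns=[f"feat_{i}" for i in range(10)])

rfe = RFE(LogisticRegression(max_iter=1000, solver="newton-cg"), n_features_to_select=5)
rfe.fit(X, y)

# ranking_: 1 for every selected feature, larger numbers were eliminated earlier
ranking = pd.Series(rfe.ranking_, index=X.columns).sort_values()
print(ranking)

# estimator_ is refit on the selected columns only, in their original order
selected = X.columns[rfe.support_]
coefs = pd.Series(rfe.estimator_.coef_[0], index=selected)
importance = coefs.abs().sort_values(ascending=False)
print(importance)
```

The sign of each coefficient (kept in coefs) still tells you whether a feature pushes the prediction toward the positive or the negative class; the absolute value only orders them.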

Answer 1:

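For this you can use a method called SHAP. I would definitely recommend reading about SHAP before diving into the code, because it is important that you, and anyone you show the results to, understand exactly what is being presented.

Here is an example of how it could fit into your implementation: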

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import shap

log_reg_model = LogisticRegression(max_iter=1000, solver="newton-cg")
# RFE is disabled: LinearExplainer needs an estimator exposing coef_/intercept_,
# which the RFE wrapper does not expose directly
# log_reg_model = RFE(log_reg_model, n_features_to_select=45)
log_reg_model.fit(X_train_SMOTE, y_train_SMOTE)  # fitting data
y_pred = log_reg_model.predict(X_test)
print("Model accuracy score: {:.4f}".format(accuracy_score(y_test, y_pred)))
print(classification_report(y_test, y_pred))

# Explain the model's log-odds output, using the training data as background
explainer = shap.LinearExplainer(log_reg_model, X_train_SMOTE)
shap_values = explainer.shap_values(X_test[:150])  # first 150 test rows

shap.summary_plot(shap_values, feature_names=X_train_SMOTE.columns)
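The summary plot orders features visually, but the question asks for a printable list. A common way to get one is the mean absolute SHAP value per feature. For a linear model with features treated as independent, the SHAP value of feature i for the log-odds is coef_i * (x_i - mean of x_i over the background data), which, to our understanding, is what LinearExplainer computes in its independent (interventional) setting. The sketch below computes that directly so it runs without shap installed; the synthetic data and the feat_0..feat_7 column names are assumptions standing in for the real dataset.

```python
# Sketch: mean |SHAP|-style importance list for a logistic regression,
# computed by hand under a feature-independence assumption.
# Assumptions: synthetic data; hypothetical column names feat_0..feat_7.
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=8, n_informative=3, random_state=1)
X = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(8)])
X_train, X_test = X.iloc[:400], X.iloc[400:]
y_train = y[:400]

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Per-row, per-feature contributions to the log-odds (n_test x n_features)
contrib = (X_test - X_train.mean()) * model.coef_[0]

# Mean absolute contribution per feature, sorted: a printable importance list
importance = contrib.abs().mean().sort_values(ascending=False)
print(importance)

# Sanity check: contributions plus the baseline log-odds give decision_function
baseline = model.intercept_[0] + X_train.mean().to_numpy() @ model.coef_[0]
assert np.allclose(contrib.sum(axis=1) + baseline, model.decision_function(X_test))
```

With shap installed, the answer's shap_values array should reduce the same way, e.g. pd.Series(np.abs(shap_values).mean(axis=0), index=X_train_SMOTE.columns).sort_values(ascending=False). Either way, the numbers describe contributions to the model's log-odds relative to the background average, not causal effects.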
