XGBoost 输出特征重要性以及筛选特征

Posted tan2810

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了XGBoost 输出特征重要性以及筛选特征相关的知识,希望对你有一定的参考价值。

1.输出XGBoost特征的重要性

from matplotlib import pyplot
pyplot.bar(range(len(model_XGB.feature_importances_)), model_XGB.feature_importances_)
pyplot.show()

from matplotlib import pyplot
pyplot.bar(range(len(model_XGB.feature_importances_)), model_XGB.feature_importances_)
pyplot.show()

也可以使用XGBoost内置的特征重要性绘图函数

# plot feature importance using built-in function
from xgboost import plot_importance
plot_importance(model_XGB)
pyplot.show()
# plot feature importance using built-in function
from xgboost import plot_importance
plot_importance(model_XGB)
pyplot.show()

 

2.根据特征重要性筛选特征

from numpy import sort
from sklearn.feature_selection import SelectFromModel

# Fit model using each importance as a threshold
thresholds = sort(model_XGB.feature_importances_)
for thresh in thresholds:
  # select features using threshold
  selection = SelectFromModel(model_XGB, threshold=thresh, prefit=True)
  select_X_train = selection.transform(X_train)
  # train model
  selection_model = XGBClassifier()
  selection_model.fit(select_X_train, y_train)
# eval model
  select_X_test = selection.transform(X_test)
  y_pred = selection_model.predict(select_X_test)
  predictions = [round(value) for value in y_pred]
  accuracy = accuracy_score(y_test, predictions)
  print("Thresh=%.3f, n=%d, Accuracy: %.2f%%" % (thresh, select_X_train.shape[1],
      accuracy*100.0))
技术图片
from numpy import sort
from sklearn.feature_selection import SelectFromModel

# Fit model using each importance as a threshold
thresholds = sort(model_XGB.feature_importances_)
for thresh in thresholds:
  # select features using threshold
  selection = SelectFromModel(model_XGB, threshold=thresh, prefit=True)
  select_X_train = selection.transform(X_train)
  # train model
  selection_model = XGBClassifier()
  selection_model.fit(select_X_train, y_train)
# eval model
  select_X_test = selection.transform(X_test)
  y_pred = selection_model.predict(select_X_test)
  predictions = [round(value) for value in y_pred]
  accuracy = accuracy_score(y_test, predictions)
  print("Thresh=%.3f, n=%d, Accuracy: %.2f%%" % (thresh, select_X_train.shape[1],
      accuracy*100.0))
技术图片

 参考:https://blog.csdn.net/u011630575/article/details/79423162

以上是关于XGBoost 输出特征重要性以及筛选特征的主要内容,如果未能解决你的问题,请参考以下文章

XGBoost三种特征重要性计算方法对比

使用 XGBoost 特征重要性分数打印出特征选择中使用的特征

XGBoost 中的特征重要性“增益”

特征筛选还在用XGB的Feature Importance?试试Permutation Importance

特征筛选还在用XGB的Feature Importance?试试Permutation Importance

xgboost 特征重要性计算