XGBoost feature importance calculation

Posted 2021-01-19 cupleo

Preface: this article was compiled by the editors of cha138.com and explains how XGBoost computes feature importance. We hope you find it useful.

XGBoost provides three ways to compute feature importance:

‘weight’ - the number of times a feature is used to split the data across all trees.
‘gain’ - the average gain of the feature when it is used in trees
‘cover’ - the average coverage of the feature when it is used in trees
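The three definitions can be written compactly as formulas. The notation is introduced here for illustration and does not come from the XGBoost docs: let $S_f$ be the set of internal nodes, over all trees, that split on feature $f$, and let $\mathrm{Gain}_s$ and $\mathrm{Cover}_s$ be the gain and cover recorded for node $s$.

```latex
\begin{aligned}
\mathrm{weight}(f) &= |S_f| \\
\mathrm{gain}(f)   &= \frac{1}{|S_f|} \sum_{s \in S_f} \mathrm{Gain}_s \\
\mathrm{cover}(f)  &= \frac{1}{|S_f|} \sum_{s \in S_f} \mathrm{Cover}_s
\end{aligned}
```

Dropping the $1/|S_f|$ averaging factor gives the unaveraged sums, which the Python `Booster.get_score` method exposes as `total_gain` and `total_cover`.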

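Put simply:
weight is the total number of nodes, across all trees, that split on the feature;
gain is the average gain of the splits that use the feature;
cover is harder to grasp. R-package/man/xgb.plot.tree.Rd (https://github.com/dmlc/xgboost/blob/f5659e17d5200bd7471a2e735177a81cb8d3012b/R-package/man/xgb.plot.tree.Rd) explains it in more detail: "the sum of second order gradient of training data classified to the leaf, if it is square loss, this simply corresponds to the number of instances in that branch. Deeper in the tree a node is, lower this metric will be". In other words, a node's coverage is the sum of the second-order derivatives (hessians) of the training samples routed to that node, and a feature's cover score is the average coverage of the nodes that split on it.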
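The following minimal Python sketch illustrates why cover equals a sample count under squared loss but not under logistic loss. It computes a node's cover as the sum of per-sample hessians. The loss formulas are the standard ones (squared error written as 0.5 * (m - y)^2, log-loss on p = sigmoid(m)). The helper names, the six samples, and the node layout are hypothetical, chosen only for illustration.

```python
import math


# Second derivative of each loss with respect to the raw prediction (margin m).
def hess_squared_error(margin, label):
    # L = 0.5 * (m - y)**2  ->  dL/dm = m - y,  d2L/dm2 = 1 for every sample
    return 1.0


def hess_logistic(margin, label):
    # Log-loss on p = sigmoid(m)  ->  d2L/dm2 = p * (1 - p), independent of y
    p = 1.0 / (1.0 + math.exp(-margin))
    return p * (1.0 - p)


def node_cover(sample_ids, margins, labels, hess):
    """Cover of a node = sum of the hessians of the samples routed to it."""
    return sum(hess(margins[i], labels[i]) for i in sample_ids)


# Hypothetical toy data: 6 samples, all with current margin 0.
margins = [0.0] * 6
labels = [1, 0, 1, 1, 0, 0]

# Hypothetical tree: root -> (left, right), left -> (left_left, left_right).
nodes = {
    "root": [0, 1, 2, 3, 4, 5],
    "left": [0, 1, 2, 3],
    "right": [4, 5],
    "left_left": [0, 1],
    "left_right": [2, 3],
}

for name, ids in nodes.items():
    sq = node_cover(ids, margins, labels, hess_squared_error)
    lg = node_cover(ids, margins, labels, hess_logistic)
    print(f"{name:10s} squared-error cover={sq:.2f}  logistic cover={lg:.2f}")
```

The squared-error column reproduces plain sample counts. The logistic column is a hessian-weighted count: at margin 0, each sample contributes 0.25. In both columns a node's cover equals the sum of its children's covers, which is why the metric shrinks as nodes get deeper.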

Returning to the example from Li Hang's book, we draw the trees below, using a different color for each feature:
[Figure: the trees from Li Hang's example, with split nodes colored by feature]
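The figure colors each split node by its feature. This minimal Python sketch mirrors that idea: it walks a few hand-made trees and tallies weight, gain, and cover per feature. The nested-dict layout (`split`, `gain`, `cover`, `children`, `leaf`) is an assumption loosely modeled on XGBoost's JSON tree dump. The trees, feature names, and numbers are invented for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical trees in a nested-dict layout loosely modeled on XGBoost's JSON
# dump: split nodes carry "split" (feature), "gain", "cover", "children";
# leaves carry "leaf" and "cover". All names and numbers are invented.
TREES = [
    {"split": "f0", "gain": 40.0, "cover": 100.0, "children": [
        {"split": "f1", "gain": 10.0, "cover": 60.0, "children": [
            {"leaf": 0.3, "cover": 35.0},
            {"leaf": -0.1, "cover": 25.0},
        ]},
        {"split": "f0", "gain": 6.0, "cover": 40.0, "children": [
            {"leaf": 0.2, "cover": 30.0},
            {"leaf": -0.4, "cover": 10.0},
        ]},
    ]},
    {"split": "f2", "gain": 30.0, "cover": 100.0, "children": [
        {"leaf": 0.1, "cover": 70.0},
        {"split": "f1", "gain": 2.0, "cover": 30.0, "children": [
            {"leaf": 0.05, "cover": 20.0},
            {"leaf": -0.2, "cover": 10.0},
        ]},
    ]},
]


def collect_splits(node, out):
    """Append (feature, gain, cover) for every split node under `node`."""
    if "leaf" in node:
        return  # leaves do not split on any feature
    out.append((node["split"], node["gain"], node["cover"]))
    for child in node["children"]:
        collect_splits(child, out)


def feature_importance(trees):
    """Per-feature weight, average gain/cover, and the unaveraged totals."""
    splits = []
    for tree in trees:
        collect_splits(tree, splits)

    count = defaultdict(int)
    gain_sum = defaultdict(float)
    cover_sum = defaultdict(float)
    for feature, gain, cover in splits:
        count[feature] += 1
        gain_sum[feature] += gain
        cover_sum[feature] += cover

    return {
        f: {
            "weight": count[f],                # number of split nodes
            "gain": gain_sum[f] / count[f],    # average gain per split
            "cover": cover_sum[f] / count[f],  # average cover per split
            "total_gain": gain_sum[f],         # sum, no averaging
            "total_cover": cover_sum[f],       # sum, no averaging
        }
        for f in count
    }


if __name__ == "__main__":
    for feature, scores in sorted(feature_importance(TREES).items()):
        print(feature, scores)
```

In this toy example, f0 and f1 tie on weight (two splits each). However, f2 is used only once yet has the highest average gain (30, versus 23 for f0 and 6 for f1). So weight and gain can rank the same features differently.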

以上是关于xgboost 特征重要性计算的主要内容，如果未能解决你的问题，请参考以下文章