What does get_fscore() of an xgboost ML model do? [duplicate]

Posted 2023-03-12

【Title】: What does get_fscore() of an xgboost ML model do? [duplicate]
【Posted】: 2016-02-12 15:25:50
【Question】:

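Does anyone know how these numbers are calculated? The documentation says this function "gets the feature importance of each feature", but it does not explain how to interpret the results.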

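【Comments】:

I'm not sure, but the code and the method itself are on GitHub: github.com/dmlc/xgboost/blob/master/python-package/xgboost/…

Thanks. If you read the code carefully, you will see that it is a count of how often the feature appears in the decision trees.

【Answer 1】: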

This metric simply sums up how many times each feature is split on. It is analogous to the Frequency measure in the R version: https://cran.r-project.org/web/packages/xgboost/xgboost.pdf
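
To make the comparison with the R Frequency measure concrete, here is a minimal sketch that turns raw split counts into relative shares. It assumes that Frequency means each feature's share of all splits; the helper name `to_frequency` and the sample counts are made up for illustration and are not part of xgboost.

```python
def to_frequency(counts):
    """Convert raw split counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        # No splits at all (for example, every tree is a single leaf).
        return {}
    return {feature: n / total for feature, n in counts.items()}


# Hypothetical counts, shaped like the dict that get_fscore() returns.
example_counts = {"f0": 6, "f1": 3, "f2": 1}
example_frequency = to_frequency(example_counts)
print(example_frequency)
```

A feature with a large share is simply one the trees split on often; the count says nothing about how much each of those splits improved the model.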

It is about as basic a feature importance metric as you can get.

In other words: how many times was this variable split on?

The code for this method shows that it simply adds up the occurrences of a given feature across all trees.

Here: https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/core.py#L953

def get_fscore(self, fmap=''):
    """Get feature importance of each feature.
    Parameters
    ----------
    fmap: str (optional)
       The name of feature map file
    """
    trees = self.get_dump(fmap)  ## dump all the trees to text
    fmap = {}                       ## start with an empty dict of counts per feature id
    for tree in trees:              ## loop through the trees
        for line in tree.split('\n'):     # text processing
            arr = line.split('[')
            if len(arr) == 1:             # text processing 
                continue
            fid = arr[1].split(']')[0]    # text processing
            fid = fid.split('<')[0]       # drop the '<threshold' part, keeping the feature name

            if fid not in fmap:  # if the feature id hasn't been seen yet
                fmap[fid] = 1    # add it
            else:
                fmap[fid] += 1   # else increment it
    return fmap                  # feature name -> number of times it was split on
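
The same counting logic can be tried without a trained model. The sketch below applies it to two hand-written tree dumps; the dump strings only imitate the text layout that the parsing above expects and are not output from a real booster, and `count_splits` is a hypothetical helper name.

```python
from collections import Counter


def count_splits(tree_dumps):
    """Count how many split nodes use each feature across all dumped trees."""
    counts = Counter()
    for tree in tree_dumps:
        for line in tree.split("\n"):
            parts = line.split("[")
            if len(parts) == 1:
                # Leaf lines (and empty lines) contain no '[feature<threshold]' part.
                continue
            condition = parts[1].split("]")[0]   # e.g. 'f2<2.45'
            feature = condition.split("<")[0]    # e.g. 'f2'
            counts[feature] += 1
    return dict(counts)


# Hand-written sample dumps (illustrative only).
sample_dumps = [
    "0:[f2<2.45] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.43\n"
    "\t2:[f3<1.75] yes=3,no=4,missing=3\n"
    "\t\t3:leaf=-0.2\n"
    "\t\t4:leaf=0.1\n",
    "0:[f2<4.95] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.3\n"
    "\t2:leaf=-0.1\n",
]

split_counts = count_splits(sample_dumps)
print(split_counts)
```

Here `f2` is split on once in each tree and `f3` once in the first tree, so the counts come out as 2 and 1. Features that never appear in a split are absent from the result rather than listed with a count of zero.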

【Comments】:
