get_feature_names() does not work on a sparse matrix created with CountVectorizer() in scikit-learn

Posted 2023-03-12

【Title】: get_feature_names() is not working for a sparse matrix made using CountVectorizer() of scikit-learn 【Posted】: 2021-11-03 16:29:51 【Question】:

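I am working with the Amazon Fine Food Reviews dataset. After all the preprocessing, I applied CountVectorizer() to my data (held in a pandas DataFrame) and wanted to inspect the features of the resulting sparse matrix, but calling get_feature_names() raises "AttributeError: get_feature_names not found".

Here is the code: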

from sklearn.feature_extraction.text import CountVectorizer
count_vec = CountVectorizer(ngram_range=(1,2))
bigram_count = count_vec.fit_transform(data["CleanedText"].values)
print(bigram_count.get_feature_names())

In the code, data["CleanedText"] is a pandas DataFrame column that contains all the preprocessed words of each review. This is the error I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-db82ffbaf8ba> in <module>
----> 1 print(bigram_count.get_feature_names())

~\Anaconda3\lib\site-packages\scipy\sparse\base.py in __getattr__(self, attr)
    687             return self.getnnz()
    688         else:
--> 689             raise AttributeError(attr + " not found")
    690 
    691     def transpose(self, axes=None, copy=False):

AttributeError: get_feature_names not found

【Answer 1】:

You cannot call .get_feature_names() on the sparse matrix, because it is not a method of the sparse matrix.

It is a method of the CountVectorizer object. Call count_vec.get_feature_names() instead.
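
A minimal sketch of the corrected call follows. The `docs` list is a hypothetical stand-in for `data["CleanedText"].values`, which is not shown in the question. Note also that scikit-learn 1.0 deprecated `get_feature_names()` in favor of `get_feature_names_out()`, and later releases removed the old method, so the sketch tries the newer name first and falls back to the older one.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for data["CleanedText"].values from the question
docs = ["good tasty food", "tasty food again"]

count_vec = CountVectorizer(ngram_range=(1, 2))
# fit_transform returns a SciPy sparse matrix of counts; it carries no vocabulary
bigram_count = count_vec.fit_transform(docs)

# The feature names are stored on the fitted vectorizer, not on the matrix
if hasattr(count_vec, "get_feature_names_out"):
    # Newer scikit-learn releases
    feature_names = list(count_vec.get_feature_names_out())
else:
    # Older scikit-learn releases
    feature_names = list(count_vec.get_feature_names())

print(bigram_count.shape)
print(feature_names)
```

Each column of `bigram_count` corresponds to the entry at the same position in `feature_names`, so the two can be used together to label the counts.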
