Python: CountVectorizer and NumPy arrays


from sklearn.feature_extraction.text import CountVectorizer

# Initialize the CountVectorizer object, which is scikit-learn's
# bag-of-words tool.
vectorizer = CountVectorizer(
    analyzer="word",
    tokenizer=None,
    preprocessor=None,
    stop_words=None,
    max_features=5000,  # keep only the 5000 most frequent terms
)

# fit_transform() does two things: first, it fits the model and learns the
# vocabulary; second, it transforms the training data into feature vectors.
# Its input should be a list of strings, so clean_train_reviews is assumed
# to be a list of cleaned review strings prepared earlier.
train_data_features = vectorizer.fit_transform(clean_train_reviews)

# fit_transform() returns a SciPy sparse matrix. NumPy arrays are easy to
# work with, so convert the result to a dense array.
train_data_features = train_data_features.toarray()
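
The snippet above depends on clean_train_reviews, which is not defined here. As a minimal sketch of the same steps, the example below runs them on a two-sentence toy corpus. The names toy_reviews, toy_vectorizer, sparse_features and dense_features are hypothetical and only serve the illustration; the default tokenizer and lowercasing of CountVectorizer are assumed.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for clean_train_reviews.
toy_reviews = ["the movie was great", "the movie was bad bad"]

toy_vectorizer = CountVectorizer(analyzer="word", max_features=5000)

# Learn the vocabulary and count terms; the result is a SciPy sparse matrix
# with one row per document and one column per vocabulary term.
sparse_features = toy_vectorizer.fit_transform(toy_reviews)

# Convert to a dense NumPy array, as in the snippet above.
dense_features = sparse_features.toarray()

# vocabulary_ maps each term to its column index; sort terms by that index
# so the list lines up with the columns of the array.
vocab = sorted(toy_vectorizer.vocabulary_, key=toy_vectorizer.vocabulary_.get)

# Total count of each term across the whole corpus.
column_totals = dense_features.sum(axis=0)

print(vocab)
print(dense_features)
print(column_totals)
```

With this corpus the columns come out in alphabetical order (bad, great, movie, the, was), and the second row holds a count of 2 for "bad". Note that toarray() allocates every zero entry explicitly, so for a large corpus it may be preferable to keep the sparse matrix for as long as the downstream code accepts it.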
