Converting a list of Python dictionaries into a SciPy sparse matrix

Posted 2023-03-12

Asked: 2015-12-09 17:42:49

Question:

I would like to convert a list of Python dictionaries into a SciPy sparse matrix.

I know I can use sklearn.feature_extraction.DictVectorizer.fit_transform():

import sklearn.feature_extraction

# One dictionary per row: feature name -> value
feature_dictionary = [{"feat1": 1.5, "feat10": 0.5},
                      {"feat4": 2.1, "feat5": 0.3, "feat7": 0.1},
                      {"feat2": 7.5}]

v = sklearn.feature_extraction.DictVectorizer(sparse=True, dtype=float)
X = v.fit_transform(feature_dictionary)
print('X: \n{0}'.format(X))

which outputs:

X: 
  (0, 0)    1.5
  (0, 1)    0.5
  (1, 3)    2.1
  (1, 4)    0.3
  (1, 5)    0.1
  (2, 2)    7.5

However, I would like feat1 to be in column 1, feat10 in column 10, feat4 in column 4, and so on. How can I achieve that?

Comments:

You could use DictVectorizer and change the order of the columns after generating the matrix.

Answer 1:

Instead of learning them through sklearn.feature_extraction.DictVectorizer.fit(), you can set sklearn.feature_extraction.DictVectorizer.vocabulary_ and sklearn.feature_extraction.DictVectorizer.feature_names_ manually:

import sklearn.feature_extraction

feature_dictionary = [{"feat1": 1.5, "feat10": 0.5},
                      {"feat4": 2.1, "feat5": 0.3, "feat7": 0.1},
                      {"feat2": 7.5}]

v = sklearn.feature_extraction.DictVectorizer(sparse=True, dtype=float)
# Map each feature name to the column it should occupy
v.vocabulary_ = {'feat0': 0, 'feat1': 1, 'feat2': 2, 'feat3': 3, 'feat4': 4, 'feat5': 5,
                 'feat6': 6, 'feat7': 7, 'feat8': 8, 'feat9': 9, 'feat10': 10}
# Feature names listed in column order
v.feature_names_ = ['feat0', 'feat1', 'feat2', 'feat3', 'feat4', 'feat5', 'feat6', 'feat7',
                    'feat8', 'feat9', 'feat10']

# transform() uses the vocabulary set above; fit() is never called
X = v.transform(feature_dictionary)
print('v.vocabulary_ : {0} ; v.feature_names_: {1}'.format(v.vocabulary_, v.feature_names_))
print('X: \n{0}'.format(X))

Output:

X: 
  (0, 1)    1.5
  (0, 10)   0.5
  (1, 4)    2.1
  (1, 5)    0.3
  (1, 7)    0.1
  (2, 2)    7.5

Of course, you don't have to define vocabulary_ and feature_names_ by hand:

v.vocabulary_ = {}
v.feature_names_ = []
number_of_features = 11
for feature_number in range(number_of_features):
    # 'feat<N>' is assigned to column N
    feature_name = 'feat{0}'.format(feature_number)
    v.vocabulary_[feature_name] = feature_number
    v.feature_names_.append(feature_name)

print('v.vocabulary_ : {0} ; v.feature_names_: {1}'.format(v.vocabulary_, v.feature_names_))

Output:

v.vocabulary_ : {'feat10': 10, 'feat9': 9, 'feat8': 8, 'feat5': 5, 'feat4': 4, 'feat7': 7,
                 'feat6': 6, 'feat1': 1, 'feat0': 0, 'feat3': 3, 'feat2': 2}
v.feature_names_: ['feat0', 'feat1', 'feat2', 'feat3', 'feat4', 'feat5', 'feat6', 'feat7',
                   'feat8', 'feat9', 'feat10']
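
Since the column number is encoded in the feature name itself, the matrix could also be assembled directly with SciPy, without DictVectorizer. The following is only a minimal sketch and not part of the original answer. It assumes every key has the form feat<N> with N a non-negative integer smaller than the number of columns. The helper names column_of and dicts_to_csr and the n_columns and prefix parameters are made up for illustration.

```python
from scipy.sparse import csr_matrix


def column_of(name, prefix="feat"):
    """Hypothetical helper: return N for a feature named '<prefix><N>'."""
    if not name.startswith(prefix):
        raise ValueError("unexpected feature name: {0!r}".format(name))
    return int(name[len(prefix):])


def dicts_to_csr(rows, n_columns, prefix="feat"):
    """Hypothetical helper: build a CSR matrix with 'feat<N>' stored in column N."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        # Collect (column, value) pairs and sort them by column index
        entries = sorted((column_of(name, prefix), float(value))
                         for name, value in row.items())
        for column, value in entries:
            indices.append(column)
            data.append(value)
        # indptr marks where each row ends inside data/indices
        indptr.append(len(indices))
    return csr_matrix((data, indices, indptr),
                      shape=(len(rows), n_columns), dtype=float)


feature_dictionary = [{"feat1": 1.5, "feat10": 0.5},
                      {"feat4": 2.1, "feat5": 0.3, "feat7": 0.1},
                      {"feat2": 7.5}]

X = dicts_to_csr(feature_dictionary, n_columns=11)
print('X: \n{0}'.format(X))
```

Compared with setting vocabulary_ by hand, this avoids depending on scikit-learn, but it only works when the target column can be derived from the key. For arbitrary feature names, an explicit name-to-column mapping like the one above is still needed.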
