TypeError: only integer arrays with one element can be converted to an index

Posted 2023-03-12

Asked: 2012-09-08 18:53:39

Question:

I get the following error when performing recursive feature selection with cross-validation:

Traceback (most recent call last):
  File "/Users/.../srl/main.py", line 32, in <module>
    argident_sys.train_classifier()
  File "/Users/.../srl/identification.py", line 194, in train_classifier
    feat_selector.fit(train_argcands_feats,train_argcands_target)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/feature_selection/rfe.py", line 298, in fit
    ranking_ = rfe.fit(X[train], y[train]).ranking_
TypeError: only integer arrays with one element can be converted to an index

The code that produces the error is the following:

def train_classifier(self):

    # Get the argument candidates
    argcands = self.get_argcands(self.reader)

    # Extract the necessary features from the argument candidates
    train_argcands_feats = []
    train_argcands_target = []

    for argcand in argcands:
        train_argcands_feats.append(self.extract_features(argcand))
        if argcand["info"]["label"] == "NULL":
            train_argcands_target.append("NULL")
        else:
            train_argcands_target.append("ARG")

    # Transform the features to the format required by the classifier
    self.feat_vectorizer = DictVectorizer()
    train_argcands_feats = self.feat_vectorizer.fit_transform(train_argcands_feats)

    # Transform the target labels to the format required by the classifier
    self.target_names = list(set(train_argcands_target))
    train_argcands_target = [self.target_names.index(target) for target in train_argcands_target]

    ## Train the appropriate supervised model      

    # Recursive Feature Elimination
    self.classifier = LogisticRegression()
    feat_selector = RFECV(estimator=self.classifier, step=1, cv=StratifiedKFold(train_argcands_target, 10))

    feat_selector.fit(train_argcands_feats,train_argcands_target)

    print feat_selector.n_features_
    print feat_selector.support_
    print feat_selector.ranking_
    print feat_selector.cv_scores_

    return

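I know I should also run a GridSearch over the parameters of the LogisticRegression classifier, but I don't think that is the source of the error (or is it?).

I should mention that I am testing around 50 features, and almost all of them are categorical (which is why I use DictVectorizer to transform them appropriately).

Any help or guidance is very welcome. Thanks!

Edit

Here are some examples of the training data: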

train_argcands_feats = [{'head_lemma': u'Bras\xedlia', 'head': u'Bras\xedlia', 'head_postag': u'PROP'},
 {'head_lemma': u'Pesquisa_Datafolha', 'head': u'Pesquisa_Datafolha', 'head_postag': u'N'},
 {'head_lemma': u'dado', 'head': u'dado', 'head_postag': u'N'},
 {'head_lemma': u'postura', 'head': u'postura', 'head_postag': u'N'},
 {'head_lemma': u'maioria', 'head': u'maioria', 'head_postag': u'N'},
 {'head_lemma': u'querer', 'head': u'quer', 'head_postag': u'V-FIN'},
 {'head_lemma': u'PT', 'head': u'PT', 'head_postag': u'PROP'},
 {'head_lemma': u'participar', 'head': u'participando', 'head_postag': u'V-GER'},
 {'head_lemma': u'surpreendente', 'head': u'supreendente', 'head_postag': u'ADJ'},
 {'head_lemma': u'Bras\xedlia', 'head': u'Bras\xedlia', 'head_postag': u'PROP'},
 {'head_lemma': u'Pesquisa_Datafolha', 'head': u'Pesquisa_Datafolha', 'head_postag': u'N'},
 {'head_lemma': u'revelar', 'head': u'revela', 'head_postag': u'V-FIN'},
 {'head_lemma': u'recusar', 'head': u'recusando', 'head_postag': u'V-GER'},
 {'head_lemma': u'maioria', 'head': u'maioria', 'head_postag': u'N'},
 {'head_lemma': u'PT', 'head': u'PT', 'head_postag': u'PROP'},
 {'head_lemma': u'participar', 'head': u'participando', 'head_postag': u'V-GER'},
 {'head_lemma': u'surpreendente', 'head': u'supreendente', 'head_postag': u'ADJ'},
 {'head_lemma': u'Bras\xedlia', 'head': u'Bras\xedlia', 'head_postag': u'PROP'},
 {'head_lemma': u'Pesquisa_Datafolha', 'head': u'Pesquisa_Datafolha', 'head_postag': u'N'},
 {'head_lemma': u'revelar', 'head': u'revela', 'head_postag': u'V-FIN'},
 {'head_lemma': u'governo', 'head': u'Governo', 'head_postag': u'N'},
 {'head_lemma': u'de', 'head': u'de', 'head_postag': u'PRP'},
 {'head_lemma': u'governo', 'head': u'Governo', 'head_postag': u'N'},
 {'head_lemma': u'recusar', 'head': u'recusando', 'head_postag': u'V-GER'},
 {'head_lemma': u'maioria', 'head': u'maioria', 'head_postag': u'N'},
 {'head_lemma': u'querer', 'head': u'quer', 'head_postag': u'V-FIN'},
 {'head_lemma': u'PT', 'head': u'PT', 'head_postag': u'PROP'},
 {'head_lemma': u'surpreendente', 'head': u'supreendente', 'head_postag': u'ADJ'},
 {'head_lemma': u'Bras\xedlia', 'head': u'Bras\xedlia', 'head_postag': u'PROP'},
 {'head_lemma': u'Pesquisa_Datafolha', 'head': u'Pesquisa_Datafolha', 'head_postag': u'N'},
 {'head_lemma': u'revelar', 'head': u'revela', 'head_postag': u'V-FIN'},
 {'head_lemma': u'muito', 'head': u'Muitas', 'head_postag': u'PRON-DET'},
 {'head_lemma': u'prioridade', 'head': u'prioridades', 'head_postag': u'N'},
 {'head_lemma': u'com', 'head': u'com', 'head_postag': u'PRP'},
 {'head_lemma': u'prioridade', 'head': u'prioridades', 'head_postag': u'N'}]

train_argcands_target = ['NULL', 'ARG', 'ARG', 'ARG', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'ARG', 'ARG', 'ARG', 'ARG', 'NULL', 'NULL', 'NULL', 'NULL', 'ARG', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'ARG', 'NULL', 'NULL', 'NULL', 'NULL', 'ARG', 'ARG', 'NULL', 'NULL']

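Comments:

According to the stack trace, the problem is in your feat_selector.fit(train_argcands_feats,train_argcands_target) call. Is RFECV a class you created or a library? Could you post your RFECV.fit() code?

@acattle It is part of the scikit-learn library: scikit-learn.org/stable/modules/generated/…

@acattle Where did you see that?

@möter My apologies. I misread the code. Comment deleted. It looks like the value of train in the line ranking_ = rfe.fit(X[train], y[train]).ranking_ is the problem, but without looking through all of the rfe.py source it is impossible to tell how that value is determined.

Answer 1: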

I finally solved the problem. Two things had to be done:

train_argcands_target

Thanks to everyone who tried to help!
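As a sketch of one way this error can arise (an assumption based on the traceback, not necessarily the fix applied above): the failing line evaluates y[train], where train is an array of fold indices. If y is a plain Python list, as train_argcands_target is in the question, indexing it with an integer array raises a TypeError, whereas a NumPy array accepts the same index. The names labels and train below are hypothetical, and the exact wording of the message varies between NumPy versions.

```python
import numpy as np

# Hypothetical encoded labels, kept as a plain Python list as in the question.
labels = [0, 1, 1, 0, 1]

# Hypothetical fold indices, similar to what a cross-validation iterator yields.
train = np.array([0, 2, 4])

# A list cannot be indexed with a multi-element integer array.
try:
    labels[train]
    list_indexing_failed = False
except TypeError as exc:
    list_indexing_failed = True
    print("list indexing failed:", exc)

# Converting the labels to a NumPy array makes the same indexing valid.
labels_array = np.asarray(labels)
selected = labels_array[train]
print(selected)
```

If this is the cause, converting the target with np.asarray before calling fit would be the corresponding change.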

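Answer 2:

In case anyone is still interested,

I used CountVectorizer on something very similar and it gave me the same error. I realized that the vectorizer was giving me a COO sparse matrix, which is basically a list of coordinates. Elements of a COO matrix cannot be accessed by row index. It is better to convert it to a CSR (Compressed Sparse Row) matrix, which is indexed by row. The conversion is easily done with coo_matrix.tocsr(). No other changes were needed, and this worked for me.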
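The conversion described in this answer can be sketched as follows. This is a minimal illustration assuming SciPy and NumPy are available; the small matrix and the train index array are made up, and whether indexing the COO matrix fails (and with which exception) may depend on the SciPy version, so the sketch records the outcome instead of relying on it.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical feature matrix with 4 samples and 3 features, stored in COO format.
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 0],
                  [0, 6, 0]])
coo = coo_matrix(dense)

# Hypothetical training-fold indices.
train = np.array([0, 2])

# Try to select rows directly from the COO matrix and record whether it works.
try:
    coo[train]
    coo_row_indexing_ok = True
except (TypeError, NotImplementedError, IndexError) as exc:
    coo_row_indexing_ok = False
    print("COO row indexing failed:", exc)

# CSR supports row selection with an integer index array.
csr = coo.tocsr()
train_rows = csr[train]
print(train_rows.toarray())
```

Passing the CSR matrix to fit in place of the COO one leaves the rest of the pipeline unchanged, which matches the "no other changes" remark above.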

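Comments:

I had the same problem and this solution worked for me. It is easier than working out the accepted answer here. Also, the reason given (COO cannot be accessed by row index) makes more sense.

Completely agree. COO was the problem.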
