I keep getting "TypeError: only integer scalar arrays can be converted to a scalar index" while using a custom-defined metric in KNeighborsClassifier

Posted: 2021-06-23 21:12:36

【Question】:

I am using a custom metric in scikit-learn's KNeighborsClassifier. Here is my code:

def chi_squared(x,y):
    return np.divide(np.square(np.subtract(x,y)), np.sum(x,y))

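The function above implements the chi-squared distance function. I used NumPy functions because, according to the scikit-learn docs, the metric function takes two one-dimensional NumPy arrays.

I passed the chi_squared function as an argument to KNeighborsClassifier().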

knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)

However, I keep getting the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-29-d2a365ebb538> in <module>
      4 
      5 knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)
----> 6 knn.fit(X_train, Y_train)
      7 predictions = knn.predict(X_test)
      8 print(accuracy_score(Y_test, predictions))

~/.local/lib/python3.8/site-packages/sklearn/neighbors/_classification.py in fit(self, X, y)
    177             The fitted k-nearest neighbors classifier.
    178         """
--> 179         return self._fit(X, y)
    180 
    181     def predict(self, X):

~/.local/lib/python3.8/site-packages/sklearn/neighbors/_base.py in _fit(self, X, y)
    497 
    498         if self._fit_method == 'ball_tree':
--> 499             self._tree = BallTree(X, self.leaf_size,
    500                                   metric=self.effective_metric_,
    501                                   **self.effective_metric_params_)

sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree.__init__()

sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree._recursive_build()

sklearn/neighbors/_ball_tree.pyx in sklearn.neighbors._ball_tree.init_node()

sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree.rdist()

sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.DistanceMetric.rdist()

sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.PyFuncDistance.dist()

sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.PyFuncDistance._dist()

<ipython-input-29-d2a365ebb538> in chi_squared(x, y)
      1 def chi_squared(x,y):
----> 2     return np.divide(np.square(np.subtract(x,y)), np.sum(x,y))
      3 
      4 
      5 knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)

<__array_function__ internals> in sum(*args, **kwargs)

~/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in sum(a, axis, dtype, out, keepdims, initial, where)
   2239         return res
   2240 
-> 2241     return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
   2242                           initial=initial, where=where)
   2243 

~/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     85                 return reduction(axis=axis, out=out, **passkwargs)
     86 
---> 87     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
     88 
     89 

TypeError: only integer scalar arrays can be converted to a scalar index

   


【Answer 1】:

I can reproduce your error message with:

In [173]: x=np.arange(3); y=np.array([2,3,4])
In [174]: np.sum(x,y)
Traceback (most recent call last):
  File "<ipython-input-174-1a1a267ebd82>", line 1, in <module>
    np.sum(x,y)
  File "<__array_function__ internals>", line 5, in sum
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 2247, in sum
    return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
TypeError: only integer scalar arrays can be converted to a scalar index

Correct usage of np.sum:

In [175]: np.sum(x)
Out[175]: 3
In [177]: np.sum(np.arange(6).reshape(2,3), axis=0)
Out[177]: array([3, 5, 7])
In [178]: np.sum(np.arange(6).reshape(2,3), 0)
Out[178]: array([3, 5, 7])

If necessary, (re)read the np.sum documentation!

Use np.add instead of np.sum:
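To make the failure concrete: the second positional parameter of np.sum is axis, so np.sum(x, y) asks NumPy to reduce x along "axis y". A small sketch, using the same illustrative arrays as the session above:

```python
import numpy as np

x = np.arange(3)           # array([0, 1, 2])
y = np.array([2, 3, 4])

# np.sum(a, axis=None, ...): the second positional argument is the axis.
total = np.sum(x)                  # sum of all elements of x
same_total = np.sum(x, axis=0)     # x is 1-D, so axis=0 gives the same value

# Passing y as the second argument makes NumPy treat it as an axis,
# which raises the TypeError from the question.
try:
    np.sum(x, y)
    raised = False
except TypeError:
    raised = True

# Element-wise addition is a different operation: np.add (or x + y).
elementwise = np.add(x, y)
```

Here total and same_total are both 3, raised ends up True, and elementwise holds the pairwise sums of x and y.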

In [179]: np.add(x,y)
Out[179]: array([2, 4, 6])
In [180]: x+y
Out[180]: array([2, 4, 6])

The following should be equivalent:

np.divide(np.square(np.subtract(x,y)), np.add(x,y))

(x-y)**2/(x+y)
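One further point worth checking: both expressions above return an array with one entry per feature, whereas a metric callable passed to KNeighborsClassifier is expected to return a single distance for each pair of rows. Assuming the intended formula is the common chi-squared distance, sum((x_i - y_i)^2 / (x_i + y_i)), a minimal sketch could look like the following. The optional 1/2 factor is left out, and skipping terms whose denominator is zero is my assumption, not something stated in the question.

```python
import numpy as np

def chi_squared(x, y):
    """Chi-squared distance between two 1-D arrays, returned as a scalar."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    num = np.square(np.subtract(x, y))
    den = np.add(x, y)              # element-wise sum, not np.sum(x, y)
    # Assumption: terms whose denominator is 0 contribute nothing.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den != 0)
    return float(np.sum(terms))     # reduce to a single number

x = np.arange(3)                    # [0, 1, 2]
y = np.array([2, 3, 4])
d = chi_squared(x, y)               # 4/2 + 4/4 + 4/6
```

A function of this shape can be passed as metric=chi_squared. Note that ball_tree relies on the triangle inequality, which I have not verified for this function, so algorithm='brute' may be the safer choice.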

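【Comments】:

x and y are 1-D arrays here. Can I use np.add(x,y) to add the corresponding values of the two arrays?

Did you try it? Did you check the docs? Did you see my edit?

Yes, I did read the np.sum documentation; it seems to take only one NumPy array and compute the sum of all its elements. However, I want to add the individual elements of x and y (as the chi_squared distance formula requires). But I get this error. I tried casting the result to float, but the error does not go away.

np.sum takes one array and sums its elements. The second argument is the axis (or axes) along which to do that. The error is caused by y not being a valid value for this axis argument.

I can't load that error message. Look at my edit again. Also, did you notice the In/Out lines in my answer? I copied those from an interactive numpy session (ipython). You should have a similar session open when you write and test code. If you test code snippets there first, the larger script is more likely to work.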
