I keep getting "TypeError: only integer scalar arrays can be converted to a scalar index" while using custom-defined metric in KNeighborsClassifier
Posted: 2021-06-23 21:12:36
Question: I am using a custom-defined metric in scikit-learn's KNeighborsClassifier. Here is my code:
def chi_squared(x,y):
    return np.divide(np.square(np.subtract(x,y)), np.sum(x,y))
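The function above is my implementation of the chi-squared distance. I used NumPy functions because, according to the scikit-learn docs, the metric function takes two one-dimensional NumPy arrays.
I passed the chi_squared function as an argument to KNeighborsClassifier().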
knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)
However, I keep getting the following error:
TypeError Traceback (most recent call last)
<ipython-input-29-d2a365ebb538> in <module>
4
5 knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)
----> 6 knn.fit(X_train, Y_train)
7 predictions = knn.predict(X_test)
8 print(accuracy_score(Y_test, predictions))
~/.local/lib/python3.8/site-packages/sklearn/neighbors/_classification.py in fit(self, X, y)
177 The fitted k-nearest neighbors classifier.
178 """
--> 179 return self._fit(X, y)
180
181 def predict(self, X):
~/.local/lib/python3.8/site-packages/sklearn/neighbors/_base.py in _fit(self, X, y)
497
498 if self._fit_method == 'ball_tree':
--> 499 self._tree = BallTree(X, self.leaf_size,
500 metric=self.effective_metric_,
501 **self.effective_metric_params_)
sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree.__init__()
sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree._recursive_build()
sklearn/neighbors/_ball_tree.pyx in sklearn.neighbors._ball_tree.init_node()
sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree.rdist()
sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.DistanceMetric.rdist()
sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.PyFuncDistance.dist()
sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.PyFuncDistance._dist()
<ipython-input-29-d2a365ebb538> in chi_squared(x, y)
1 def chi_squared(x,y):
----> 2 return np.divide(np.square(np.subtract(x,y)), np.sum(x,y))
3
4
5 knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)
<__array_function__ internals> in sum(*args, **kwargs)
~/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in sum(a, axis, dtype, out, keepdims, initial, where)
2239 return res
2240
-> 2241 return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
2242 initial=initial, where=where)
2243
~/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
85 return reduction(axis=axis, out=out, **passkwargs)
86
---> 87 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
88
89
TypeError: only integer scalar arrays can be converted to a scalar index
Answer 1: I can reproduce your error message with:
In [173]: x=np.arange(3); y=np.array([2,3,4])
In [174]: np.sum(x,y)
Traceback (most recent call last):
File "<ipython-input-174-1a1a267ebd82>", line 1, in <module>
np.sum(x,y)
File "<__array_function__ internals>", line 5, in sum
File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 2247, in sum
return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
TypeError: only integer scalar arrays can be converted to a scalar index
Correct use of np.sum:
In [175]: np.sum(x)
Out[175]: 3
In [177]: np.sum(np.arange(6).reshape(2,3), axis=0)
Out[177]: array([3, 5, 7])
In [178]: np.sum(np.arange(6).reshape(2,3), 0)
Out[178]: array([3, 5, 7])
If necessary, (re)read the np.sum documentation!
Use np.add instead of np.sum:
In [179]: np.add(x,y)
Out[179]: array([2, 4, 6])
In [180]: x+y
Out[180]: array([2, 4, 6])
The following two expressions should be equivalent:
np.divide(np.square(np.subtract(x,y)), np.add(x,y))
(x-y)**2/(x+y)
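Note that both expressions above return an array with one value per element, whereas a distance function is expected to return a single number for each pair of points. A minimal sketch of a scalar version follows, under a few assumptions that are mine rather than the asker's: the per-element terms are summed, terms with a zero denominator are skipped, and the features are non-negative (as in histograms). Some definitions of the chi-squared distance also multiply the sum by 0.5; that factor is left out here.

```python
import numpy as np

def chi_squared(x, y):
    """Chi-squared distance between two 1-D arrays, returned as one float."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    num = np.square(np.subtract(x, y))  # element-wise (x - y)**2
    den = np.add(x, y)                  # element-wise x + y (np.add, not np.sum)
    # Assumption: ignore positions where x + y == 0 to avoid dividing by zero.
    mask = den != 0
    # np.sum with a single array argument reduces the terms to a scalar.
    return float(np.sum(num[mask] / den[mask]))

x = np.arange(3)
y = np.array([2, 3, 4])
print(chi_squared(x, y))  # 4/2 + 4/4 + 4/6
```

A function of this shape can then be passed as metric=chi_squared, as in the question.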
Comments:
x and y are 1-D arrays here. Can I use np.add(x,y) to add the corresponding values of the two arrays?
Did you try it? Did you check the docs? Did you see my edit?
Yes, I did read the np.sum documentation, and it seems to accept only one numpy array and compute the sum of all its elements. However, I want to add the corresponding elements of x and y (as the chi_squared distance formula requires). But I get this error. I tried converting the result to a float, but the error does not go away.
np.sum takes one array and sums its elements. The second argument is the axis (or axes) along which to do this. The error arises because y is not a valid value for that axis argument.
I can't load that error message. Look at my edit again. Also, notice the In/Out lines in my answer? I copied those from an interactive numpy session (ipython). When you write and test code, you should have a similar session open. If you test code snippets there first, the larger script is more likely to work.