Nearest Neighbors in Python given the distance matrix

Posted 2023-03-12


【Posted】: 2014-03-07 16:30:15 【Question】:

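I need to apply nearest neighbors in Python. I've been looking at the scikit-learn and scipy libraries, but both take the raw data as input and then compute the distances and run the algorithm themselves.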

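In my case I have to compute an unconventional distance, so I'd like to know whether there is a way to pass in the distance matrix directly.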
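Whatever the library, the custom distance usually has to be turned into a matrix first. A minimal sketch, assuming a hypothetical weighted Manhattan distance (`weighted_manhattan`, with made-up weights) stands in for the "unconventional" distance: SciPy's `cdist` accepts any Python callable as `metric` and evaluates it on every pair of rows.

```python
import numpy as np
from scipy.spatial.distance import cdist


def weighted_manhattan(u, v):
    """Hypothetical custom distance: per-feature weighted L1 distance (weights are made up)."""
    weights = np.array([1.0, 0.5, 2.0])
    return float(np.sum(weights * np.abs(u - v)))


rng = np.random.default_rng(0)
X = rng.random((5, 3))

# cdist calls the Python function once per (row of X, row of X) pair.
D = cdist(X, X, metric=weighted_manhattan)
print(D.shape)  # (5, 5)
```

A pure-Python callable is evaluated pair by pair, so this is convenient for experimenting but slow for large datasets.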


【Answer 1】:

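As Ford said, and according to the documentation at http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier, you should convert your custom distance into a DistanceMetric object and pass it as the metric parameter.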
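Current scikit-learn versions also accept a plain Python callable for `metric`; the callable receives two 1-D feature arrays and returns a float. A minimal sketch, where `my_distance` is a hypothetical placeholder for the custom distance (here just Euclidean, so the expected result is easy to verify) and the tiny training set is made up:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def my_distance(x, y):
    """Hypothetical custom distance between two 1-D feature vectors."""
    return float(np.sqrt(np.sum((x - y) ** 2)))


X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])

# The callable is passed directly as `metric`; scikit-learn calls it on pairs of rows.
clf = KNeighborsClassifier(n_neighbors=1, metric=my_distance)
clf.fit(X_train, y_train)

print(clf.predict(np.array([[0.2, 0.1], [4.8, 5.1]])))  # [0 1]
```

If a tree-based search (BallTree) ends up being used, the callable must be a true metric, as the documentation quoted in a later answer points out.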

【Comments】:

I don't think that's right. The documentation says:

[callable]: a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
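As far as I can tell, the passage quoted in this comment describes KNeighborsClassifier's `weights` parameter rather than `metric`: a `weights` callable receives the already-computed neighbor distances and only changes how much each neighbor's vote counts. A small sketch contrasting the two, where `inverse_square` is a hypothetical weighting function and the 1-D data is made up:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def inverse_square(distances):
    """Hypothetical `weights` callable: array of distances in, same-shaped weights out."""
    # The small epsilon avoids division by zero when a query coincides with a training point.
    return 1.0 / (distances ** 2 + 1e-12)


X_train = np.array([[0.0], [1.0], [3.0]])
y_train = np.array([0, 0, 1])
query = np.array([[2.6]])

# `metric` (default: Minkowski with p=2, i.e. Euclidean) decides which points are neighbors;
# `weights` only decides how much each neighbor's vote counts.
uniform = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
weighted = KNeighborsClassifier(n_neighbors=3, weights=inverse_square).fit(X_train, y_train)

print(uniform.predict(query))   # [0]: two of the three neighbors have label 0
print(weighted.predict(query))  # [1]: the closest neighbor (distance 0.4) dominates the vote
```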

【Answer 2】:

You need to create a DistanceMetric object, supplying your own function as an argument:

from sklearn.metrics import DistanceMetric  # older scikit-learn versions: sklearn.neighbors
metric = DistanceMetric.get_metric('pyfunc', func=func)

From the documentation:

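Here func is a function which takes two one-dimensional numpy arrays and returns a distance. Note that in order to be used within the BallTree, the distance must be a true metric, i.e. it must satisfy the following properties:

Non-negativity: d(x, y) >= 0
Identity: d(x, y) = 0 if and only if x == y
Symmetry: d(x, y) = d(y, x)
Triangle inequality: d(x, y) + d(y, z) >= d(x, z)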
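The four conditions above can be spot-checked numerically before handing a custom function to BallTree. A minimal sketch, assuming a hypothetical helper `check_metric_axioms` (not part of scikit-learn or SciPy) that tests every pair and triple from a small sample of points; it can only find violations, never prove that a function is a metric. Squared Euclidean distance is used as a familiar example that breaks the triangle inequality.

```python
import itertools

import numpy as np


def check_metric_axioms(func, points, atol=1e-12):
    """Hypothetical helper: return the first violated property on `points`, or None.

    Passing does not prove `func` is a metric; failing proves it is not.
    """
    for x, y in itertools.product(points, repeat=2):
        d_xy = func(x, y)
        if d_xy < -atol:
            return "non-negativity"
        if np.array_equal(x, y) != (abs(d_xy) <= atol):
            return "identity"
        if abs(d_xy - func(y, x)) > atol:
            return "symmetry"
    for x, y, z in itertools.product(points, repeat=3):
        if func(x, y) + func(y, z) < func(x, z) - atol:
            return "triangle inequality"
    return None  # no violation found on this sample


def euclidean(u, v):
    return float(np.linalg.norm(u - v))


def squared_euclidean(u, v):
    return float(np.sum((u - v) ** 2))


pts = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([2.0, 0.0]), np.array([0.0, 3.0])]

print(check_metric_axioms(euclidean, pts))          # None
print(check_metric_axioms(squared_euclidean, pts))  # triangle inequality (1 + 1 < 4)
```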

You can then create the classifier with metric=metric as a keyword argument, and it will use it when computing distances.


【Answer 3】:

To add to Ford's answer, you have to do:

metric = DistanceMetric.get_metric('pyfunc', func=your_function_name)

You can't just pass your own function as the second positional argument; you have to pass it as the keyword argument "func".

【Comments】:

Good catch! I've edited my answer to include the keyword before the argument.

【Answer 4】:

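If you set metric="precomputed", you can pass your own distance matrix to sklearn.neighbors.NearestNeighbors. As the example below shows, with the Euclidean distance the result is indeed the same as passing the features directly.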

import numpy as np
from numpy.testing import assert_array_equal
from scipy.spatial.distance import cdist
from sklearn.neighbors import NearestNeighbors

# Generate random vectors to use as data for k-nearest neighbors.
rng = np.random.default_rng(0)
X = rng.random((10, 2))

# Fit NearestNeighbors on vectors and retrieve neighbors.
knn_vector_based = NearestNeighbors(n_neighbors=2).fit(X)
nn_1 = knn_vector_based.kneighbors(return_distance=False)

# Calculate distance matrix.
# This computation can be replaced with any custom distance metric you have.
distance_matrix = cdist(X, X)

# Fit NearestNeighbors on distance matrix and retrieve neighbors.
knn_distance_based = (
    NearestNeighbors(n_neighbors=2, metric="precomputed")
        .fit(distance_matrix)
)

nn_2 = knn_distance_based.kneighbors(return_distance=False)

# Verify that the results are the same.
assert_array_equal(nn_1, nn_2)

# Neighbors for single points can be retrieved by passing the matching
# rows of the distance matrix (distances from each query to all fitted points).
nn_of_first_point_1 = knn_vector_based.kneighbors(
    X[0, None], return_distance=False
)
nn_of_first_point_2 = knn_distance_based.kneighbors(
    distance_matrix[0, None], return_distance=False
)

assert_array_equal(nn_of_first_point_1, nn_of_first_point_2)
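The same idea carries over to classification. A minimal sketch, assuming KNeighborsClassifier with metric="precomputed": fit on the square train-to-train matrix, then predict with a rectangular test-to-train matrix whose rows are query points and whose columns are training points. `cdist` stands in for whatever custom distance you have, and the labels are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.random((20, 2))
y_train = (X_train[:, 0] > 0.5).astype(int)  # toy labels for illustration
X_test = rng.random((5, 2))

# Training matrix: distances between every pair of training samples, shape (n_train, n_train).
D_train = cdist(X_train, X_train)
# Query matrix: distances from each test sample to each training sample, shape (n_test, n_train).
D_test = cdist(X_test, X_train)

clf = KNeighborsClassifier(n_neighbors=3, metric="precomputed").fit(D_train, y_train)
pred_precomputed = clf.predict(D_test)

# Same model fitted on raw features with the default Euclidean metric, for comparison.
pred_features = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)
print(pred_precomputed, pred_features)
```

Getting the column order of the query matrix wrong (test-to-train vs. train-to-test) is an easy mistake; scikit-learn expects one row per query and one column per fitted sample.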

