Possible to find K closest points using S2 library (and is it effective)?

Posted 2023-02-23

【Title】: Possible to find K closest points using S2 library (and is it effective)? 【Posted】: 2019-06-30 08:16:46 【Question】:

Background

Hi all, I recently came across Google's open-source S2 library:

https://github.com/google/s2geometry

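I'm currently writing an application that needs to find the K closest points to a given origin point. At the moment I do this in PostgreSQL with a geospatial index on a column holding the latitude/longitude values, but I was looking for alternatives when S2 caught my attention.

Question

I'm new to the library and have a few questions about it.

1) Is it possible to find the K closest points using the S2 library?

2) How fast is a query in S2 compared with a geospatial index (faster / slower / the same / etc.)?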

【Question comments】:

【Answer 1】:

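Google's S2 library is a form of geohashing. It can significantly speed up your geo lookups, because each lookup turns into a plain hash/ID lookup.

One approach to indexing could be:

Index all the points you care about at a reasonably large S2 cell level. You should evaluate your points to see which level works for you based on this chart.

At retrieval time, convert your search point to an S2 cell at that level, then pull all candidate points that share that cell.

(Optional, depending on the precision you care about) Compute the distance between each candidate point and the search point, then sort.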
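The optional last step can be sketched without any S2 dependency. This is a minimal illustration, assuming candidates arrive as `{ id, lngLat }` objects with `[longitude, latitude]` pairs; `haversineMeters` and `kNearest` are hypothetical helper names, not part of any S2 library, and the haversine formula treats the Earth as a sphere. The coordinates are the ones used in the answer's sample.

```javascript
// Mean Earth radius in meters (spherical approximation).
const EARTH_RADIUS_M = 6371008.8;

const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Great-circle distance in meters between two [lng, lat] pairs (haversine formula).
function haversineMeters([lng1, lat1], [lng2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLng = toRadians(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Hypothetical helper: rank candidates by distance to the search point and keep the k closest.
function kNearest(searchLngLat, candidates, k) {
  return candidates
    .map((c) => ({ id: c.id, meters: haversineMeters(searchLngLat, c.lngLat) }))
    .sort((a, b) => a.meters - b.meters)
    .slice(0, k);
}

const candidates = [
  { id: 'user1', lngLat: [-73.95772933959961, 40.71623280185081] },
  { id: 'user2', lngLat: [-73.95927429199219, 40.71629785715124] },
  { id: 'user3', lngLat: [-73.99206161499023, 40.688708709249646] },
];
const searchPoint = [-73.98991584777832, 40.69528168934989];

console.log(kNearest(searchPoint, candidates, 2).map((c) => c.id)); // [ 'user3', 'user2' ]
```

Sorting every candidate is fine when a cell holds a modest number of points; for very dense cells a bounded heap of size k would avoid the full sort.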

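This performance gain comes with tradeoffs:

Indexing your points by S2 cell means more storage (a 64-bit integer per ID).

You may miss points that lie outside the S2 cell you query by. You can index at multiple S2 levels to make sure you retrieve enough points. Depending on the density of your points, this may not be a problem.

Retrieving by S2 cell ID does not actually give you the distance between points; you have to compute that yourself.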
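The multi-level idea from the second tradeoff might look roughly like the sketch below. To stay self-contained it uses a plain latitude/longitude grid as a stand-in for S2 cells: `gridKey` is a hypothetical function and is not S2, and with the real library you would key on the cell ID at each level instead. The levels 10 and 13 are arbitrary choices for illustration.

```javascript
// Stand-in for an S2 cell token: a plain lat/lng grid whose cell edge halves per level.
// This is NOT S2; it only mimics "coarser level = bigger cell".
function gridKey([lng, lat], level) {
  const size = 180 / 2 ** level; // cell edge in degrees
  return `${level}:${Math.floor(lat / size)}:${Math.floor(lng / size)}`;
}

// Index every point once per level, so each point costs one entry per level.
function buildIndex(points, levels) {
  const index = new Map();
  for (const point of points) {
    for (const level of levels) {
      const key = gridKey(point.lngLat, level);
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(point.id);
    }
  }
  return index;
}

// Query the finest level first and fall back to coarser cells until at least
// k candidates are found. If no level has k, the coarsest cell's contents are returned.
function candidatesFor(index, searchLngLat, levels, k) {
  const fineToCoarse = [...levels].sort((a, b) => b - a);
  let found = [];
  for (const level of fineToCoarse) {
    found = index.get(gridKey(searchLngLat, level)) || [];
    if (found.length >= k) break;
  }
  return found;
}

const points = [
  { id: 'user1', lngLat: [-73.95772933959961, 40.71623280185081] },
  { id: 'user2', lngLat: [-73.95927429199219, 40.71629785715124] },
  { id: 'user3', lngLat: [-73.99206161499023, 40.688708709249646] },
];
const levels = [10, 13];
const index = buildIndex(points, levels);

// In this toy grid the search point's fine cell is empty, so the query widens to level 10.
console.log(candidatesFor(index, [-73.98991584777832, 40.69528168934989], levels, 1));
```

The fallback trades extra index entries for fewer missed neighbors. Candidates from a coarser cell still need a distance sort, because sharing a cell says nothing about which point is closest.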

Here is a code sample using the Node S2 library:

const s2 = require('@radarlabs/s2');

const user1LongLat = [-73.95772933959961, 40.71623280185081];
const user2LongLat = [-73.95927429199219, 40.71629785715124];
const user3LongLat = [-73.99206161499023, 40.688708709249646];

const user1S2 = ["user1", new s2.CellId(new s2.LatLng(user1LongLat[1], user1LongLat[0])).parent(13)];
const user2S2 = ["user2", new s2.CellId(new s2.LatLng(user2LongLat[1], user2LongLat[0])).parent(13)];
const user3S2 = ["user3", new s2.CellId(new s2.LatLng(user3LongLat[1], user3LongLat[0])).parent(13)];

// Group user IDs by the token of their level-13 cell.
const groups = {};
[user1S2, user2S2, user3S2].forEach(([userId, cellId]) => {
  const group = groups[cellId.token()] || [];
  group.push(userId);
  groups[cellId.token()] = group;
});

const searchPointLongLat = [-73.98991584777832, 40.69528168934989];
const searchPointS2 = new s2.CellId(new s2.LatLng(searchPointLongLat[1], searchPointLongLat[0])).parent(13);

console.log(searchPointS2.token()); // '89c25a4c'
console.log(groups); // { '89c2595c': [ 'user1', 'user2' ], '89c25a4c': [ 'user3' ] }

const closePoints = groups[searchPointS2.token()];
console.log(closePoints); // [ 'user3' ]

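Here is a map visualization of the S2 tokens that were created.

Long story short: yes, it is a form of hashing, so you trade storage for faster lookups, but you may need to tune some aspects of accuracy depending on your requirements.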

【Comments】:
