Strange behavior of Elasticsearch geo search

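A few days ago I ran into strange behavior of geo search in Elasticsearch.

I use AWS-managed ES 5.5, accessed through the REST interface.

Assume we have 200k objects whose location information is represented as points only. I use geo search to find the points that fall within several polygons. They are shown in the image below; the coordinates are extracted from the final request to ES. The request is built with the official Java High Level REST Client, and the request query is attached below.

[Figure: polygons]

I want to search for all objects that lie within at least one polygon. Here is the query (real field names and values are replaced with stubs, except for location and locationPoint.coordinates):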

{
  "size" : 20,
  "query" : {
    "constant_score" : {
      "filter" : {
        "bool" : {
          "must" : [
            {
              "terms" : {
                "field1" : [
                  "a",
                  "b",
                  "c",
                  "d",
                  "e",
                  "f"
                ],
                "boost" : 1.0
              }
            },
            {
              "term" : {
                "field2" : {
                  "value" : "q",
                  "boost" : 1.0
                }
              }
            },
            {
              "range" : {
                "field3" : {
                  "from" : "10",
                  "to" : null,
                  "include_lower" : true,
                  "include_upper" : true,
                  "boost" : 1.0
                }
              }
            },
            {
              "range" : {
                "field4" : {
                  "from" : "10",
                  "to" : null,
                  "include_lower" : true,
                  "include_upper" : true,
                  "boost" : 1.0
                }
              }
            },
            {
              "geo_shape" : {
                "location" : {
                  "shape" : {
                    "type" : "geometrycollection",
                    "geometries" : [
                      {
                        "type" : "multipolygon",
                        "orientation" : "right",
                        "coordinates" : [
                          [
                            // coords here
                          ]
                        ]
                      },
                      {
                        "type" : "polygon",
                        "orientation" : "right",
                        "coordinates" : [
                          [
                            // coords here
                          ]
                        ]
                      },
                      {
                        "type" : "polygon",
                        "orientation" : "right",
                        "coordinates" : [
                          [
                            // coords here
                          ]
                        ]
                      },
                      {
                        "type" : "polygon",
                        "orientation" : "right",
                        "coordinates" : [
                          [
                            // coords here
                          ]
                        ]
                      }
                    ]
                  },
                  "relation" : "intersects"
                },
                "ignore_unmapped" : false,
                "boost" : 1.0
              }
            }
          ]
        }
      },
      "boost" : 1.0
    }
  },
  "_source" : {
    "includes" : [
      "field1",
      "field2",
      "field3",
      "field4",
      "field8"
    ],
    "excludes" : [ ]
  },
  "sort" : [
    {
      "field1" : {
        "order" : "desc"
      }
    }
  ],
  "aggregations" : {
    "agg1" : {
      "terms" : {
        "field" : "field1",
        "size" : 10000,
        "min_doc_count" : 1,
        "shard_min_doc_count" : 0,
        "show_term_doc_count_error" : false,
        "order" : [
          {
            "_count" : "desc"
          },
          {
            "_term" : "asc"
          }
        ]
      }
    },
    "agg2" : {
      "terms" : {
        "field" : "field2",
        "size" : 10000,
        "min_doc_count" : 1,
        "shard_min_doc_count" : 0,
        "show_term_doc_count_error" : false,
        "order" : [
          {
            "_count" : "desc"
          },
          {
            "_term" : "asc"
          }
        ]
      }
    },
    "agg3" : {
      "terms" : {
        "field" : "field3",
        "size" : 10000,
        "min_doc_count" : 1,
        "shard_min_doc_count" : 0,
        "show_term_doc_count_error" : false,
        "order" : [
          {
            "_count" : "desc"
          },
          {
            "_term" : "asc"
          }
        ]
      }
    },
    "agg4" : {
      "terms" : {
        "field" : "field4",
        "size" : 10000,
        "min_doc_count" : 1,
        "shard_min_doc_count" : 0,
        "show_term_doc_count_error" : false,
        "order" : [
          {
            "_count" : "desc"
          },
          {
            "_term" : "asc"
          }
        ]
      }
    },
    "agg5" : {
      "terms" : {
        "field" : "field5",
        "size" : 10000,
        "min_doc_count" : 1,
        "shard_min_doc_count" : 0,
        "show_term_doc_count_error" : false,
        "order" : [
          {
            "_count" : "desc"
          },
          {
            "_term" : "asc"
          }
        ]
      }
    },
    "agg6" : {
      "terms" : {
        "field" : "field6",
        "size" : 10000,
        "min_doc_count" : 1,
        "shard_min_doc_count" : 0,
        "show_term_doc_count_error" : false,
        "order" : [
          {
            "_count" : "desc"
          },
          {
            "_term" : "asc"
          }
        ]
      }
    },
    "agg7" : {
      "terms" : {
        "field" : "field7",
        "size" : 10000,
        "min_doc_count" : 1,
        "shard_min_doc_count" : 0,
        "show_term_doc_count_error" : false,
        "order" : [
          {
            "_count" : "desc"
          },
          {
            "_term" : "asc"
          }
        ]
      }
    },
    "agg8" : {
      "terms" : {
        "field" : "field8",
        "size" : 10000,
        "min_doc_count" : 1,
        "shard_min_doc_count" : 0,
        "show_term_doc_count_error" : false,
        "order" : [
          {
            "_count" : "desc"
          },
          {
            "_term" : "asc"
          }
        ]
      }
    },
    "map_center" : {
      "geo_centroid" : {
        "field" : "locationPoint.coordinates"
      }
    },
    "map_bound" : {
      "geo_bounds" : {
        "field" : "locationPoint.coordinates",
        "wrap_longitude" : true
      }
    }
  }
}
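One way to narrow the problem down is to express "at least one polygon" without a geometrycollection, as one geo_shape clause per polygon combined in a bool/should, and compare the hit counts of both forms. The sketch below only builds such a filter as a Python dict. The helper name any_polygon_filter and the unit-square coordinates are hypothetical (the real coordinates are elided in the query above), and this is a diagnostic idea rather than a confirmed fix.

```python
import json


def any_polygon_filter(field, shapes, relation="intersects"):
    """Build a bool filter that matches documents related to at least one shape.

    Each shape gets its own geo_shape clause instead of being wrapped
    in a single geometrycollection.
    """
    clauses = [
        {"geo_shape": {field: {"shape": shape, "relation": relation}}}
        for shape in shapes
    ]
    return {"bool": {"should": clauses, "minimum_should_match": 1}}


# Hypothetical placeholder polygon (counterclockwise unit square);
# it stands in for the coordinates that the post leaves out.
square = {
    "type": "polygon",
    "coordinates": [
        [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
    ],
}

geo_filter = any_polygon_filter("location", [square, square])
print(json.dumps(geo_filter, indent=2))
```

The resulting dict could replace the geo_shape entry in the must array of the query; if the counts differ between the two forms, the geometrycollection handling is the likely place to look.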

Please note that the field location is mapped as geo_shape, and the field locationPoint.coordinates is mapped as geo_point.

Now to the problem itself. The results of the requests (hit counts) are shown below. Only the polygons change between requests.

#   Polygons    Hits count
1)  1,2,3,4     5565
2)  1           4897
3)  3,4         75
4)  2           9
5)  1,3,4       5543
6)  1,2         5466
7)  2,3,4       84

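So if I add the result for polygon 1 to the result for polygons 2, 3 and 4, I do not get the number returned by the full request.

For example, #1 != #2 + #7 and #1 != #5 + #4, but #7 == #4 + #3.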
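The arithmetic can be checked mechanically. If a request matches documents inside at least one of its polygons, its hit count can never exceed the sum of the counts for any split of those polygons into two groups (overlap can only make the union smaller than the sum). The sketch below uses the numbers from the table; the function name union_bound_violations is made up for illustration.

```python
# Hit counts copied from the table, keyed by the polygons in each request.
hits = {
    frozenset({1, 2, 3, 4}): 5565,
    frozenset({1}): 4897,
    frozenset({3, 4}): 75,
    frozenset({2}): 9,
    frozenset({1, 3, 4}): 5543,
    frozenset({1, 2}): 5466,
    frozenset({2, 3, 4}): 84,
}


def union_bound_violations(hits):
    """Find requests that return more hits than two requests splitting their polygons.

    Returns tuples (whole, part_a, part_b, excess) with excess > 0.
    """
    violations = []
    for whole, total in hits.items():
        for part_a, count_a in hits.items():
            part_b = whole - part_a
            # part_a must be a proper subset holding the smallest polygon
            # number, so each split is examined exactly once.
            if part_a < whole and min(part_a) == min(whole) and part_b in hits:
                excess = total - (count_a + hits[part_b])
                if excess > 0:
                    violations.append((whole, part_a, part_b, excess))
    return violations


violations = union_bound_violations(hits)
for whole, part_a, part_b, excess in violations:
    print(sorted(whole), "exceeds", sorted(part_a), "+", sorted(part_b), "by", excess)
```

With these numbers, every split that breaks the bound involves polygon 1, while the split of request #7 into #4 and #3 adds up exactly.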

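I cannot tell whether this is a problem in the request, expected behavior, or even a bug in ES.

Can anyone help me understand the logic behind this ES behavior or point me to a solution?

Thanks!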

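Answer

After a short conversation with an Elasticsearch team member, we arrived at AWS as the likely source. The build hashes of the AWS build and of stock ES are not equal, so the AWS team has modified ES, and we do not know the exact changes. Some of those changes may affect the search described in the question. This behavior needs to be reproduced on a stock ES cluster before we continue the conversation.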
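The build hash mentioned above is reported by the cluster root endpoint (GET /) under version.build_hash. A minimal sketch of the comparison follows; the function names and the sample hash values are placeholders, not real hashes, and fetch_root_info assumes the endpoint is reachable without request signing, which may not hold for an AWS domain.

```python
import json
from urllib.request import urlopen


def fetch_root_info(base_url):
    """GET the cluster root endpoint and parse the JSON body it returns."""
    with urlopen(base_url) as response:
        return json.load(response)


def build_hash(root_info):
    """Extract the build hash from a parsed root endpoint response."""
    return root_info["version"]["build_hash"]


def same_build(info_a, info_b):
    """Return True when both clusters report the same build hash."""
    return build_hash(info_a) == build_hash(info_b)


# Placeholder responses standing in for fetch_root_info(...) results;
# the hash values are invented for illustration only.
aws_info = {"version": {"number": "5.5.2", "build_hash": "aaaaaaa"}}
stock_info = {"version": {"number": "5.5.2", "build_hash": "bbbbbbb"}}

print(same_build(aws_info, stock_info))  # prints False
```

Against live clusters, the two dicts would come from fetch_root_info called with each cluster's base URL.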
