Strange behavior of Elasticsearch geo search
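A few days ago I ran into strange behavior in Elasticsearch geo search.
I use AWS-managed ES 5.5, accessed through the REST interface.
Suppose we have 200k objects whose location information is represented only as points. I use a geo search to find the points that lie inside several polygons. They are shown in the image below. The coordinates were extracted from the final request sent to ES. The request was built with the official Java High Level REST Client. The request query is attached below.
I want to find all objects that lie inside at least one of the polygons. Here is the query (the real field names and values are replaced by stubs, except for location and locationPoint.coordinates):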
{
"size" : 20,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{
"terms" : {
"field1" : [
"a",
"b",
"c",
"d",
"e",
"f"
],
"boost" : 1.0
}
},
{
"term" : {
"field2" : {
"value" : "q",
"boost" : 1.0
}
}
},
{
"range" : {
"field3" : {
"from" : "10",
"to" : null,
"include_lower" : true,
"include_upper" : true,
"boost" : 1.0
}
}
},
{
"range" : {
"field4" : {
"from" : "10",
"to" : null,
"include_lower" : true,
"include_upper" : true,
"boost" : 1.0
}
}
},
{
"geo_shape" : {
"location" : {
"shape" : {
"type" : "geometrycollection",
"geometries" : [
{
"type" : "multipolygon",
"orientation" : "right",
"coordinates" : [
[
// coords here
]
]
},
{
"type" : "polygon",
"orientation" : "right",
"coordinates" : [
[
// coords here
]
]
},
{
"type" : "polygon",
"orientation" : "right",
"coordinates" : [
[
// coords here
]
]
},
{
"type" : "polygon",
"orientation" : "right",
"coordinates" : [
[
// coords here
]
]
}
]
},
"relation" : "intersects"
},
"ignore_unmapped" : false,
"boost" : 1.0
}
}
]
}
},
"boost" : 1.0
}
},
"_source" : {
"includes" : [
"field1",
"field2",
"field3",
"field4",
"field8"
],
"excludes" : [ ]
},
"sort" : [
{
"field1" : {
"order" : "desc"
}
}
],
"aggregations" : {
"agg1" : {
"terms" : {
"field" : "field1",
"size" : 10000,
"min_doc_count" : 1,
"shard_min_doc_count" : 0,
"show_term_doc_count_error" : false,
"order" : [
{
"_count" : "desc"
},
{
"_term" : "asc"
}
]
}
},
"agg2" : {
"terms" : {
"field" : "field2",
"size" : 10000,
"min_doc_count" : 1,
"shard_min_doc_count" : 0,
"show_term_doc_count_error" : false,
"order" : [
{
"_count" : "desc"
},
{
"_term" : "asc"
}
]
}
},
"agg3" : {
"terms" : {
"field" : "field3",
"size" : 10000,
"min_doc_count" : 1,
"shard_min_doc_count" : 0,
"show_term_doc_count_error" : false,
"order" : [
{
"_count" : "desc"
},
{
"_term" : "asc"
}
]
}
},
"agg4" : {
"terms" : {
"field" : "field4",
"size" : 10000,
"min_doc_count" : 1,
"shard_min_doc_count" : 0,
"show_term_doc_count_error" : false,
"order" : [
{
"_count" : "desc"
},
{
"_term" : "asc"
}
]
}
},
"agg5" : {
"terms" : {
"field" : "field5",
"size" : 10000,
"min_doc_count" : 1,
"shard_min_doc_count" : 0,
"show_term_doc_count_error" : false,
"order" : [
{
"_count" : "desc"
},
{
"_term" : "asc"
}
]
}
},
"agg6" : {
"terms" : {
"field" : "field6",
"size" : 10000,
"min_doc_count" : 1,
"shard_min_doc_count" : 0,
"show_term_doc_count_error" : false,
"order" : [
{
"_count" : "desc"
},
{
"_term" : "asc"
}
]
}
},
"agg7" : {
"terms" : {
"field" : "field7",
"size" : 10000,
"min_doc_count" : 1,
"shard_min_doc_count" : 0,
"show_term_doc_count_error" : false,
"order" : [
{
"_count" : "desc"
},
{
"_term" : "asc"
}
]
}
},
"agg8" : {
"terms" : {
"field" : "field8",
"size" : 10000,
"min_doc_count" : 1,
"shard_min_doc_count" : 0,
"show_term_doc_count_error" : false,
"order" : [
{
"_count" : "desc"
},
{
"_term" : "asc"
}
]
}
},
"map_center" : {
"geo_centroid" : {
"field" : "locationPoint.coordinates"
}
},
"map_bound" : {
"geo_bounds" : {
"field" : "locationPoint.coordinates",
"wrap_longitude" : true
}
}
}
}
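To compare hit counts across different polygon sets, it can help to generate the geo_shape clause from a list of polygon numbers so that nothing else in the request changes. The following is a minimal Python sketch, not the code used in the question: the helper names `build_geo_shape_filter` and `build_subset_query`, the `POLYGONS` table, and its empty coordinate arrays are all hypothetical placeholders, and `size` is set to 0 on the assumption that only the hit count is needed.

```python
# Hypothetical placeholder geometries; the real coordinates are elided in the question.
POLYGONS = {
    1: {"type": "multipolygon", "orientation": "right", "coordinates": []},
    2: {"type": "polygon", "orientation": "right", "coordinates": []},
    3: {"type": "polygon", "orientation": "right", "coordinates": []},
    4: {"type": "polygon", "orientation": "right", "coordinates": []},
}


def build_geo_shape_filter(geometries, field="location"):
    """Build a geo_shape clause matching documents that intersect any geometry."""
    return {
        "geo_shape": {
            field: {
                "shape": {
                    "type": "geometrycollection",
                    "geometries": list(geometries),
                },
                "relation": "intersects",
            },
            "ignore_unmapped": False,
        }
    }


def build_subset_query(polygon_ids, other_filters=()):
    """Build a count-only query for the given polygon numbers.

    other_filters stands in for the terms/term/range clauses of the
    original request and is passed through unchanged.
    """
    geometries = [POLYGONS[i] for i in polygon_ids]
    must = list(other_filters) + [build_geo_shape_filter(geometries)]
    return {
        "size": 0,  # assumption: only hits.total is of interest
        "query": {"constant_score": {"filter": {"bool": {"must": must}}}},
    }
```

Calling `build_subset_query([2, 3, 4])` yields a request body whose geometry collection holds only polygons 2, 3 and 4, which keeps the seven test requests identical apart from the polygon set.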
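Note that the field location is mapped as geo_shape and the field location.coordinates is mapped as geo_point.
Now for the problem itself. The results of the requests (hit counts) are shown below. Only the set of polygons changes between requests.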
#    Polygons    Hits count
1)   1,2,3,4     5565
2)   1           4897
3)   3,4         75
4)   2           9
5)   1,3,4       5543
6)   1,2         5466
7)   2,3,4       84
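So if I add the result for polygon 1 to the result for polygons 2, 3 and 4, I do not get the number returned by the full request.
For example, #1 != #2 + #7 and #1 != #5 + #4, but #7 == #4 + #3.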
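The arithmetic can be checked mechanically. If each request counts the documents that intersect at least one of its polygons, and the other filters stay the same, then the count for a combined set can never exceed the sum of the counts for its parts, because overlapping polygons can only reduce the combined count. The Python sketch below takes the numbers from the table and tests that bound; the function name `check_union_bound` is made up for this illustration.

```python
# Hit counts from the table, keyed by the polygons used in each request.
hits = {
    (1, 2, 3, 4): 5565,
    (1,): 4897,
    (3, 4): 75,
    (2,): 9,
    (1, 3, 4): 5543,
    (1, 2): 5466,
    (2, 3, 4): 84,
}


def check_union_bound(whole, part_a, part_b):
    """Return (whole count, sum of part counts, whether whole <= sum)."""
    total = hits[whole]
    parts = hits[part_a] + hits[part_b]
    return total, parts, total <= parts


print(check_union_bound((1, 2, 3, 4), (1,), (2, 3, 4)))  # (5565, 4981, False)
print(check_union_bound((1, 2, 3, 4), (1, 3, 4), (2,)))  # (5565, 5552, False)
print(check_union_bound((2, 3, 4), (2,), (3, 4)))        # (84, 84, True)
```

Under that assumption, the first two rows are the suspicious ones: the full request returns more hits than its parts combined, which overlap between polygons alone would not explain.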
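I cannot tell whether this is a problem in the request, expected behavior, or even a bug in ES.
Can anyone help me understand the logic behind this ES behavior or point me to a solution?
Thanks!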
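After a short conversation with a member of the Elasticsearch team, we turned to AWS. The build hashes of AWS ES and vanilla ES are not equal, so the AWS team has modified ES, and we do not know what exactly was changed. Some of those changes may affect the search described in the question. The behavior needs to be reproduced on a vanilla ES cluster before the conversation can continue.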
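One way to compare builds is to read `version.build_hash` from the JSON that each cluster returns at its root endpoint (`GET /`). The Python sketch below parses two such responses and compares the hashes. The sample payloads and hash values are invented for illustration and are not the real hashes of any AWS or vanilla build; the function names are likewise hypothetical.

```python
import json


def build_hash(root_response_text):
    """Extract version.build_hash from the body of a GET / response."""
    info = json.loads(root_response_text)
    return info["version"]["build_hash"]


def same_build(response_a, response_b):
    """Return True when both clusters report the same build hash."""
    return build_hash(response_a) == build_hash(response_b)


# Invented sample bodies; real responses contain more fields.
aws_response = json.dumps(
    {"version": {"number": "5.5.2", "build_hash": "aaaaaaa"}}
)
vanilla_response = json.dumps(
    {"version": {"number": "5.5.2", "build_hash": "bbbbbbb"}}
)

print(same_build(aws_response, vanilla_response))  # False
```

A mismatch only shows that the two binaries were built from different sources. It does not show which change, if any, affects geo_shape queries, so reproducing the hit counts on a vanilla cluster remains the necessary next step.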