In Elasticsearch, how to group by value inside nested array
Posted: 2016-04-21 14:30:14
Question:
Say I have the following documents:
First document:
{
  "productName": "product1",
  "tags": [
    {
      "name": "key1",
      "value": "value1"
    },
    {
      "name": "key2",
      "value": "value2"
    }
  ]
}
Second document:
{
  "productName": "product2",
  "tags": [
    {
      "name": "key1",
      "value": "value1"
    },
    {
      "name": "key2",
      "value": "value3"
    }
  ]
}
I know that if I want to group by productName, I can use a terms aggregation:
"terms": {
  "field": "productName"
}
This will give me two buckets with two different keys, "product1" and "product2".
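To make the bucket idea concrete, here is a minimal in-memory sketch in Python of what a terms aggregation does to the two sample documents. It does not talk to Elasticsearch; the helper name `terms_buckets` is hypothetical and only mimics the shape of the buckets in a real response.

```python
from collections import Counter

# The productName field of the two sample documents from the question.
docs = [
    {"productName": "product1"},
    {"productName": "product2"},
]

def terms_buckets(values):
    """Count documents per distinct value, shaped like terms-aggregation buckets."""
    counts = Counter(values)
    # Order by descending count, then by key, so the output is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]

buckets = terms_buckets(doc["productName"] for doc in docs)
print(buckets)
```

Each distinct productName becomes one bucket holding one document, which matches the two buckets described above.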
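But what should the query be if I want to group by tag key? That is, if I group by the tag with name==key1, I expect one bucket with key="value1"; whereas if I group by the tag with name==key2, I expect two buckets with keys "value2" and "value3".
What should the query look like if I want to group by the "value" inside the nested array, but not by the "key"? Any suggestions?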
Answer 1: It sounds like a nested terms aggregation is what you're looking for.
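A nested aggregation only works when the path it points at is mapped with type nested, so the tags field needs such a mapping before the documents are indexed. The thread does not show the mapping that was used; the sketch below is an assumption, written as a Python dict that could be serialized as the body of PUT /test_index. The keyword field type and the typeless layout assume a recent Elasticsearch release, and older releases would need a different field definition.

```python
import json

# Hypothetical mapping for test_index (not taken from the thread).
# Assumes Elasticsearch 7+ syntax: typeless mappings and keyword fields.
mapping = {
    "mappings": {
        "properties": {
            "productName": {"type": "keyword"},
            "tags": {
                # "nested" keeps each tag object as its own hidden document,
                # so name and value stay paired during aggregation.
                "type": "nested",
                "properties": {
                    "name": {"type": "keyword"},
                    "value": {"type": "keyword"},
                },
            },
        }
    }
}

# JSON body one would send with: PUT /test_index
body = json.dumps(mapping, indent=2)
print(body)
```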
Given the two documents you posted, this query:
POST /test_index/_search
{
  "size": 0,
  "aggs": {
    "product_name_terms": {
      "terms": {
        "field": "product_name"
      }
    },
    "nested_tags": {
      "nested": {
        "path": "tags"
      },
      "aggs": {
        "tags_name_terms": {
          "terms": {
            "field": "tags.name"
          }
        },
        "tags_value_terms": {
          "terms": {
            "field": "tags.value"
          }
        }
      }
    }
  }
}
returns this:
{
  "took": 67,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "product_name_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    },
    "nested_tags": {
      "doc_count": 4,
      "tags_name_terms": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          { "key": "key1", "doc_count": 2 },
          { "key": "key2", "doc_count": 2 }
        ]
      },
      "tags_value_terms": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          { "key": "value1", "doc_count": 2 },
          { "key": "value2", "doc_count": 1 },
          { "key": "value3", "doc_count": 1 }
        ]
      }
    }
  }
}
Here is some code I used to test it:
http://sense.qbox.io/gist/a9a172f41dbd520d5e61063a9686055681110522
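For readers without a cluster at hand, the counts in the nested_tags part of that response can be reproduced with a small in-memory sketch in Python. This is only an illustration of the counting logic, not how Elasticsearch implements it; the function name `nested_terms` is hypothetical.

```python
from collections import Counter

# The two sample documents from the question.
docs = [
    {"productName": "product1",
     "tags": [{"name": "key1", "value": "value1"},
              {"name": "key2", "value": "value2"}]},
    {"productName": "product2",
     "tags": [{"name": "key1", "value": "value1"},
              {"name": "key2", "value": "value3"}]},
]

def nested_terms(docs, path, fields):
    """Flatten the nested objects under `path`, then count each field's values."""
    # The "nested" step: every tag object is treated as its own document.
    nested = [obj for doc in docs for obj in doc.get(path, [])]
    result = {"doc_count": len(nested)}
    # One "terms" step per requested field, run over the nested objects.
    for field in fields:
        result[field] = dict(Counter(obj[field] for obj in nested))
    return result

result = nested_terms(docs, "tags", ["name", "value"])
print(result)
```

The nested doc_count is 4 because the two documents carry two tag objects each, and the per-field counts line up with the tags_name_terms and tags_value_terms buckets in the response.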
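Edit: filtering by nested value
Based on your comment, if you want to filter the nested results by a value (of the nested results), you can add another "layer" of aggregation using a filter aggregation, like this: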
POST /test_index/_search
{
  "size": 0,
  "aggs": {
    "nested_tags": {
      "nested": {
        "path": "tags"
      },
      "aggs": {
        "filter_tag_name": {
          "filter": {
            "term": {
              "tags.name": "key1"
            }
          },
          "aggs": {
            "tags_name_terms": {
              "terms": {
                "field": "tags.name"
              }
            },
            "tags_value_terms": {
              "terms": {
                "field": "tags.value"
              }
            }
          }
        }
      }
    }
  }
}
returns:
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "nested_tags": {
      "doc_count": 4,
      "filter_tag_name": {
        "doc_count": 2,
        "tags_name_terms": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            { "key": "key1", "doc_count": 2 }
          ]
        },
        "tags_value_terms": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            { "key": "value1", "doc_count": 2 }
          ]
        }
      }
    }
  }
}
Here is the updated code:
http://sense.qbox.io/gist/507c3aabf36b8f6ed8bb076c8c1b8552097c5458
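The nested, filter, terms layering can also be sketched in memory to check what each tag key should produce. The Python below is an illustration under the assumption that the filter is a plain equality test on tags.name; the function name `values_for_tag` is hypothetical and nothing here calls Elasticsearch.

```python
from collections import Counter

# The two sample documents from the question.
docs = [
    {"productName": "product1",
     "tags": [{"name": "key1", "value": "value1"},
              {"name": "key2", "value": "value2"}]},
    {"productName": "product2",
     "tags": [{"name": "key1", "value": "value1"},
              {"name": "key2", "value": "value3"}]},
]

def values_for_tag(docs, tag_name):
    """Group tag values, keeping only tags whose name equals `tag_name`."""
    # "nested" layer: treat every tag object as its own document.
    nested = [tag for doc in docs for tag in doc["tags"]]
    # "filter" layer: keep only the tags with the requested name.
    kept = [tag for tag in nested if tag["name"] == tag_name]
    # "terms" layer: count the remaining values.
    return dict(Counter(tag["value"] for tag in kept))

by_key1 = values_for_tag(docs, "key1")
by_key2 = values_for_tag(docs, "key2")
print(by_key1)
print(by_key2)
```

With "key1" the sketch yields a single value1 group of 2, as in the response above. With "key2" it yields one group each for value2 and value3, which is the result the question asked for, so swapping the term value in the filter should be enough to get it.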
Comments:
Thanks, but what if I only want to group by the values associated with a single tag key? I'm looking for a way to group by tags.value only where tags.name==key2, without returning any buckets for values associated with key1. Is there a way to do that?