In Elasticsearch, how to group by value inside a nested array


Posted: 2016-04-21 14:30:14

Question:

Say I have the following documents:

First document:

{
  "productName": "product1",
  "tags": [
    {
      "name": "key1",
      "value": "value1"
    },
    {
      "name": "key2",
      "value": "value2"
    }
  ]
}

Second document:

{
  "productName": "product2",
  "tags": [
    {
      "name": "key1",
      "value": "value1"
    },
    {
      "name": "key2",
      "value": "value3"
    }
  ]
}

I know that if I want to group by productName, I can use a terms aggregation:

"terms": {
    "field": "productName"
}

This gives me two buckets with two different keys, "product1" and "product2".
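
As a rough illustration of what that terms aggregation computes, here is a minimal Python sketch that counts the two sample documents per productName. The helper name terms_buckets is made up for this example, and the bucket ordering (count descending, then key) is chosen only to keep the output deterministic.

```python
from collections import Counter

# Sample documents copied from the question.
docs = [
    {"productName": "product1",
     "tags": [{"name": "key1", "value": "value1"},
              {"name": "key2", "value": "value2"}]},
    {"productName": "product2",
     "tags": [{"name": "key1", "value": "value1"},
              {"name": "key2", "value": "value3"}]},
]


def terms_buckets(documents, field):
    """Count documents per distinct value of a top-level field."""
    counts = Counter(doc[field] for doc in documents)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


product_buckets = terms_buckets(docs, "productName")
print(product_buckets)
```

Each document lands in exactly one bucket here, which is why this approach alone cannot separate the values that belong to key1 from those that belong to key2.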

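But what should the query be if I want to group by a tag key? That is, if I group by the tags with name == key1, I expect a single bucket with key "value1"; if I group by the tags with name == key2, I expect two buckets with keys "value2" and "value3".

What should the query look like if I want to group by the "value" inside the nested array rather than by the "key"? Any suggestions?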

Answer 1:

It sounds like a nested terms aggregation is what you are looking for.
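
One prerequisite worth spelling out: the nested aggregation expects tags to be mapped with type nested, so that each tag object is indexed as its own hidden document. The sketch below builds a possible index body as a Python dict. The index name test_index comes from the query in this answer; the keyword field types and the typeless mapping layout are assumptions that fit recent Elasticsearch versions (7.x and later), and an older cluster would need a different mapping syntax.

```python
import json

# Hypothetical index body for test_index; the field types are assumptions,
# not taken from the original answer.
index_body = {
    "mappings": {
        "properties": {
            "productName": {"type": "keyword"},
            "tags": {
                # "nested" keeps each name/value pair together as one unit.
                "type": "nested",
                "properties": {
                    "name": {"type": "keyword"},
                    "value": {"type": "keyword"},
                },
            },
        }
    }
}

# Body to send with: PUT /test_index
print(json.dumps(index_body, indent=3))
```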

With the two documents you posted, this query:

POST /test_index/_search
{
   "size": 0,
   "aggs": {
      "product_name_terms": {
         "terms": {
            "field": "product_name"
         }
      },
      "nested_tags": {
         "nested": {
            "path": "tags"
         },
         "aggs": {
            "tags_name_terms": {
               "terms": {
                  "field": "tags.name"
               }
            },
            "tags_value_terms": {
               "terms": {
                  "field": "tags.value"
               }
            }
         }
      }
   }
}

returns the following (product_name_terms comes back with no buckets, most likely because the query targets product_name while the sample documents use productName):

{
   "took": 67,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "product_name_terms": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": []
      },
      "nested_tags": {
         "doc_count": 4,
         "tags_name_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
               {
                  "key": "key1",
                  "doc_count": 2
               },
               {
                  "key": "key2",
                  "doc_count": 2
               }
            ]
         },
         "tags_value_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
               {
                  "key": "value1",
                  "doc_count": 2
               },
               {
                  "key": "value2",
                  "doc_count": 1
               },
               {
                  "key": "value3",
                  "doc_count": 1
               }
            ]
         }
      }
   }
}

Here is some code I used to test it:

http://sense.qbox.io/gist/a9a172f41dbd520d5e61063a9686055681110522
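
To make the counts in the nested_tags part of the response easier to follow, here is a small Python sketch that mimics the same counting in memory. It is only an illustration of the idea, not how Elasticsearch implements it, and the helper name nested_terms is made up for this example.

```python
from collections import Counter

# Sample documents copied from the question.
docs = [
    {"productName": "product1",
     "tags": [{"name": "key1", "value": "value1"},
              {"name": "key2", "value": "value2"}]},
    {"productName": "product2",
     "tags": [{"name": "key1", "value": "value1"},
              {"name": "key2", "value": "value3"}]},
]


def nested_terms(documents, path, field):
    """Count nested objects under `path` per distinct value of `field`."""
    # The nested step: every tag object is treated as its own document.
    nested_objects = [obj for doc in documents for obj in doc.get(path, [])]
    # The terms step: one bucket per distinct value, counting nested objects.
    counts = Counter(obj[field] for obj in nested_objects)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": key, "doc_count": count} for key, count in ordered]
    return {"doc_count": len(nested_objects), "buckets": buckets}


name_result = nested_terms(docs, "tags", "name")
value_result = nested_terms(docs, "tags", "value")
print(name_result)
print(value_result)
```

The outer doc_count of 4 is the number of tag objects across both documents, not the number of products, which matches the nested_tags section of the response above.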

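Edit: filtering by nested value

Based on your comment, if you want to filter the nested results by a value (of the nested results), you can add another "layer" of aggregation using a filter aggregation, like this:

POST /test_index/_search
{
   "size": 0,
   "aggs": {
      "nested_tags": {
         "nested": {
            "path": "tags"
         },
         "aggs": {
            "filter_tag_name": {
               "filter": {
                  "term": {
                     "tags.name": "key1"
                  }
               },
               "aggs": {
                  "tags_name_terms": {
                     "terms": {
                        "field": "tags.name"
                     }
                  },
                  "tags_value_terms": {
                     "terms": {
                        "field": "tags.value"
                     }
                  }
               }
            }
         }
      }
   }
}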

Returns:

{
   "took": 10,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "nested_tags": {
         "doc_count": 4,
         "filter_tag_name": {
            "doc_count": 2,
            "tags_name_terms": {
               "doc_count_error_upper_bound": 0,
               "sum_other_doc_count": 0,
               "buckets": [
                  {
                     "key": "key1",
                     "doc_count": 2
                  }
               ]
            },
            "tags_value_terms": {
               "doc_count_error_upper_bound": 0,
               "sum_other_doc_count": 0,
               "buckets": [
                  {
                     "key": "value1",
                     "doc_count": 2
                  }
               ]
            }
         }
      }
   }
}

Here is the updated code:

http://sense.qbox.io/gist/507c3aabf36b8f6ed8bb076c8c1b8552097c5458
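
The effect of the extra filter layer can be sketched in plain Python as well: keep only the tag objects whose name matches, then bucket their values. This is an in-memory illustration under the same two sample documents, and values_for_tag is a hypothetical helper name.

```python
from collections import Counter

# Sample documents copied from the question.
docs = [
    {"productName": "product1",
     "tags": [{"name": "key1", "value": "value1"},
              {"name": "key2", "value": "value2"}]},
    {"productName": "product2",
     "tags": [{"name": "key1", "value": "value1"},
              {"name": "key2", "value": "value3"}]},
]


def values_for_tag(documents, tag_name):
    """Keep nested tags whose name matches, then count their values."""
    # The filter step runs on individual tag objects, not whole documents.
    matching = [tag for doc in documents for tag in doc["tags"]
                if tag["name"] == tag_name]
    counts = Counter(tag["value"] for tag in matching)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": key, "doc_count": count} for key, count in ordered]
    return {"doc_count": len(matching), "buckets": buckets}


key1_result = values_for_tag(docs, "key1")
key2_result = values_for_tag(docs, "key2")
print(key1_result)
print(key2_result)
```

Because the filter is applied inside the nested scope, a value that belongs to a different tag name in the same product never leaks into the buckets.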

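Discussion:

Thanks, but what if I want to group only by the values associated with one tag key? I am looking for a way to group by the tags.value entries associated only with tags.name == key2, without returning any buckets for values associated with key1. Is there such a way?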
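
The case described in this comment appears to be covered by the filter aggregation from the edit in the answer, with key2 in place of key1 and only the tags.value terms aggregation kept. A minimal sketch that builds such a request body in Python follows; the function name value_terms_for_tag is made up for this example, while the aggregation names and the test_index endpoint are reused from the answer.

```python
import json


def value_terms_for_tag(tag_name):
    """Build a search body that buckets tags.value for a single tags.name."""
    return {
        "size": 0,
        "aggs": {
            "nested_tags": {
                "nested": {"path": "tags"},
                "aggs": {
                    "filter_tag_name": {
                        # Only nested tag objects with this name pass through.
                        "filter": {"term": {"tags.name": tag_name}},
                        "aggs": {
                            "tags_value_terms": {
                                "terms": {"field": "tags.value"}
                            }
                        },
                    }
                },
            }
        },
    }


body = value_terms_for_tag("key2")
# Body to send with: POST /test_index/_search
print(json.dumps(body, indent=3))
```

Against the two sample documents, this should produce buckets only for value2 and value3, since value1 is attached to key1 in both products.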
