Sum and count aggregations over Elasticsearch fields

Posted: 2018-06-26 19:32:42

Question:

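I am new to Elasticsearch and want to run some aggregations over fields in an Elasticsearch 5.x index. The index contains documents with the fields langs (which has a nested structure) and docLang. Both fields are dynamically mapped. Here are some sample documents.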

Document 1:

{
   "_index":"A",
   "_type":"document",
   "_id":"1",
   "_source": {
      "text":"This is a test sentence.",
      "langs": {
         "X": {
            "en":1,
            "es":2,
            "zh":3
         },
         "Y": {
            "en":4,
            "es":5,
            "zh":6
         }
      },
      "docLang": "en"
   }
}

Document 2:

{
   "_index":"A",
   "_type":"document",
   "_id":"2",
   "_source": {
      "text":"This is a test sentence.",
      "langs": {
         "X": {
            "en":1,
            "es":2
         },
         "Y": {
            "en":3,
            "es":4
         }
      },
      "docLang": "es"
   }
}

Document 3:

{
   "_index":"A",
   "_type":"document",
   "_id":"3",
   "_source": {
      "text":"This is a test sentence.",
      "langs": {
         "X": {
            "en":1
         },
         "Y": {
            "en":2
         }
      },
      "docLang": "en"
   }
}

I want to run a sum aggregation over the langs field so that, for each key (X/Y) and each language, I get the total across all documents in the index. I also want to get the number of documents per language from the docLang field.

For example, for the 3 documents above, the sum aggregation over the langs field would look like this:

"langs": {
      "X": {
         "en":3,
         "es":4,
         "zh":3
      },
      "Y": {
         "en":9,
         "es":9,
         "zh":6
      }
   }

The docLang counts would look like this:

 "docLang": {
    "en" : 2,
    "es" : 1
   }

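Also, because of some constraints in our production environment, I cannot use scripts in Elasticsearch. So I would like to know whether this can be done using only field-based aggregations on the fields above.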
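
The expected totals can be double-checked outside Elasticsearch. The following is a minimal Python sketch that recomputes them from the three sample documents in plain Python. It does not talk to a cluster, and the helper name expected_aggregations is hypothetical, chosen only for this illustration.

```python
from collections import Counter, defaultdict

# The three sample documents from the question, reduced to the aggregated fields.
docs = [
    {"langs": {"X": {"en": 1, "es": 2, "zh": 3}, "Y": {"en": 4, "es": 5, "zh": 6}}, "docLang": "en"},
    {"langs": {"X": {"en": 1, "es": 2}, "Y": {"en": 3, "es": 4}}, "docLang": "es"},
    {"langs": {"X": {"en": 1}, "Y": {"en": 2}}, "docLang": "en"},
]


def expected_aggregations(documents):
    """Sum langs.<key>.<lang> over all documents and count documents per docLang.

    Hypothetical helper for checking results by hand; not an Elasticsearch API.
    """
    sums = defaultdict(Counter)
    doc_lang_counts = Counter()
    for doc in documents:
        for key, per_lang in doc["langs"].items():
            # Counter.update adds the values of a mapping to the running totals.
            sums[key].update(per_lang)
        doc_lang_counts[doc["docLang"]] += 1
    return {
        "langs": {key: dict(counter) for key, counter in sums.items()},
        "docLang": dict(doc_lang_counts),
    }


result = expected_aggregations(docs)
print(result)
```

A language that is missing from a document (for example zh in documents 2 and 3) simply contributes nothing to the total, which is also how a sum aggregation treats documents without a value for the field.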


Answer 1:

{
  "size": 0,
  "aggs": {
    "X": {
      "nested": {
        "path": "langs.X"
      },
      "aggs": {
        "X_sum_en": {
          "sum": {
            "field": "langs.X.en"
          }
        },
        "X_sum_es": {
          "sum": {
            "field": "langs.X.es"
          }
        },
        "X_sum_zh": {
          "sum": {
            "field": "langs.X.zh"
          }
        }
      }
    },
    "Y": {
      "nested": {
        "path": "langs.Y"
      },
      "aggs": {
        "Y_sum_en": {
          "sum": {
            "field": "langs.Y.en"
          }
        },
        "Y_sum_es": {
          "sum": {
            "field": "langs.Y.es"
          }
        },
        "Y_sum_zh": {
          "sum": {
            "field": "langs.Y.zh"
          }
        }
      }
    },
    "sum_docLang": {
      "terms": {
        "field": "docLang.keyword",
        "size": 10
      }
    }
  }
}

You did not mention it, but I think it matters: I mapped X and Y as nested fields:

    "langs": {
      "properties": {
        "X": {
          "type": "nested",
          "properties": {
            "en": {
              "type": "long"
            },
            "es": {
              "type": "long"
            },
            "zh": {
              "type": "long"
            }
          }
        },
        "Y": {
          "type": "nested",
          "properties": {
            "en": {
              "type": "long"
            },
            "es": {
              "type": "long"
            },
            "zh": {
              "type": "long"
            }
          }
        }
      }
    }

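The nested query above repeats the same sum clause for every key and language. As a minimal sketch, assuming the keys and languages are known up front, the same request body can be generated in Python. The function name build_nested_sum_query and its parameters are hypothetical and not part of any Elasticsearch client; sending the body to a cluster is left out.

```python
import json


def build_nested_sum_query(keys, languages, doc_lang_field="docLang.keyword", size=10):
    """Build the request body for the nested variant of the query.

    Hypothetical helper: `keys` are the sub-objects of langs (X, Y) and
    `languages` are the language codes to sum (en, es, zh).
    """
    aggs = {}
    for key in keys:
        aggs[key] = {
            # Step into the nested objects stored under langs.<key>.
            "nested": {"path": f"langs.{key}"},
            # One sum aggregation per language, named <key>_sum_<lang>.
            "aggs": {
                f"{key}_sum_{lang}": {"sum": {"field": f"langs.{key}.{lang}"}}
                for lang in languages
            },
        }
    # Document count per docLang value.
    aggs["sum_docLang"] = {"terms": {"field": doc_lang_field, "size": size}}
    return {"size": 0, "aggs": aggs}


body = build_nested_sum_query(["X", "Y"], ["en", "es", "zh"])
print(json.dumps(body, indent=2))
```

The languages still have to be listed explicitly, because each sum aggregation targets one concrete field path.
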
However, if your fields are not nested at all (and by that I mean the actual nested field type in Elasticsearch), simple aggregations like these are enough:

{
  "size": 0,
  "aggs": {
    "X_sum_en": {
      "sum": {
        "field": "langs.X.en"
      }
    },
    "X_sum_es": {
      "sum": {
        "field": "langs.X.es"
      }
    },
    "X_sum_zh": {
      "sum": {
        "field": "langs.X.zh"
      }
    },
    "Y_sum_en": {
      "sum": {
        "field": "langs.Y.en"
      }
    },
    "Y_sum_es": {
      "sum": {
        "field": "langs.Y.es"
      }
    },
    "Y_sum_zh": {
      "sum": {
        "field": "langs.Y.zh"
      }
    },
    "sum_docLang": {
      "terms": {
        "field": "docLang.keyword",
        "size": 10
      }
    }
  }
}
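
The flat query returns each sum under its own aggregation name rather than in the langs / docLang shape the question asks for. The following Python sketch reshapes such a response on the client side. The sample response is written by hand from the expected totals in the question, not captured from a real cluster, and the helper name reshape is hypothetical.

```python
def reshape(response):
    """Turn a flat aggregation response into {"langs": {...}, "docLang": {...}}.

    Hypothetical helper; relies on the <key>_sum_<lang> naming used in the query.
    """
    aggregations = response["aggregations"]
    langs = {}
    for name, agg in aggregations.items():
        if "_sum_" not in name:
            continue  # skips the terms aggregation "sum_docLang"
        key, lang = name.split("_sum_", 1)
        # Sum aggregations report a float in "value"; the source fields are
        # whole numbers, so converting to int is safe here.
        langs.setdefault(key, {})[lang] = int(agg["value"])
    doc_lang = {
        bucket["key"]: bucket["doc_count"]
        for bucket in aggregations["sum_docLang"]["buckets"]
    }
    return {"langs": langs, "docLang": doc_lang}


# Hand-written sample in the shape of an aggregation response (assumed values).
sample_response = {
    "aggregations": {
        "X_sum_en": {"value": 3.0},
        "X_sum_es": {"value": 4.0},
        "X_sum_zh": {"value": 3.0},
        "Y_sum_en": {"value": 9.0},
        "Y_sum_es": {"value": 9.0},
        "Y_sum_zh": {"value": 6.0},
        "sum_docLang": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "en", "doc_count": 2},
                {"key": "es", "doc_count": 1},
            ],
        },
    }
}

shaped = reshape(sample_response)
print(shaped)
```

With the nested variant of the query, the sums sit one level deeper, under the "X" and "Y" aggregation entries, so the loop would need to descend into those first.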
