ES聚合分析(聚合分析简介指标聚合桶聚合)

Posted wangzhuxing

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了ES聚合分析(聚合分析简介指标聚合桶聚合)相关的知识,希望对你有一定的参考价值。

一、聚合分析简介

1. ES聚合分析是什么?

聚合分析是数据库中重要的功能特性,完成对一个查询的数据集中数据的聚合计算,如:找出某字段(或计算表达式的结果)的最大值、最小值,计算和、平均值等。ES作为搜索引擎兼数据库,同样提供了强大的聚合分析能力。

对一个数据集求最大、最小、和、平均值等指标的聚合,在ES中称为指标聚合   metric

而关系型数据库中除了有聚合函数外,还可以对查询出的数据进行分组group by,再在组上进行指标聚合。在 ES 中group by 称为分桶桶聚合 bucketing

ES中还提供了矩阵聚合(matrix)、管道聚合(pipleline),但还在完善中。 

2. ES聚合分析查询的写法

在查询请求体中以aggregations节点按如下语法定义聚合分析:

"aggregations" : {
    "<aggregation_name>" : { <!--聚合的名字 -->
        "<aggregation_type>" : { <!--聚合的类型 -->
            <aggregation_body> <!--聚合体:对哪些字段进行聚合 -->
        }
        [,"meta" : {  [<meta_data_body>] } ]? <!--元 -->
        [,"aggregations" : { [<sub_aggregation>]+ } ]? <!--在聚合里面在定义子聚合 -->
    }
    [,"<aggregation_name_2>" : { ... } ]*<!--聚合的名字 -->
}

 说明:

aggregations 也可简写为 aggs

3. 聚合分析的值来源

聚合计算的值可以取字段的值,也可是脚本计算的结果

二、指标聚合

1. max min sum avg

示例1:查询所有记录中年龄的最大值

POST /book1/_search?pretty

{
  "size": 0, 
  "aggs": {
    "maxage": {
      "max": {
        "field": "age"
      }
    }
  }
}

结果1:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "maxage": {
            "value": 54
        }
    }
}

示例2:加上查询条件,查询名字包含‘test‘的年龄最大值:

POST /book1/_search?pretty

{
  "query":{
     "term":{
         "name":"test"
     }    
  },
  "size": 2, 
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "maxage": {
      "max": {
        "field": "age"
      }
    }
  }
}

结果2:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": null,
        "hits": [
            {
                "_index": "book1",
                "_type": "english",
                "_id": "6IUkUmUBRzBxBrDgFok2",
                "_score": null,
                "_source": {
                    "name": "test goog my money",
                    "age": [
                        14,
                        54,
                        45,
                        34
                    ],
                    "class": "dsfdsf",
                    "addr": "中国"
                },
                "sort": [
                    54
                ]
            },
            {
                "_index": "book1",
                "_type": "english",
                "_id": "54UiUmUBRzBxBrDgfIl9",
                "_score": null,
                "_source": {
                    "name": "test goog my money",
                    "age": [
                        11,
                        13,
                        14
                    ],
                    "class": "dsfdsf",
                    "addr": "中国"
                },
                "sort": [
                    14
                ]
            }
        ]
    },
    "aggregations": {
        "maxage": {
            "value": 54
        }
    }
}

 示例3:值来源于脚本,查询所有记录的平均年龄是多少,并对平均年龄加10

POST /book1/_search?pretty
{
  "size":0,
  "aggs": {
    "avg_age": {
      "avg": {
        "script": {
          "source": "doc.age.value"
        }
      }
    },
    "avg_age10": {
      "avg": {
        "script": {
          "source": "doc.age.value + 10"
        }
      }
    }
  }
}

结果3:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "avg_age": {
            "value": 7.585365853658536
        },
        "avg_age10": {
            "value": 17.585365853658537
        }
    }
}

 示例4:指定field,在脚本中用_value 取字段的值

POST  /book1/_search?pretty
{
  "size":0,
  "aggs": {
    "sun_age": {
      "sum": {
          "field":"age",
        "script": {
          "source": "_value * 2"
        }
      }
    }
  }
}

结果4:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "sun_age": {
            "value": 942
        }
    }
}

示例5:为没有值字段指定值。如未指定,缺失该字段值的文档将被忽略:

POST /book1/_search?pretty

{
  "size":0,
  "aggs": {
    "sun_age": {
      "avg": {
          "field":"age",
        "missing":15
      }
    }
  }
}

结果5:

{
    "took": 12,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "sun_age": {
            "value": 12.847826086956522
        }
    }
}

2. 文档计数 count

 示例1:统计银行索引book下年龄为12的文档数量

POST book1/english/_count
{
    "query":{
        "match":{
            "age":12
        }
    }
}

结果1:

{
    "count": 16,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    }
}

3. Value count 统计某字段有值的文档数

示例1:

POST /book1/_search?size=0
{
    "aggs":{
        "age_count":{
            "value_count":{
                "field":"age"
            }
            
        }
    }
}

结果1:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_count": {
            "value": 38
        }
    }
}

4. cardinality  值去重计数

示例1:

POST  /book1/_search?size=0
{
    "aggs":{
        "age_count":{
            "value_count":{
                "field":"age"
            }
            
        },
        "name_count":{
            "cardinality":{
                "field":"age"
            }
        }
    }
}

结果1:

{
    "took": 16,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "name_count": {
            "value": 11
        },
        "age_count": {
            "value": 38
        }
    }
}

说明:有值的38个,去掉重复的之后以一共有11个。

5. stats 统计 count max min avg sum 5个值

示例1:

POST  /book1/_search?size=0
{
    "aggs":{
        "age_count":{
            "stats":{
                "field":"age"
            }
            
        }
    }
}

结果1:

{
    "took": 12,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_count": {
            "count": 38,
            "min": 1,
            "max": 54,
            "avg": 12.394736842105264,
            "sum": 471
        }
    }
}

6. Extended stats

高级统计,比stats多4个统计结果: 平方和、方差、标准差、平均值加/减两个标准差的区间。

示例1:

 

POST /book1/_search?size=0

{
    "aggs":{
        "age_stats":{
            "extended_stats":{
                "field":"age"
            }
            
        }
    }
}

 

结果1:

{
    "took": 8,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_stats": {
            "count": 38,
            "min": 1,
            "max": 54,
            "avg": 12.394736842105264,
            "sum": 471,
            "sum_of_squares": 11049,
            "variance": 137.13365650969527,
            "std_deviation": 11.710408041981085,
            "std_deviation_bounds": {
                "upper": 35.81555292606743,
                "lower": -11.026079241856905
            }
        }
    }
}

7. Percentiles 占比百分位对应的值统计

示例1:

对指定字段(脚本)的值按从小到大累计每个值对应的文档数的占比(占所有命中文档数的百分比),返回指定占比比例对应的值。默认返回[ 1, 5, 25, 50, 75, 95, 99 ]分位上的值。如下中间的结果,可以理解为:占比为50%的文档的age值 <= 12,或反过来:age<=12的文档数占总命中文档数的50%。

 

POST /book1/_search?size=0
{
    "aggs":{
        "age_percentiles":{
            "percentiles":{
                "field":"age"
            }
            
        }
    }
}

结果1:

{
    "took": 16,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_percentiles": {
            "values": {
                "1.0": 1,
                "5.0": 1,
                "25.0": 1,
                "50.0": 12,
                "75.0": 13,
                "95.0": 40.600000000000016,
                "99.0": 54
            }
        }
    }
}

 示例2:指定分位值(占比50%,96%,99%的范围值分别是多少

POST /book1/_search?size=0
{
    "aggs":{
        "age_percentiles":{
            "percentiles":{
                "field":"age",
                "percents" : [50,96,99]
            }
            
        }
    }
}

结果2:

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_percentiles": {
            "values": {
                "50.0": 12,
                "96.0": 44.779999999999966,
                "99.0": 54
            }
        }
    }
}

说明:50%的数值<= 12, 96%的数值<= 96%, 99%的数值<= 54

 

以上是关于ES聚合分析(聚合分析简介指标聚合桶聚合)的主要内容,如果未能解决你的问题,请参考以下文章

ES 聚合索引简介

es笔记七之聚合操作之桶聚合和矩阵聚合

es 基础聚合

Elasticsearch 聚合后排序 --- 2022-04-03

ES聚合之Pipeline聚合语法讲解

ElasticSearch聚合