ES Aggregation Analysis (Introduction, Metric Aggregations, Bucket Aggregations)
Posted by wangzhuxing
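I. Introduction to Aggregation Analysis
1. What is ES aggregation analysis?
Aggregation is a core database feature: it runs aggregate computations over the data set returned by a query, such as finding the maximum or minimum of a field (or of a computed expression), or computing a sum or an average. As both a search engine and a data store, ES provides powerful aggregation capabilities of its own.
Aggregations that compute a metric such as max, min, sum, or average over a data set are called metric aggregations in ES.
Relational databases offer not only aggregate functions but also grouping of the query results with group by, after which metrics are computed per group. In ES, group by is called bucketing, and the corresponding aggregations are bucket aggregations.
ES also provides matrix aggregations and pipeline aggregations, although at the time of writing these were still being refined.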
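2. How to write an aggregation query
Aggregations are defined in the search request body under the aggregations node, using the following syntax:
"aggregations" : {
    "<aggregation_name>" : {                             <!-- name of the aggregation -->
        "<aggregation_type>" : {                         <!-- type of the aggregation -->
            <aggregation_body>                           <!-- body: which fields to aggregate -->
        }
        [,"meta" : { [<meta_data_body>] } ]?             <!-- optional metadata -->
        [,"aggregations" : { [<sub_aggregation>]+ } ]?   <!-- optional sub-aggregations defined inside this one -->
    }
    [,"<aggregation_name_2>" : { ... } ]*                <!-- further aggregations, each with its own name -->
}
Note:
aggregations can also be abbreviated as aggs.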
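The syntax above maps directly onto nested dictionaries. The following is a minimal sketch in Python that assembles such a request body. The helper name `build_aggs_request` and the aggregation names `by_class` and `max_age` are hypothetical, and the `terms` bucket on `class` assumes that field is mapped in a way that allows term aggregation (for example as a keyword field).

```python
import json


def build_aggs_request(aggs, query=None, size=0):
    """Assemble a _search request body; "aggs" is the short form of "aggregations"."""
    body = {"size": size, "aggs": aggs}
    if query is not None:
        body["query"] = query
    return body


# One named aggregation with a sub-aggregation nested inside it.
# "by_class" and "max_age" are illustrative names, not taken from the article.
aggs = {
    "by_class": {
        "terms": {"field": "class"},  # assumes "class" can be aggregated on
        "aggs": {
            "max_age": {"max": {"field": "age"}},
        },
    }
}

body = build_aggs_request(aggs)
print(json.dumps(body, indent=2))
```

Setting `size` to 0 keeps the hits list empty so that only the aggregation results come back, which is the pattern most of the examples in this article use.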
3. Where aggregated values come from
The values being aggregated can be taken from a field, or they can be the result of a script.
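As a rough illustration of the two value sources, the sketch below builds the body of a metric aggregation from a field, a script, or both. The helper `metric_agg` and the aggregation names are hypothetical; the script strings are the ones used in the examples of the next section.

```python
def metric_agg(agg_type, field=None, script=None, missing=None):
    """Build one metric aggregation from a field, a script, or both."""
    if field is None and script is None:
        raise ValueError("a metric aggregation needs a field or a script")
    body = {}
    if field is not None:
        body["field"] = field
    if script is not None:
        body["script"] = {"source": script}
    if missing is not None:
        # Default value used for documents that lack the field.
        body["missing"] = missing
    return {agg_type: body}


aggs = {
    # Value taken directly from a field.
    "max_from_field": metric_agg("max", field="age"),
    # Value computed entirely by a script.
    "avg_from_script": metric_agg("avg", script="doc.age.value + 10"),
    # Field value passed into the script as _value.
    "sum_doubled": metric_agg("sum", field="age", script="_value * 2"),
}
```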
II. Metric Aggregations
1. max min sum avg
Example 1: find the maximum age across all documents
POST /book1/_search?pretty
{
  "size": 0,
  "aggs": {
    "maxage": {
      "max": { "field": "age" }
    }
  }
}
Result 1:
{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 41, "max_score": 0, "hits": [] },
  "aggregations": {
    "maxage": { "value": 54 }
  }
}
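Example 2: add a query condition and find the maximum age among documents whose name contains 'test'. The aggregation runs only over the documents that match the query:
POST /book1/_search?pretty
{
  "query": {
    "term": { "name": "test" }
  },
  "size": 2,
  "sort": [
    { "age": { "order": "desc" } }
  ],
  "aggs": {
    "maxage": {
      "max": { "field": "age" }
    }
  }
}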
Result 2:
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 5,
    "max_score": null,
    "hits": [
      {
        "_index": "book1",
        "_type": "english",
        "_id": "6IUkUmUBRzBxBrDgFok2",
        "_score": null,
        "_source": {
          "name": "test goog my money",
          "age": [ 14, 54, 45, 34 ],
          "class": "dsfdsf",
          "addr": "中国"
        },
        "sort": [ 54 ]
      },
      {
        "_index": "book1",
        "_type": "english",
        "_id": "54UiUmUBRzBxBrDgfIl9",
        "_score": null,
        "_source": {
          "name": "test goog my money",
          "age": [ 11, 13, 14 ],
          "class": "dsfdsf",
          "addr": "中国"
        },
        "sort": [ 14 ]
      }
    ]
  },
  "aggregations": {
    "maxage": { "value": 54 }
  }
}
Example 3: take the value from a script. Compute the average age across all documents, and also the average after adding 10 to each age:
POST /book1/_search?pretty
{
  "size": 0,
  "aggs": {
    "avg_age": {
      "avg": {
        "script": { "source": "doc.age.value" }
      }
    },
    "avg_age10": {
      "avg": {
        "script": { "source": "doc.age.value + 10" }
      }
    }
  }
}
Result 3:
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 41, "max_score": 0, "hits": [] },
  "aggregations": {
    "avg_age": { "value": 7.585365853658536 },
    "avg_age10": { "value": 17.585365853658537 }
  }
}
Example 4: specify a field, and read the field's value in the script through _value:
POST /book1/_search?pretty
{
  "size": 0,
  "aggs": {
    "sun_age": {
      "sum": {
        "field": "age",
        "script": { "source": "_value * 2" }
      }
    }
  }
}
Result 4:
{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 41, "max_score": 0, "hits": [] },
  "aggregations": {
    "sun_age": { "value": 942 }
  }
}
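Example 5: supply a default value for documents that have no value in the field, using missing. If missing is not specified, documents lacking the field are ignored:
POST /book1/_search?pretty
{
  "size": 0,
  "aggs": {
    "sun_age": {
      "avg": {
        "field": "age",
        "missing": 15
      }
    }
  }
}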
Result 5:
{
  "took": 12,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 41, "max_score": 0, "hits": [] },
  "aggregations": {
    "sun_age": { "value": 12.847826086956522 }
  }
}
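To see how missing shifts the average, the arithmetic can be checked against the numbers reported elsewhere in this article. This is a sketch under stated assumptions: the sum (471) and value count (38) are taken from the stats result in section 5, and the number of documents lacking age is not given in the article, so it is inferred here by solving for it. Treat the inferred figure as an estimate rather than a reported fact.

```python
# Figures reported by the stats aggregation in section 5 of this article.
total_sum = 471
value_count = 38

missing_default = 15                 # the "missing" value used in Example 5
observed_avg = 12.847826086956522    # the average returned in Result 5

# Each document without the field contributes one value equal to missing_default:
#   (total_sum + k * missing_default) / (value_count + k) = observed_avg
# Solving for k gives the implied number of documents without an age.
k = (observed_avg * value_count - total_sum) / (missing_default - observed_avg)
implied_missing_docs = round(k)

recomputed_avg = (total_sum + implied_missing_docs * missing_default) / (
    value_count + implied_missing_docs
)
print(implied_missing_docs, recomputed_avg)
```

The recomputed average agrees with Result 5 to within floating-point precision, which is consistent with the default value being counted once per document that lacks the field.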
2. Document count (count)
Example 1: count the documents in the book1 index whose age is 12
POST book1/english/_count
{
  "query": {
    "match": { "age": 12 }
  }
}
Result 1:
{
  "count": 16,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }
}
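3. Value count: count the values present in a field
Example 1 (value_count counts values rather than documents, so every element of a multi-valued field is counted):
POST /book1/_search?size=0
{
  "aggs": {
    "age_count": {
      "value_count": { "field": "age" }
    }
  }
}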
Result 1:
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 41, "max_score": 0, "hits": [] },
  "aggregations": {
    "age_count": { "value": 38 }
  }
}
4. cardinality: count of distinct values
Example 1:
POST /book1/_search?size=0
{
  "aggs": {
    "age_count": {
      "value_count": { "field": "age" }
    },
    "name_count": {
      "cardinality": { "field": "age" }
    }
  }
}
Result 1:
{
  "took": 16,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 41, "max_score": 0, "hits": [] },
  "aggregations": {
    "name_count": { "value": 11 },
    "age_count": { "value": 38 }
  }
}
Note: the field holds 38 values in total; after removing duplicates, 11 distinct values remain.
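The difference between the two counts can be sketched in a few lines of Python. The documents below are hypothetical sample data, not the contents of the book1 index, and the helper `field_values` is an illustrative name. Note also that this sketch counts distinct values exactly with a set, whereas the ES cardinality aggregation returns an approximate count on large data sets.

```python
# Hypothetical documents: two multi-valued ages, one single value, one without age.
docs = [
    {"name": "a", "age": [14, 54, 45, 34]},
    {"name": "b", "age": [11, 13, 14]},
    {"name": "c", "age": 12},
    {"name": "d"},
]


def field_values(documents, field):
    """Flatten every value of `field`, skipping documents that lack it."""
    values = []
    for doc in documents:
        value = doc.get(field)
        if value is None:
            continue
        values.extend(value if isinstance(value, list) else [value])
    return values


ages = field_values(docs, "age")
value_count = len(ages)        # every value counts, including duplicates
cardinality = len(set(ages))   # duplicates collapse to a single entry
print(value_count, cardinality)
```

Here the value 14 appears in two documents, so it is counted twice by the first figure and once by the second.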
5. stats: returns five values (count, max, min, avg, sum)
Example 1:
POST /book1/_search?size=0
{
  "aggs": {
    "age_count": {
      "stats": { "field": "age" }
    }
  }
}
Result 1:
{
  "took": 12,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 41, "max_score": 0, "hits": [] },
  "aggregations": {
    "age_count": {
      "count": 38,
      "min": 1,
      "max": 54,
      "avg": 12.394736842105264,
      "sum": 471
    }
  }
}
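6. Extended stats
Extended statistics return four more results than stats: the sum of squares, the variance, the standard deviation, and the interval of the mean plus or minus two standard deviations.
Example 1:
POST /book1/_search?size=0
{
  "aggs": {
    "age_stats": {
      "extended_stats": { "field": "age" }
    }
  }
}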
Result 1:
{
  "took": 8,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 41, "max_score": 0, "hits": [] },
  "aggregations": {
    "age_stats": {
      "count": 38,
      "min": 1,
      "max": 54,
      "avg": 12.394736842105264,
      "sum": 471,
      "sum_of_squares": 11049,
      "variance": 137.13365650969527,
      "std_deviation": 11.710408041981085,
      "std_deviation_bounds": {
        "upper": 35.81555292606743,
        "lower": -11.026079241856905
      }
    }
  }
}
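The extra figures can be re-derived from count, sum, and sum_of_squares alone. The sketch below does this in Python using the numbers from the result above. It assumes the variance is the population variance (mean of squares minus the square of the mean), an assumption the reported values are consistent with.

```python
import math

# Values reported in the extended_stats result above.
count = 38
total = 471
sum_of_squares = 11049

avg = total / count
# Population variance: mean of the squares minus the square of the mean.
variance = sum_of_squares / count - avg ** 2
std_deviation = math.sqrt(variance)

# Bounds are the mean plus or minus two standard deviations.
upper = avg + 2 * std_deviation
lower = avg - 2 * std_deviation

print(avg, variance, std_deviation, upper, lower)
```

The lower bound is negative even though no age is below 1, because the bounds come from the mean and standard deviation rather than from the observed minimum and maximum.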
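7. Percentiles: the values found at given percentile ranks
The values of the specified field (or script) are ordered from smallest to largest, and for each value the cumulative share of documents at or below it is computed as a percentage of all matching documents. The aggregation returns the value found at each requested percentage. By default it returns the values at percentiles [ 1, 5, 25, 50, 75, 95, 99 ]. The middle entry of the result below can be read as: 50% of the documents have age <= 12, or equivalently, the documents with age <= 12 make up 50% of all matching documents.
Example 1:
POST /book1/_search?size=0
{
  "aggs": {
    "age_percentiles": {
      "percentiles": { "field": "age" }
    }
  }
}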
Result 1:
{
  "took": 16,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 41, "max_score": 0, "hits": [] },
  "aggregations": {
    "age_percentiles": {
      "values": {
        "1.0": 1,
        "5.0": 1,
        "25.0": 1,
        "50.0": 12,
        "75.0": 13,
        "95.0": 40.600000000000016,
        "99.0": 54
      }
    }
  }
}
Example 2: request specific percentiles (the values at the 50th, 96th, and 99th percentiles)
POST /book1/_search?size=0
{
  "aggs": {
    "age_percentiles": {
      "percentiles": {
        "field": "age",
        "percents": [ 50, 96, 99 ]
      }
    }
  }
}
Result 2:
{
  "took": 6,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 41, "max_score": 0, "hits": [] },
  "aggregations": {
    "age_percentiles": {
      "values": {
        "50.0": 12,
        "96.0": 44.779999999999966,
        "99.0": 54
      }
    }
  }
}
Note: 50% of the values are <= 12, 96% of the values are <= 44.78 (approximately), and 99% of the values are <= 54.
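To make the idea of "the value at a given percentile" concrete, here is a minimal sketch that computes exact percentiles by linear interpolation over a sorted list. It illustrates the concept only: ES computes percentiles with an approximation algorithm, so its results on real data may differ slightly from an exact calculation like this one. The function name `percentile` and the sample ages are hypothetical.

```python
def percentile(sorted_values, pct):
    """Exact percentile by linear interpolation between the two nearest ranks."""
    if not sorted_values:
        raise ValueError("cannot take a percentile of an empty list")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    rank = (pct / 100) * (len(sorted_values) - 1)
    low = int(rank)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = rank - low
    return sorted_values[low] + (sorted_values[high] - sorted_values[low]) * fraction


# Hypothetical ages, sorted from smallest to largest.
ages = sorted([1, 1, 1, 12, 12, 13, 13, 40, 54])

for pct in (25, 50, 75, 99):
    print(pct, percentile(ages, pct))
```

As in the results above, a high percentile that falls between two observed values is interpolated and so may not equal any actual age in the data.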