ElasticSearch 6.x Aggregations: Hands-on in Kibana and in SpringBoot
Posted by 这里是杨杨吖
Preface: this article was compiled by the editors of 小常识网 (cha138.com). It covers hands-on ElasticSearch 6.x aggregations in Kibana and in SpringBoot, and we hope you find it useful.
I. Tutorial video
Tutorial video: video link
II. Operations in Kibana
1. Histogram aggregation
① Concept
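The histogram aggregation is a multi-bucket aggregation that can be applied to numeric values, or numeric range values, extracted from documents. It dynamically builds fixed-size buckets (intervals) over the values being aggregated.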
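The following plain-Java sketch is an illustration only and is not Elasticsearch's implementation. It groups some hypothetical area values into buckets of width 5. Each value goes to the bucket whose key is floor(value / interval) * interval. This is the key formula from the Elasticsearch histogram documentation with an offset of 0. The empty buckets between the lowest and highest key are kept, because the histogram aggregation fills such gaps by default (min_doc_count is 0).

```java
import java.util.TreeMap;

// Sketch (not Elasticsearch code): how a histogram assigns values to buckets
public class HistogramSketch {

    // Bucket key = lower bound of the interval containing the value
    static double bucketKey(double value, double interval) {
        return Math.floor(value / interval) * interval;
    }

    // Counts values per bucket; TreeMap keeps keys ascending (default _key asc order)
    static TreeMap<Double, Long> histogram(double[] values, double interval) {
        TreeMap<Double, Long> buckets = new TreeMap<>();
        for (double v : values) {
            buckets.merge(bucketKey(v, interval), 1L, Long::sum);
        }
        // Fill gaps between the first and last key with empty buckets (count 0)
        if (!buckets.isEmpty()) {
            for (double k = buckets.firstKey(); k < buckets.lastKey(); k += interval) {
                buckets.putIfAbsent(k, 0L);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        double[] areas = {3, 7, 8, 12, 22}; // hypothetical "area" values
        System.out.println(histogram(areas, 5)); // {0.0=1, 5.0=2, 10.0=1, 15.0=0, 20.0=1}
    }
}
```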
② Code
"size": 0 means that no matching documents are returned in the hits; only the aggregation results are returned.
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "area",
        "interval": 5
      }
    }
  }
}
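2. Forcing the range of a histogram aggregation
① Concept
Use extended_bounds to force the start and end of the range over which histogram buckets are returned.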
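Here is a minimal plain-Java sketch of the effect of extended_bounds. It uses hypothetical values 12 and 22 with interval 5 and bounds 0 to 10. The bounds only widen the range of buckets that are returned, adding empty buckets where needed. The bucket at 20 lies outside the bounds, but it is still returned because it contains a document. Rounding the bounds down to bucket keys is an assumption of this sketch.

```java
import java.util.TreeMap;

// Sketch (not Elasticsearch code): extended_bounds widens, but never filters, buckets
public class ExtendedBoundsSketch {

    static double bucketKey(double value, double interval) {
        return Math.floor(value / interval) * interval;
    }

    static TreeMap<Double, Long> histogram(double[] values, double interval,
                                           double boundMin, double boundMax) {
        TreeMap<Double, Long> buckets = new TreeMap<>();
        for (double v : values) {
            buckets.merge(bucketKey(v, interval), 1L, Long::sum);
        }
        // Bucket range = union of the (rounded) bounds and the data's own range
        double first = bucketKey(boundMin, interval);
        double last = bucketKey(boundMax, interval);
        if (!buckets.isEmpty()) {
            first = Math.min(first, buckets.firstKey());
            last = Math.max(last, buckets.lastKey());
        }
        for (double k = first; k <= last; k += interval) {
            buckets.putIfAbsent(k, 0L); // empty bucket where no document falls
        }
        return buckets;
    }

    public static void main(String[] args) {
        double[] areas = {12, 22}; // hypothetical "area" values
        // {0.0=0, 5.0=0, 10.0=1, 15.0=0, 20.0=1}
        System.out.println(histogram(areas, 5, 0, 10));
    }
}
```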
② Code
Note: extended_bounds does not filter buckets. Buckets that fall outside the extended_bounds range are still returned if they contain documents.
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "area",
        "interval": 5,
        "extended_bounds": {
          "min": 0,
          "max": 10
        }
      }
    }
  }
}
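3. Sorting histogram buckets
① Concept
After the histogram has been computed, its buckets can be sorted in two ways: by the bucket key (_key) or by the number of documents in each bucket (_count).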
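The two orderings can be pictured with plain Java collections. This sketch sorts some hypothetical, already-computed buckets, each mapping a key to a document count. In the count ordering, ties are broken by ascending key. That tie-break is a choice made in this sketch and is not taken from Elasticsearch.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch: the two bucket orderings applied to computed buckets (key -> doc count)
public class BucketOrderSketch {

    // Like "order": {"_key": "desc"}
    static List<Double> keysByKeyDesc(Map<Double, Long> buckets) {
        return buckets.entrySet().stream()
                .sorted(Map.Entry.<Double, Long>comparingByKey().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Like "order": {"_count": "desc"}; ties fall back to key ascending here
    static List<Double> keysByCountDesc(Map<Double, Long> buckets) {
        return buckets.entrySet().stream()
                .sorted(Map.Entry.<Double, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<Double, Long>comparingByKey()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Double, Long> buckets = new TreeMap<>(); // hypothetical bucket counts
        buckets.put(0.0, 1L);
        buckets.put(5.0, 2L);
        buckets.put(10.0, 1L);
        buckets.put(20.0, 1L);
        System.out.println(keysByKeyDesc(buckets));   // [20.0, 10.0, 5.0, 0.0]
        System.out.println(keysByCountDesc(buckets)); // [5.0, 0.0, 10.0, 20.0]
    }
}
```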
② Code
Sort by bucket key in descending order:
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "area",
        "interval": 5,
        "order": {
          "_key": "desc"
        }
      }
    }
  }
}
Sort by document count in descending order:
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "area",
        "interval": 5,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
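4. Histogram offset
① Concept
By default, bucket boundaries start at 0 and advance in steps of interval. offset shifts the boundaries so that they start at offset instead. With interval 5 and offset 3, the bucket boundaries are ..., -2, 3, 8, 13, ...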
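This short sketch applies the key formula given in the Elasticsearch histogram documentation, key = floor((value - offset) / interval) * interval + offset, to a few hypothetical area values. It prints the bucket each value falls into.

```java
// Sketch: bucket assignment with an offset
public class OffsetSketch {

    // key = floor((value - offset) / interval) * interval + offset
    static double bucketKey(double value, double interval, double offset) {
        return Math.floor((value - offset) / interval) * interval + offset;
    }

    public static void main(String[] args) {
        double interval = 5;
        double offset = 3;
        for (double area : new double[] {2, 7, 8, 22}) { // hypothetical "area" values
            double key = bucketKey(area, interval, offset);
            // e.g. "area 7.0 -> bucket [3.0, 8.0)" and "area 2.0 -> bucket [-2.0, 3.0)"
            System.out.println("area " + area + " -> bucket [" + key + ", " + (key + interval) + ")");
        }
    }
}
```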
② Code
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "area",
        "interval": 5,
        "offset": 3
      }
    }
  }
}
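5. Date histogram aggregation
① Concept
Expressions available for interval (the time interval):
- year (1y): year
- quarter (1q): quarter
- month (1M): month
- week (1w): week
- day (1d): day
- hour (1h): hour
- minute (1m): minute
- second (1s): second
Note that 1M (month) and 1m (minute) differ only in letter case.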
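The following java.time sketch illustrates a 1M interval. It assigns hypothetical updateTime values to monthly buckets, each keyed by 00:00 on the 1st of the month, and fills empty months with a count of 0. The sketch ignores time zones. Elasticsearch computes date_histogram buckets in UTC unless time_zone is set.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Sketch: monthly ("1M") date histogram buckets, time zones ignored
public class MonthlyBucketsSketch {

    // Bucket key: 00:00 on the 1st of the value's month
    static LocalDateTime monthKey(LocalDateTime t) {
        return t.withDayOfMonth(1).truncatedTo(ChronoUnit.DAYS);
    }

    static TreeMap<LocalDateTime, Long> monthly(List<LocalDateTime> times) {
        TreeMap<LocalDateTime, Long> buckets = new TreeMap<>();
        for (LocalDateTime t : times) {
            buckets.merge(monthKey(t), 1L, Long::sum);
        }
        // Months between the earliest and latest month without documents get 0
        if (!buckets.isEmpty()) {
            for (LocalDateTime m = buckets.firstKey(); m.isBefore(buckets.lastKey()); m = m.plusMonths(1)) {
                buckets.putIfAbsent(m, 0L);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        List<LocalDateTime> updateTimes = Arrays.asList( // hypothetical values
                LocalDateTime.of(2022, 10, 15, 8, 30),
                LocalDateTime.of(2022, 10, 31, 23, 59, 59),
                LocalDateTime.of(2022, 12, 1, 0, 0));
        // 2022-10-01 00:00:00 -> 2, 2022-11-01 00:00:00 -> 0, 2022-12-01 00:00:00 -> 1
        monthly(updateTimes).forEach((month, count) ->
                System.out.println(month.format(fmt) + " -> " + count));
    }
}
```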
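② Code
The time between the earliest and the latest updateTime of the matching documents is split into monthly buckets (1M), each starting on the 1st of a month.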
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_date_histogram": {
      "date_histogram": {
        "field": "updateTime",
        "interval": "1M",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}
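6. Date range aggregation
① Concept
Use date_range with a list of ranges to count the documents whose date falls within each range.
② Code
- /M: rounds the date down to the start of the month
- now: the current time
- time_zone: the time zone used for dates and date math. The default is UTC, which is 8 hours behind Beijing time (UTC+8), so "+08:00" is set here.
Assume now is 2023-01-06. The range then runs from 2022-11-01 00:00 (inclusive) to the current time of day on 2022-12-06 (exclusive), because in a date_range aggregation from is inclusive and to is exclusive. The aggregation counts the documents whose updateTime falls within that range.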
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_date_range": {
      "date_range": {
        "field": "updateTime",
        "format": "yyyy-MM-dd",
        "time_zone": "+08:00",
        "ranges": [
          {
            "from": "now-2M/M",
            "to": "now-1M"
          }
        ]
      }
    }
  }
}
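The following java.time sketch shows how the two date-math expressions resolve. It assumes now is 2023-01-06 10:00 at UTC+8; the time of day is arbitrary. It treats from as inclusive and to as exclusive, as a date_range aggregation does.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Sketch: resolving "now-2M/M" and "now-1M" for a given "now"
public class DateRangeSketch {

    // "now-2M/M": subtract two months, then round down to the start of the month
    static ZonedDateTime from(ZonedDateTime now) {
        return now.minusMonths(2).withDayOfMonth(1).truncatedTo(ChronoUnit.DAYS);
    }

    // "now-1M": subtract one month, no rounding
    static ZonedDateTime to(ZonedDateTime now) {
        return now.minusMonths(1);
    }

    // A date_range bucket includes "from" and excludes "to"
    static boolean inRange(ZonedDateTime t, ZonedDateTime now) {
        return !t.isBefore(from(now)) && t.isBefore(to(now));
    }

    public static void main(String[] args) {
        // Assumed "now": 2023-01-06 10:00 at UTC+8 (matches "time_zone": "+08:00")
        ZonedDateTime now = ZonedDateTime.of(2023, 1, 6, 10, 0, 0, 0, ZoneOffset.ofHours(8));
        System.out.println(from(now)); // 2022-11-01T00:00+08:00
        System.out.println(to(now));   // 2022-12-06T10:00+08:00
    }
}
```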
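7. Histogram aggregation with a filter and with multiple filters
① Concept
Use filter to keep only the documents that match a condition before they are aggregated.
② Code
First keep the documents whose money field is greater than or equal to 2500 (gte). Then build a histogram over area, with an interval of 10, on those documents.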
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_filter": {
      "aggs": {
        "test_histogram": {
          "histogram": {
            "field": "area",
            "interval": 10
          }
        }
      },
      "filter": {
        "range": {
          "money": {
            "gte": 2500
          }
        }
      }
    }
  }
}
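For multiple filters, use filters.
Note: with filters, each named filter selects its own set of documents, and the sub-aggregation is computed separately for each set. A document that matches both test_range1 and test_range2 is therefore counted in both buckets.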
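Here is a plain-Java sketch of the multi-filter behavior. It uses a hypothetical House class that holds only id, money, and area, plus a few made-up documents. Each named predicate selects its own subset, and the interval-10 histogram is built for each subset. Documents 1 and 3 end up in both results. Unlike Elasticsearch, this sketch does not add empty histogram buckets.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch: "filters" = one independent subset per named filter, each with its own histogram
public class FiltersSketch {

    // Hypothetical document shape with only the fields used by the query
    static final class House {
        final int id;
        final double money;
        final double area;

        House(int id, double money, double area) {
            this.id = id;
            this.money = money;
            this.area = area;
        }
    }

    static Map<String, Map<Double, Long>> filtersThenHistogram(
            List<House> docs, Map<String, Predicate<House>> filters, double interval) {
        Map<String, Map<Double, Long>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Predicate<House>> filter : filters.entrySet()) {
            // A document may match several filters and is counted in each
            TreeMap<Double, Long> histogram = docs.stream()
                    .filter(filter.getValue())
                    .collect(Collectors.groupingBy(
                            (House h) -> Math.floor(h.area / interval) * interval,
                            TreeMap::new,
                            Collectors.counting()));
            result.put(filter.getKey(), histogram);
        }
        return result;
    }

    public static void main(String[] args) {
        List<House> docs = Arrays.asList( // hypothetical documents
                new House(1, 3000, 45), new House(2, 2600, 62),
                new House(3, 2900, 68), new House(6, 3500, 90));
        Map<String, Predicate<House>> filters = new LinkedHashMap<>();
        filters.put("test_range1", h -> h.money >= 2800); // money >= 2800
        filters.put("test_range2", h -> h.id <= 5);       // id <= 5
        // {test_range1={40.0=1, 60.0=1, 90.0=1}, test_range2={40.0=1, 60.0=2}}
        System.out.println(filtersThenHistogram(docs, filters, 10));
    }
}
```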
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_filter": {
      "aggs": {
        "test_histogram": {
          "histogram": {
            "field": "area",
            "interval": 10
          }
        }
      },
      "filters": {
        "filters": {
          "test_range1": {
            "range": {
              "money": {
                "gte": 2800
              }
            }
          },
          "test_range2": {
            "range": {
              "id": {
                "lte": 5
              }
            }
          }
        }
      }
    }
  }
}
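8. Missing-value aggregation
① Concept
Use missing to count the documents that have no value for the given field, for example because it is null.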
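As a minimal sketch, the missing count is the number of documents that have no value for the field. Documents are modeled here as hypothetical flat maps keyed by the full field path. In Elasticsearch, both an explicit null and an absent field count as missing.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: counting documents without a value for a field
public class MissingSketch {

    // Absent key and explicit null both count as "missing"
    static long missing(List<Map<String, Object>> docs, String field) {
        return docs.stream().filter(doc -> doc.get(field) == null).count();
    }

    public static void main(String[] args) {
        Map<String, Object> withValue = new HashMap<>();
        withValue.put("user.password", "secret");
        Map<String, Object> withNull = new HashMap<>();
        withNull.put("user.password", null);       // explicit null
        Map<String, Object> absent = new HashMap<>(); // field not present
        List<Map<String, Object>> docs = Arrays.asList(withValue, withNull, absent);
        System.out.println(missing(docs, "user.password")); // 2
    }
}
```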
② Code
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_missing": {
      "missing": {
        "field": "user.password"
      }
    }
  }
}
9. Average, maximum, minimum, and sum aggregations
① Concept
Use avg, max, min, and sum to compute the average, maximum, minimum, and sum of a field.
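All four metrics, plus the document count, can be illustrated in a single pass with Java's DoubleSummaryStatistics. This is a plain-Java analogy over hypothetical area values and does not show how Elasticsearch computes them.

```java
import java.util.DoubleSummaryStatistics;
import java.util.Locale;
import java.util.stream.DoubleStream;

// Sketch: count, min, max, average and sum of a field in one pass
public class StatsSketch {

    static DoubleSummaryStatistics stats(double... values) {
        return DoubleStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = stats(45, 62, 68, 90); // hypothetical "area" values
        // prints: count=4 min=45.0 max=90.0 avg=66.25 sum=265.0
        System.out.printf(Locale.ROOT, "count=%d min=%.1f max=%.1f avg=%.2f sum=%.1f%n",
                s.getCount(), s.getMin(), s.getMax(), s.getAverage(), s.getSum());
    }
}
```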
② Code
Average
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "avg_grade": { "avg": { "field": "area" } }
  }
}
Maximum
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "max_grade": { "max": { "field": "area" } }
  }
}
Minimum
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "min_grade": { "min": { "field": "area" } }
  }
}
Sum
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "sum_grade": { "sum": { "field": "area" } }
  }
}
Computing max, min, average, and sum in a single request
Option 1: four separate metric aggregations
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "max_area": { "max": { "field": "area" } },
    "min_area": { "min": { "field": "area" } },
    "avg_area": { "avg": { "field": "area" } },
    "sum_area": { "sum": { "field": "area" } }
  }
}
Option 2: a single stats aggregation, which returns count, min, max, avg, and sum together
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "area_stats": {
      "stats": {
        "field": "area"
      }
    }
  }
}
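10. Distinct-count aggregation
① Concept
Use cardinality to count the distinct values of a field. area_count is just the name of the aggregation and can be anything. The count returned by cardinality is approximate, because it is based on the HyperLogLog++ algorithm.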
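This sketch counts distinct values exactly with a Java stream, using hypothetical values. Elasticsearch's cardinality aggregation works differently. It uses HyperLogLog++, which trades exactness for bounded memory, so its result can deviate slightly on high-cardinality fields.

```java
import java.util.stream.DoubleStream;

// Sketch: exact distinct count (Elasticsearch's cardinality is approximate)
public class CardinalitySketch {

    static long distinctCount(double... values) {
        return DoubleStream.of(values).distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount(45, 62, 45, 90, 62)); // 3
    }
}
```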
② Code
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "area_count": {
      "cardinality": {
        "field": "area"
      }
    }
  }
}
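11. Group-by aggregation
① Concept
Group the documents by the category field (terms), then compute the average of the area field in each group (avg).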
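In plain Java, terms plus an avg sub-aggregation corresponds to groupingBy with averagingDouble. The House class and the documents are hypothetical. This sketch sorts the groups by key, whereas Elasticsearch orders terms buckets by document count, descending, by default.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch: terms on "category" + avg of "area" per group
public class TermsAvgSketch {

    // Hypothetical document shape with only the fields used by the query
    static final class House {
        final String category;
        final double area;

        House(String category, double area) {
            this.category = category;
            this.area = area;
        }
    }

    static Map<String, Double> avgAreaByCategory(List<House> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                (House h) -> h.category,
                TreeMap::new,
                Collectors.averagingDouble((House h) -> h.area)));
    }

    public static void main(String[] args) {
        List<House> docs = Arrays.asList( // hypothetical documents
                new House("A", 40), new House("A", 60), new House("B", 90));
        System.out.println(avgAreaByCategory(docs)); // {A=50.0, B=90.0}
    }
}
```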
② Code
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "group_search": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "avg_area": {
          "avg": {
            "field": "area"
          }
        }
      }
    }
  }
}
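12. Group-by aggregation: top N per group
① Concept
Use top_hits to return the first N documents of each group. In the example, the documents of each category are sorted by area in descending order and the first 3 are kept (size). Only the area, id, and category fields of _source are returned.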
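Here is a plain-Java sketch of terms plus top_hits. It groups documents by category, sorts each group by area in descending order, and keeps the first size documents. The House class and the data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch: terms on "category" + top_hits sorted by "area" desc with a size limit
public class TopHitsSketch {

    // Hypothetical document shape with only the fields used by the query
    static final class House {
        final int id;
        final String category;
        final double area;

        House(int id, String category, double area) {
            this.id = id;
            this.category = category;
            this.area = area;
        }
    }

    static Map<String, List<House>> topHitsByCategory(List<House> docs, int size) {
        Map<String, List<House>> groups = new TreeMap<>();
        for (House h : docs) {
            groups.computeIfAbsent(h.category, k -> new ArrayList<>()).add(h);
        }
        // Keep the `size` documents with the largest area in each group
        groups.replaceAll((category, hits) -> hits.stream()
                .sorted(Comparator.comparingDouble((House h) -> h.area).reversed())
                .limit(size)
                .collect(Collectors.toList()));
        return groups;
    }

    public static void main(String[] args) {
        List<House> docs = Arrays.asList( // hypothetical documents
                new House(1, "A", 40), new House(2, "A", 90), new House(3, "A", 60),
                new House(4, "A", 70), new House(5, "B", 30));
        // A -> ids [2, 4, 3]
        // B -> ids [5]
        topHitsByCategory(docs, 3).forEach((category, hits) -> System.out.println(
                category + " -> ids " + hits.stream().map(h -> h.id).collect(Collectors.toList())));
    }
}
```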
② Code
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "group_search": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "top_area_hits": {
          "top_hits": {
            "sort": [
              {
                "area": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": ["area", "id", "category"]
            },
            "size": 3
          }
        }
      }
    }
  }
}
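13. Percentiles aggregation
① Concept
Percentile: a statistical term. If a set of data is sorted from smallest to largest and the cumulative percentages are computed, the value at a given percentage is called the percentile for that percentage. In other words, for n observations sorted by value, the value at the p% position is the p-th percentile.
For example:
This year the company holds its annual health check, and everyone goes to the health check center to have their height measured.
The company has 100 employees, all between 163 cm and 190 cm tall. Colleague A is 180 cm tall, which puts A at the 70th percentile; in other words, the 70th percentile is 180 cm.
This means that colleague A is taller than 70% of the people in the company and shorter than the other 30%.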
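The definition can be sketched in Java. This version computes an exact percentile by linear interpolation between the two closest ranks. It computes a percentile rank as the share of values less than or equal to a given value. Elasticsearch's percentiles and percentile_ranks aggregations use the approximate TDigest algorithm instead, so their results can differ slightly. The values are hypothetical, and the array is assumed to be non-empty.

```java
import java.util.Arrays;

// Sketch: exact percentiles and percentile ranks (Elasticsearch uses approximate TDigest)
public class PercentilesSketch {

    // Value at the given percent, interpolating linearly between closest ranks
    static double percentile(double[] values, double percent) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double pos = percent / 100.0 * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = Math.min(lo + 1, sorted.length - 1);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    // Percentage of values less than or equal to v
    static double percentileRank(double[] values, double v) {
        long atOrBelow = Arrays.stream(values).filter(x -> x <= v).count();
        return 100.0 * atOrBelow / values.length;
    }

    public static void main(String[] args) {
        double[] areas = {10, 20, 30, 40, 50}; // hypothetical "area" values
        System.out.println(percentile(areas, 50));     // 30.0 (the median)
        System.out.println(percentileRank(areas, 30)); // 60.0
    }
}
```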
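② Code
By default, the values at the 1st, 5th, 25th, 50th, 75th, 95th, and 99th percentiles are returned. "keyed": false returns them as an array rather than as an object keyed by percentile.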
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "area_percentiles": {
      "percentiles": {
        "field": "area",
        "keyed": false
      }
    }
  }
}
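If you don't want the default percentiles, you can specify your own.
Specify values and get the percentile rank of each one (percentile_ranks):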
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "area_percentiles": {
      "percentile_ranks": {
        "field": "area",
        "values": [22, 50, 73],
        "keyed": false
      }
    }
  }
}
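Specify percentiles and get the corresponding values.
Requesting only the 50th percentile, as here, is equivalent to computing the median.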
GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "area_percentiles": {
      "percentiles": {
        "field": "area",
        "percents": [50],
        "keyed": false
      }
    }
  }
}
III. Operations in SpringBoot
The SpringBoot code below is converted from the Kibana requests above and uses RestHighLevelClient.
1. Histogram aggregation
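import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.histogram.HistogramAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.histogram.ParsedHistogram;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Assumes restHighLevelClient is a RestHighLevelClient and logger an SLF4J Logger; search() throws IOException
SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
HistogramAggregationBuilder aggregationBuilder = AggregationBuilders
        .histogram("test_histogram")
        .field("area")
        .interval(5);
searchSourceBuilder.aggregation(aggregationBuilder);
searchSourceBuilder.size(0); // return only aggregation results, no hits
searchRequest.indices("online_house_achieve").types("house").source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = search.getAggregations();
ParsedHistogram parsedHistogram = aggregations.get("test_histogram");
parsedHistogram.getBuckets().forEach(bucket -> {
    logger.info("key: {}", bucket.getKey());        // bucket key (lower bound of the interval)
    logger.info("count: {}", bucket.getDocCount()); // number of documents in the bucket
});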
2. Forcing the range of a histogram aggregation
SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
HistogramAggregationBuilder aggregationBuilder = AggregationBuilders
.histogram("test_histogram")
.field("area")
.interval(5)
.extendedBounds(0, 10);
searchSourceBuilder.aggregation(aggregationBuilder);
searchSourceBuilder.size(0);
searchRequest.indices("online_house_achieve").types("house").source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = search.getAggregations();
ParsedHistogram parsedHistogram = aggregations.get("test_histogram");
parsedHistogram.getBuckets().forEach(bucket -> {
    logger.info("key: {}", bucket.getKey());        // bucket key (lower bound of the interval)
    logger.info("count: {}", bucket.getDocCount()); // number of documents in the bucket
});
3. Sorting histogram buckets
Sort by bucket key in descending order:
SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
HistogramAggregationBuilder aggregationBuilder = AggregationBuilders
.histogram("test_histogram")
.field("area")
.interval(5)
        .order(BucketOrder.key(false)); // sort by _key, descending
searchSourceBuilder.aggregation(aggregationBuilder);
searchSourceBuilder.size(0);
searchRequest.indices("online_house_achieve").types("house").source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = search.getAggregations();
ParsedHistogram parsedHistogram = aggregations.get("test_histogram");
parsedHistogram.getBuckets().forEach(bucket -> {
    logger.info("key: {}", bucket.getKey());        // bucket key (lower bound of the interval)
    logger.info("count: {}", bucket.getDocCount()); // number of documents in the bucket
});
Sort by document count in descending order:
SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
HistogramAggregationBuilder aggregationBuilder = AggregationBuilders
.histogram("test_histogram")
.field("area")
.interval(5)
        .order(BucketOrder.count(false)); // sort by _count, descending
searchSourceBuilder.aggregation(aggregationBuilder);
searchSourceBuilder.size(0);
searchRequest.indices("online_house_achieve").types("house").source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = search.getAggregations();
ParsedHistogram parsedHistogram = aggregations.get("test_histogram");
parsedHistogram.getBuckets().forEach(bucket -> {
    logger.info("key: {}", bucket.getKey());        // bucket key (lower bound of the interval)
    logger.info("count: {}", bucket.getDocCount()); // number of documents in the bucket
});
4. Histogram offset
SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
HistogramAggregationBuilder aggregationBuilder = AggregationBuilders
.histogram("test_histogram")
.field("area")
.interval(5)
.offset(3);
searchSourceBuilder.aggregation(aggregationBuilder);
searchSourceBuilder.size(0);
searchRequest.indices("online_house_achieve").types("house").source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = search.getAggregations();
ParsedHistogram parsedHistogram = aggregations.get("test_histogram");
parsedHistogram.getBuckets().forEach(bucket -> {
    logger.info("key: {}", bucket.getKey());        // bucket key (lower bound of the interval)
    logger.info("count: {}", bucket.getDocCount()); // number of documents in the bucket
});
5. Date histogram aggregation
SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
DateHistogramAggregationBuilder dateHistogramAggregationBuilder = AggregationBuilders
.dateHistogram("test_date_histogram")
.field("updateTime")
.dateHistogramInterval(DateHistogramInterval.MONTH)
.format("yyyy-MM-dd HH:mm:ss");
searchSourceBuilder.aggregation(dateHistogramAggregationBuilder);
searchSourceBuilder.size(0);
searchRequest.indices("online_house_achieve").types("house").source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = search.getAggregations();
ParsedDateHistogram parsedDateHistogram = aggregations.get("test_date_histogram");
parsedDateHistogram.getBuckets().forEach(bucket -> {
    logger.info("keyString: {}", bucket.getKeyAsString()); // bucket key formatted with "yyyy-MM-dd HH:mm:ss"
    logger.info("key: {}", bucket.getKey());               // bucket key as a date object
});