ElasticSearch 6.x Aggregations: Hands-On in Kibana and in Spring Boot

Posted by 这里是杨杨吖

I. Tutorial Video

Tutorial video: video link

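II. Working in Kibana

1. Histogram aggregation

① Concept

A histogram aggregation is a multi-bucket aggregation that works on numeric values, or numeric range values, extracted from documents. It dynamically builds fixed-size buckets (each interval wide) over the values being aggregated.

② Code

"size": 0 suppresses the matching documents (hits) in the response, so only the aggregation results are returned.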

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "area",
        "interval": 5
      }
    }
  }
}
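
To make the bucketing rule concrete, here is a minimal plain-Java sketch (no Elasticsearch client involved) of what a histogram with interval 5 does to a few values. The class name, method name, and sample area values are made up for illustration. The sketch only lists non-empty buckets, whereas Elasticsearch by default also returns the empty buckets in between.

```java
import java.util.Map;
import java.util.TreeMap;

final class HistogramSketch {
    // Counts values per bucket; each bucket key is the value rounded down to a multiple of interval.
    static Map<Double, Long> histogram(double[] values, double interval) {
        Map<Double, Long> buckets = new TreeMap<>();
        for (double v : values) {
            double key = Math.floor(v / interval) * interval;
            buckets.merge(key, 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        double[] areas = {3, 7, 8, 12, 14, 21}; // made-up sample data
        System.out.println(histogram(areas, 5)); // prints {0.0=1, 5.0=2, 10.0=2, 20.0=1}
    }
}
```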

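2. Forcing the histogram range

① Concept

extended_bounds forces the histogram to start at a given min and end at a given max.

② Code

Note: extended_bounds does not filter buckets. Buckets whose keys fall outside the extended_bounds range are still returned.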

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "area",
        "interval": 5,
        "extended_bounds": {
          "min": 0,
          "max": 10
        }
      }
    }
  }
}
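
The note above (bounds extend the histogram but never filter it) can be illustrated with a small plain-Java sketch. It assumes the default behaviour in which gaps are filled with empty buckets; the class name, method name, and sample counts are hypothetical.

```java
import java.util.TreeMap;

final class ExtendedBoundsSketch {
    // Adds empty buckets so the histogram covers [min, max]; existing buckets outside the bounds are kept.
    static TreeMap<Double, Long> extend(TreeMap<Double, Long> buckets, double interval, double min, double max) {
        TreeMap<Double, Long> result = new TreeMap<>(buckets);
        double first = Math.floor(min / interval) * interval;
        double last = Math.floor(max / interval) * interval;
        if (!buckets.isEmpty()) {
            first = Math.min(first, buckets.firstKey());
            last = Math.max(last, buckets.lastKey());
        }
        for (double key = first; key <= last; key += interval) {
            result.putIfAbsent(key, 0L); // only fills gaps, never overwrites a real count
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Double, Long> counts = new TreeMap<>();
        counts.put(5.0, 2L);  // made-up bucket inside the bounds
        counts.put(20.0, 1L); // made-up bucket outside the bounds, still reported
        System.out.println(extend(counts, 5, 0, 10)); // prints {0.0=0, 5.0=2, 10.0=0, 15.0=0, 20.0=1}
    }
}
```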

3. Sorting histogram buckets

① Concept

The buckets of a histogram aggregation can be sorted in two ways: by bucket key (_key) or by document count (_count).

② Code

Sort by bucket key, descending:

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "area",
        "interval": 5,
        "order": {
          "_key": "desc"
        }
      }
    }
  }
}

Sort by document count, descending:

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "area",
        "interval": 5,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
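
The two orderings can be pictured as two comparators applied to the same list of buckets. The following plain-Java sketch uses hypothetical names and made-up bucket counts; it mirrors the effect of "_key": "desc" and "_count": "desc" rather than how Elasticsearch implements them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BucketOrderSketch {
    // Buckets sorted by key, descending (the effect of "_key": "desc").
    static List<Map.Entry<Double, Long>> byKeyDesc(Map<Double, Long> buckets) {
        List<Map.Entry<Double, Long>> list = new ArrayList<>(buckets.entrySet());
        list.sort(Map.Entry.<Double, Long>comparingByKey().reversed());
        return list;
    }

    // Buckets sorted by document count, descending (the effect of "_count": "desc").
    static List<Map.Entry<Double, Long>> byCountDesc(Map<Double, Long> buckets) {
        List<Map.Entry<Double, Long>> list = new ArrayList<>(buckets.entrySet());
        list.sort(Map.Entry.<Double, Long>comparingByValue().reversed());
        return list;
    }

    public static void main(String[] args) {
        Map<Double, Long> buckets = new TreeMap<>(); // made-up histogram result
        buckets.put(0.0, 1L);
        buckets.put(5.0, 3L);
        buckets.put(10.0, 2L);
        System.out.println(byKeyDesc(buckets));   // prints [10.0=2, 5.0=3, 0.0=1]
        System.out.println(byCountDesc(buckets)); // prints [5.0=3, 10.0=2, 0.0=1]
    }
}
```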

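4. Histogram offset

① Concept

By default, bucket boundaries start at 0 and advance in steps of interval. The offset parameter shifts the starting point, so that boundaries start at offset and advance in steps of interval from there.

② Code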

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "area",
        "interval": 5,
        "offset": 3
      }
    }
  }
}
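
The effect of offset is easiest to see in the bucket-key formula, floor((value - offset) / interval) * interval + offset. Here is a minimal Java sketch of that formula; the class and method names are illustrative only.

```java
final class HistogramOffsetSketch {
    // Bucket key for a value: floor((value - offset) / interval) * interval + offset.
    static double bucketKey(double value, double interval, double offset) {
        return Math.floor((value - offset) / interval) * interval + offset;
    }

    public static void main(String[] args) {
        System.out.println(bucketKey(7, 5, 0)); // prints 5.0  (buckets 0, 5, 10, ...)
        System.out.println(bucketKey(7, 5, 3)); // prints 3.0  (buckets ..., -2, 3, 8, ...)
        System.out.println(bucketKey(2, 5, 3)); // prints -2.0 (a value below the offset)
    }
}
```

With interval 5 and offset 3, the value 7 therefore lands in the bucket [3, 8) instead of [5, 10).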

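5. Date histogram aggregation

① Concept

Valid expressions for interval (the time interval):

  • year: 1y
  • quarter: 1q
  • month: 1M
  • week: 1w
  • day: 1d
  • hour: 1h
  • minute: 1m
  • second: 1s

② Code

The range runs from the earliest to the latest updateTime among the matching documents and is split into calendar-month buckets (1M), each starting on the 1st of the month.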

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_date_histogram": {
      "date_histogram": {
        "field": "updateTime",
        "interval": "1M",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}
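
As a rough illustration of monthly bucketing, the following plain-Java sketch rounds each timestamp down to the first instant of its month and counts per key. The names and sample timestamps are made up, time zones are ignored, and empty months in between are not filled in (Elasticsearch fills them by default).

```java
import java.time.LocalDateTime;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.TreeMap;

final class MonthlyHistogramSketch {
    // Counts timestamps per calendar month; the bucket key is the first instant of the month.
    static Map<LocalDateTime, Long> byMonth(LocalDateTime[] times) {
        Map<LocalDateTime, Long> buckets = new TreeMap<>();
        for (LocalDateTime t : times) {
            LocalDateTime key = YearMonth.from(t).atDay(1).atStartOfDay();
            buckets.merge(key, 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        LocalDateTime[] updateTimes = { // made-up sample data
                LocalDateTime.of(2022, 11, 3, 9, 30),
                LocalDateTime.of(2022, 11, 28, 18, 0),
                LocalDateTime.of(2023, 1, 6, 12, 0)};
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        // prints "2022-11-01 00:00:00 -> 2" and "2023-01-01 00:00:00 -> 1"
        byMonth(updateTimes).forEach((k, v) -> System.out.println(fmt.format(k) + " -> " + v));
    }
}
```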

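6. Date range aggregation

① Concept

A date_range aggregation counts the documents that fall within a given date range.

② Code

/M rounds down to the start of the month.
now is the current time.
time_zone sets the time zone. The default is UTC, which is 8 hours behind Beijing time (UTC+8).
Assuming now is 6 January 2023, this counts the documents whose updateTime lies between 2022-11-01 (from, inclusive) and 2022-12-06 (to, exclusive).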

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_date_range": {
      "date_range": {
        "field": "updateTime",
        "format": "yyyy-MM-dd",
        "time_zone": "+08:00",
        "ranges": [
          {
            "from": "now-2M/M",
            "to": "now-1M"
          }
        ]
      }
    }
  }
}
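
To check the date math by hand, here is a small java.time sketch that evaluates now-2M/M and now-1M for a fixed "now" in the +08:00 zone. The class name, method names, and the chosen time of day (10:00) are assumptions for illustration.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

final class DateMathSketch {
    // now-2M/M: go back two months, then round down to the start of that month.
    static ZonedDateTime from(ZonedDateTime now) {
        return now.minusMonths(2).withDayOfMonth(1).truncatedTo(ChronoUnit.DAYS);
    }

    // now-1M: go back one month, keeping the day of month and the time of day.
    static ZonedDateTime to(ZonedDateTime now) {
        return now.minusMonths(1);
    }

    public static void main(String[] args) {
        // Assumed "now": 2023-01-06 10:00 in UTC+8.
        ZonedDateTime now = ZonedDateTime.of(2023, 1, 6, 10, 0, 0, 0, ZoneOffset.ofHours(8));
        System.out.println(from(now)); // prints 2022-11-01T00:00+08:00
        System.out.println(to(now));   // prints 2022-12-06T10:00+08:00
    }
}
```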

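7. Filter and multi-filter histogram aggregation

① Concept

A filter aggregation narrows the documents down to those that match a condition before the nested aggregation runs.

② Code

First keep only the documents whose money field is greater than or equal to 2500, then run a histogram with an interval of 10 on the area field of the remaining documents.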

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_filter": {
      "aggs": {
        "test_histogram": {
          "histogram": {
            "field": "area",
            "interval": 10
          }
        }
      },
      "filter": {
        "range": {
          "money": {
            "gte": 2500
          }
        }
      }
    }
  }
}

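For multiple filters, use the filters aggregation.
Note: each filter is applied separately and gets its own bucket, and the nested aggregation runs once per bucket.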

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_filter": {
      "aggs": {
        "test_histogram": {
          "histogram": {
            "field": "area",
            "interval": 10
          }
        }
      },
      "filters": {
        "filters": {
          "test_range1": {
            "range": {
              "money": {
                "gte": 2800
              }
            }
          },
          "test_range2": {
            "range": {
              "id": {
                "lte": 5
              }
            }
          }
        }
      }
    }
  }
}
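
The key point of filters is that the named filters are independent: a document can land in several buckets at once. A minimal plain-Java sketch of that behaviour follows; the House class, the helper name, and the sample documents are hypothetical, and the sketch only counts matches rather than running a nested histogram.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class FiltersSketch {
    static final class House {
        final int id;
        final double area;
        final double money;
        House(int id, double area, double money) { this.id = id; this.area = area; this.money = money; }
    }

    // Applies every named filter independently and counts the documents each one matches.
    static Map<String, Long> filters(List<House> docs, Map<String, Predicate<House>> named) {
        Map<String, Long> counts = new LinkedHashMap<>();
        named.forEach((name, p) -> counts.put(name, docs.stream().filter(p).count()));
        return counts;
    }

    public static void main(String[] args) {
        List<House> docs = Arrays.asList( // made-up sample documents
                new House(1, 35, 3000), new House(2, 40, 2000),
                new House(4, 52, 2600), new House(9, 58, 2900));
        Map<String, Predicate<House>> named = new LinkedHashMap<>();
        named.put("test_range1", h -> h.money >= 2800);
        named.put("test_range2", h -> h.id <= 5);
        System.out.println(filters(docs, named)); // prints {test_range1=2, test_range2=3}
    }
}
```

House 1 satisfies both conditions, so it is counted in both buckets.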

8. Missing-value aggregation

① Concept

The missing aggregation counts the documents in which the given field is null or absent.

② Code

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "test_missing": {
      "missing": {
        "field": "user.password"
      }
    }
  }
}
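
In plain Java terms, the missing bucket is simply a count of the entries that have no value. The sketch below uses a hypothetical helper and made-up data, with null standing in for a document that lacks the field.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class MissingSketch {
    // Counts the entries without a value, the way a missing bucket counts documents without the field.
    static long missingCount(List<String> fieldValues) {
        return fieldValues.stream().filter(Objects::isNull).count();
    }

    public static void main(String[] args) {
        List<String> passwords = Arrays.asList("abc", null, "xyz", null); // made-up sample data
        System.out.println(missingCount(passwords)); // prints 2
    }
}
```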

9. Avg, max, min, and sum aggregations

① Concept

Use avg, max, min, and sum to compute the average, maximum, minimum, and sum of a field.

② Code

Average

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "avg_grade": { "avg": { "field": "area" } }
  }
}

Maximum

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "max_grade": { "max": { "field": "area" } }
  }
}

Minimum

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "min_grade": { "min": { "field": "area" } }
  }
}

Sum

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "sum_grade": { "sum": { "field": "area" } }
  }
}

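Maximum, minimum, average, and sum in a single request
Option 1: one aggregation per metric

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "max_area": {
      "max": {
        "field": "area"
      }
    },
    "min_area": {
      "min": {
        "field": "area"
      }
    },
    "avg_area": {
      "avg": {
        "field": "area"
      }
    },
    "sum_area": {
      "sum": {
        "field": "area"
      }
    }
  }
}

Option 2: the stats aggregation, which returns count, min, max, avg, and sum together

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "area_stats": {
      "stats": {
        "field": "area"
      }
    }
  }
}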
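
For readers more at home in Java, the stats aggregation is close in spirit to DoubleSummaryStatistics from the JDK, which also yields count, min, max, average, and sum in one pass. A minimal sketch with made-up area values (the class and method names are illustrative):

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

final class StatsSketch {
    // Computes count, min, max, average, and sum in a single pass over the values.
    static DoubleSummaryStatistics stats(double... values) {
        return DoubleStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = stats(35, 40, 52, 58); // made-up sample data
        System.out.println("count=" + s.getCount()); // prints count=4
        System.out.println("min=" + s.getMin());     // prints min=35.0
        System.out.println("max=" + s.getMax());     // prints max=58.0
        System.out.println("avg=" + s.getAverage()); // prints avg=46.25
        System.out.println("sum=" + s.getSum());     // prints sum=185.0
    }
}
```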

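10. Distinct-count (cardinality) aggregation

① Concept

area_count is just the name of the aggregation; any name will do.
The cardinality aggregation returns the number of distinct values of a field.

② Code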

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "area_count": {
      "cardinality": {
        "field": "area"
      }
    }
  }
}
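
The result of cardinality corresponds to a distinct count. The plain-Java sketch below computes that count exactly on made-up values; note that Elasticsearch's cardinality is an approximate count, so on large data sets its result may deviate slightly from an exact computation like this one. The names are illustrative.

```java
import java.util.stream.DoubleStream;

final class CardinalitySketch {
    // Exact number of distinct values.
    static long distinctCount(double... values) {
        return DoubleStream.of(values).distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount(35, 40, 40, 58, 35)); // prints 3
    }
}
```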

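11. Grouped (terms) aggregation

① Concept

Group the documents by the category field, then compute the average of the area field within each group.

② Code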

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "group_search": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "avg_area": {
          "avg": {
            "field": "area"
          }
        }
      }
    }
  }
}
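
The same "group, then average" shape can be written with Java streams, which may help when reading the nested aggs structure. This is only an in-memory analogy; the House class, the helper name, and the sample documents are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class TermsAvgSketch {
    static final class House {
        final String category;
        final double area;
        House(String category, double area) { this.category = category; this.area = area; }
    }

    // Groups by category (the terms bucket) and averages area inside each group (the avg sub-aggregation).
    static Map<String, Double> avgAreaByCategory(List<House> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                h -> h.category, TreeMap::new, Collectors.averagingDouble(h -> h.area)));
    }

    public static void main(String[] args) {
        List<House> docs = Arrays.asList( // made-up sample documents
                new House("apartment", 35), new House("apartment", 45), new House("villa", 120));
        System.out.println(avgAreaByCategory(docs)); // prints {apartment=40.0, villa=120.0}
    }
}
```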

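12. Grouped aggregation: top N per group

① Concept

top_hits returns the first N documents of each group after grouping. Here each group is sorted by area in descending order and the top 3 documents are returned.

② Code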

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "group_search": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "top_area_hits": {
          "top_hits": {
            "sort": [
              {
                "area": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": ["area", "id", "category"]
            },
            "size": 3
          }
        }
      }
    }
  }
}
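
As an in-memory analogy, "top N per group" is a group-by followed by a sort and a limit inside each group. The stream-based sketch below assumes a hypothetical House class and made-up documents; it is not how Elasticsearch executes top_hits.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class TopHitsSketch {
    static final class House {
        final int id;
        final String category;
        final double area;
        House(int id, String category, double area) { this.id = id; this.category = category; this.area = area; }
        @Override public String toString() { return "#" + id + "(" + area + ")"; }
    }

    // For each category, keeps the n documents with the largest area, largest first.
    static Map<String, List<House>> topByArea(List<House> docs, int n) {
        Map<String, List<House>> groups = docs.stream()
                .collect(Collectors.groupingBy(h -> h.category, TreeMap::new, Collectors.toList()));
        groups.replaceAll((category, hits) -> hits.stream()
                .sorted(Comparator.comparingDouble((House h) -> h.area).reversed())
                .limit(n)
                .collect(Collectors.toList()));
        return groups;
    }

    public static void main(String[] args) {
        List<House> docs = Arrays.asList( // made-up sample documents
                new House(1, "flat", 35), new House(2, "flat", 58), new House(3, "flat", 40),
                new House(4, "flat", 52), new House(5, "villa", 120));
        System.out.println(topByArea(docs, 3)); // prints {flat=[#2(58.0), #4(52.0), #3(40.0)], villa=[#5(120.0)]}
    }
}
```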

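13. Percentile aggregation

① Concept

Percentile is a statistics term. If a data set is sorted in ascending order and the cumulative percentage is computed for each value, the value at a given percentage is called the percentile for that percentage. In other words, given n observations sorted by value, the value at the p% position is the p-th percentile.

An example:

It is company health-check day, and everyone goes to the clinic to have their height measured.
The company has 100 employees, all between 163 cm and 190 cm tall. Colleague A is 180 cm tall, which is at the 70% position. In other words, the 70th percentile is 180 cm.
So colleague A is taller than 70% of the people in the company and shorter than the other 30%.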
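
The definition above can be turned into a few lines of Java using the simple nearest-rank method. This is a sketch under assumptions: the class, method names, and the ten sample heights are made up, and Elasticsearch itself computes percentiles approximately, so its numbers can differ from this exact calculation.

```java
import java.util.Arrays;

final class PercentileSketch {
    // Nearest-rank percentile: the smallest value such that at least p percent of the data is <= it.
    static double percentile(double[] data, double p) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Percentile rank: the percentage of values that are less than or equal to the given value.
    static double percentileRank(double[] data, double value) {
        long atOrBelow = Arrays.stream(data).filter(v -> v <= value).count();
        return 100.0 * atOrBelow / data.length;
    }

    public static void main(String[] args) {
        double[] heights = {163, 165, 168, 170, 172, 175, 180, 183, 186, 190}; // made-up sample data
        System.out.println(percentile(heights, 70));      // prints 180.0
        System.out.println(percentile(heights, 50));      // prints 172.0
        System.out.println(percentileRank(heights, 180)); // prints 70.0
    }
}
```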

② Code

By default, percentiles returns the values at the 1st, 5th, 25th, 50th, 75th, 95th, and 99th percentiles.

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "area_percentiles": {
      "percentiles": {
        "field": "area",
        "keyed": false
      }
    }
  }
}

If you do not want the default percentiles, you can specify your own.
Given values, look up their percentile ranks:

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "area_percentiles": {
      "percentile_ranks": {
        "field": "area",
        "values": [22, 50, 73],
        "keyed": false
      }
    }
  }
}

Given percentages, look up the corresponding values:
With percents set to [50], this is effectively a query for the median.

GET /online_house_achieve/house/_search
{
  "size": 0,
  "aggs": {
    "area_percentiles": {
      "percentiles": {
        "field": "area",
        "percents": [50],
        "keyed": false
      }
    }
  }
}

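III. Working in Spring Boot

All of the Spring Boot code below is translated from the Kibana requests above.

1. Histogram aggregation

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.histogram.HistogramAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.histogram.ParsedHistogram;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Assumes restHighLevelClient (a RestHighLevelClient) and logger (a logger with {} placeholders, e.g. SLF4J) are available as fields.
SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
HistogramAggregationBuilder aggregationBuilder = AggregationBuilders
        .histogram("test_histogram")
        .field("area")
        .interval(5);
searchSourceBuilder.aggregation(aggregationBuilder);
searchSourceBuilder.size(0); // return aggregation results only, no hits
searchRequest.indices("online_house_achieve").types("house").source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = search.getAggregations();
ParsedHistogram parsedHistogram = aggregations.get("test_histogram");
parsedHistogram.getBuckets().forEach(bucket -> {
    logger.info("key: {}", bucket.getKey());        // bucket key (lower bound of the bucket)
    logger.info("count: {}", bucket.getDocCount()); // number of documents in the bucket
});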

2. Forcing the histogram range

SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
HistogramAggregationBuilder aggregationBuilder = AggregationBuilders
        .histogram("test_histogram")
        .field("area")
        .interval(5)
        .extendedBounds(0, 10); // force the histogram to cover at least [0, 10]
searchSourceBuilder.aggregation(aggregationBuilder);
searchSourceBuilder.size(0);
searchRequest.indices("online_house_achieve").types("house").source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = search.getAggregations();
ParsedHistogram parsedHistogram = aggregations.get("test_histogram");
parsedHistogram.getBuckets().forEach(bucket -> {
    logger.info("key: {}", bucket.getKey());        // bucket key
    logger.info("count: {}", bucket.getDocCount()); // number of documents in the bucket
});

3. Sorting histogram buckets

Sort by bucket key, descending:

// BucketOrder is org.elasticsearch.search.aggregations.BucketOrder
SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
HistogramAggregationBuilder aggregationBuilder = AggregationBuilders
        .histogram("test_histogram")
        .field("area")
        .interval(5)
        .order(BucketOrder.key(false)); // false = descending
searchSourceBuilder.aggregation(aggregationBuilder);
searchSourceBuilder.size(0);
searchRequest.indices("online_house_achieve").types("house").source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = search.getAggregations();
ParsedHistogram parsedHistogram = aggregations.get("test_histogram");
parsedHistogram.getBuckets().forEach(bucket -> {
    logger.info("key: {}", bucket.getKey());        // bucket key
    logger.info("count: {}", bucket.getDocCount()); // number of documents in the bucket
});

Sort by document count, descending:

SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
HistogramAggregationBuilder aggregationBuilder = AggregationBuilders
        .histogram("test_histogram")
        .field("area")
        .interval(5)
        .order(BucketOrder.count(false)); // false = descending
searchSourceBuilder.aggregation(aggregationBuilder);
searchSourceBuilder.size(0);
searchRequest.indices("online_house_achieve").types("house").source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = search.getAggregations();
ParsedHistogram parsedHistogram = aggregations.get("test_histogram");
parsedHistogram.getBuckets().forEach(bucket -> {
    logger.info("key: {}", bucket.getKey());        // bucket key
    logger.info("count: {}", bucket.getDocCount()); // number of documents in the bucket
});

4. Histogram offset

SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
HistogramAggregationBuilder aggregationBuilder = AggregationBuilders
        .histogram("test_histogram")
        .field("area")
        .interval(5)
        .offset(3); // shift the bucket boundaries by 3
searchSourceBuilder.aggregation(aggregationBuilder);
searchSourceBuilder.size(0);
searchRequest.indices("online_house_achieve").types("house").source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = search.getAggregations();
ParsedHistogram parsedHistogram = aggregations.get("test_histogram");
parsedHistogram.getBuckets().forEach(bucket -> {
    logger.info("key: {}", bucket.getKey());        // bucket key
    logger.info("count: {}", bucket.getDocCount()); // number of documents in the bucket
});

5. Date histogram aggregation

// DateHistogramAggregationBuilder, DateHistogramInterval, and ParsedDateHistogram live in
// org.elasticsearch.search.aggregations.bucket.histogram
SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
DateHistogramAggregationBuilder dateHistogramAggregationBuilder = AggregationBuilders
        .dateHistogram("test_date_histogram")
        .field("updateTime")
        .dateHistogramInterval(DateHistogramInterval.MONTH)
        .format("yyyy-MM-dd HH:mm:ss");
searchSourceBuilder.aggregation(dateHistogramAggregationBuilder);
searchSourceBuilder.size(0);
searchRequest.indices("online_house_achieve").types("house").source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = search.getAggregations();
ParsedDateHistogram parsedDateHistogram = aggregations.get("test_date_histogram");
parsedDateHistogram.getBuckets().forEach(bucket -> {
    logger.info("keyString: {}", bucket.getKeyAsString()); // bucket key as a formatted string
    logger.info("key: {}", bucket.getKey());               // bucket key
});
