ElasticSearch Aggregation Group by order by sub terms field doc count
Posted: 2016-01-25 06:31:22
Question:
My mapping:
// typeLog values: Error, Info, Warn
{
  "onef-sora": {
    "mappings": {
      "Log": {
        "properties": {
          "application": {
            "type": "string",
            "index": "not_analyzed"
          },
          "typeLog": {
            "type": "string"
          }
        }
      }
    }
  }
}
My query:
{
  "size": 0,
  "aggs": {
    "application": {
      "terms": {
        "field": "application",
        "order": { "_count": "desc" },
        "size": 5
      },
      "aggs": {
        "typelogs": {
          "terms": {
            "field": "typeLog",
            "order": { "_term": "asc" }
          }
        }
      }
    }
  }
}
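I want to get the top 5 applications with the most errors, but the terms aggregation order only supports three keys: _count, _term, _key. How can I order the applications by the typeLog doc_count in this query? Thanks!
The result I want: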
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 10000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "application": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 5000,
      "buckets": [
        {
          "key": "OneF0",
          "doc_count": 1000,
          "typelogs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "error", "doc_count": 334 },
              { "key": "info", "doc_count": 333 },
              { "key": "warn", "doc_count": 333 }
            ]
          }
        },
        {
          "key": "OneF1",
          "doc_count": 1000,
          "typelogs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "error", "doc_count": 333 },
              { "key": "info", "doc_count": 334 },
              { "key": "warn", "doc_count": 333 }
            ]
          }
        },
        {
          "key": "OneF2",
          "doc_count": 1000,
          "typelogs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "error", "doc_count": 332 },
              { "key": "info", "doc_count": 333 },
              { "key": "warn", "doc_count": 334 }
            ]
          }
        }
      ]
    }
  }
}
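To make the requested ordering concrete, here is a minimal Python sketch that re-sorts the `application` buckets of a response shaped like the one above by the doc_count of their `error` sub-bucket. The helper names `error_count` and `sort_by_error_count` are hypothetical, and the sample is a trimmed copy of the response above. This only reorders buckets that were already returned (the top 5 by total doc_count), so it is not the same as having Elasticsearch pick the top 5 by error count.

```python
def error_count(app_bucket, type_key="error"):
    """Return the doc_count of one typeLog key inside an application bucket."""
    for sub in app_bucket["typelogs"]["buckets"]:
        if sub["key"] == type_key:
            return sub["doc_count"]
    return 0  # the application has no documents of that type


def sort_by_error_count(response, type_key="error"):
    """Return the application buckets sorted by descending count of type_key."""
    buckets = response["aggregations"]["application"]["buckets"]
    return sorted(buckets, key=lambda b: error_count(b, type_key), reverse=True)


# Trimmed sample in the shape of the response above (bucket order shuffled).
sample = {
    "aggregations": {"application": {"buckets": [
        {"key": "OneF2", "doc_count": 1000, "typelogs": {"buckets": [
            {"key": "error", "doc_count": 332},
            {"key": "info", "doc_count": 333},
            {"key": "warn", "doc_count": 334}]}},
        {"key": "OneF0", "doc_count": 1000, "typelogs": {"buckets": [
            {"key": "error", "doc_count": 334},
            {"key": "info", "doc_count": 333},
            {"key": "warn", "doc_count": 333}]}},
        {"key": "OneF1", "doc_count": 1000, "typelogs": {"buckets": [
            {"key": "error", "doc_count": 333},
            {"key": "info", "doc_count": 334},
            {"key": "warn", "doc_count": 333}]}},
    ]}}
}

ordered_keys = [b["key"] for b in sort_by_error_count(sample)]
print(ordered_keys)
```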
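Comments:
Why not simply use _count: desc in the typelogs sub-aggregation?
That doesn't help, I tried it. I want to get the top applications where type = 'Error'.
Not sure I understand, but what @juliendangers provided should work; he uses _count: desc as suggested. You can just remove the term query, and it should still work.
He just adds a filter query. I want the top doc_count for typeLog = error and also a group by typeLog (including Warn and Info).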
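Answer 1:
Since you want the top 5 applications with the most errors, you can filter the query so that only error logs are kept (you can use a filter). Then you just need to order your sub-terms aggregation by descending count: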
{
  "size": 0,
  "query": {
    "term": {
      "typeLog": "Error"
    }
  },
  "aggs": {
    "application": {
      "terms": {
        "field": "application",
        "order": {
          "_count": "desc"
        },
        "size": 5
      },
      "aggs": {
        "typelogs": {
          "terms": {
            "field": "typeLog",
            "order": {
              "_count": "desc"
            }
          }
        }
      }
    }
  }
}
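To keep all the typeLogs, you may need to run the query the other way round, aggregating on typeLog first and on application second: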
{
  "size": 0,
  "aggs": {
    "typelogs": {
      "terms": {
        "field": "typeLog",
        "order": {
          "_count": "asc"
        }
      },
      "aggs": {
        "application": {
          "terms": {
            "field": "application",
            "order": {
              "_count": "desc"
            },
            "size": 5
          }
        }
      }
    }
  }
}
You will get 3 first-level buckets (one per log type), each containing the top 5 applications for that log type.
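Since the inverted query returns applications nested under each log type, a small client-side pivot may help to read the counts per application. The sketch below assumes a response keyed by the `typelogs` and `application` aggregation names used in the query above; the function name `counts_per_application` is hypothetical, and the sample counts are reused from the question purely for illustration. An application that is in the top 5 for one log type but not for another will simply be missing that type's count.

```python
from collections import defaultdict


def counts_per_application(response):
    """Pivot a typelogs > application response into {application: {typeLog: doc_count}}."""
    table = defaultdict(dict)
    for type_bucket in response["aggregations"]["typelogs"]["buckets"]:
        for app_bucket in type_bucket["application"]["buckets"]:
            table[app_bucket["key"]][type_bucket["key"]] = app_bucket["doc_count"]
    return dict(table)


# Illustrative response fragment; the counts are not real query output.
sample_response = {
    "aggregations": {"typelogs": {"buckets": [
        {"key": "error", "doc_count": 667, "application": {"buckets": [
            {"key": "OneF0", "doc_count": 334},
            {"key": "OneF1", "doc_count": 333}]}},
        {"key": "info", "doc_count": 667, "application": {"buckets": [
            {"key": "OneF1", "doc_count": 334},
            {"key": "OneF0", "doc_count": 333}]}},
    ]}}
}

pivoted = counts_per_application(sample_response)
print(pivoted)
```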
Comments:
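I don't want a filter; I also want the totals for the Warn and Info types.
Your query asks for the top 5 applications with the most logs, but the top applications may have only info logs, so without filtering the documents you won't get the applications with the most errors. By the way, please edit your question to add that you want all typeLogs ;) Without filtering you will get a result like this: gist.github.com/juliendangers/b68cf017dbec275df5d1
How should I query so that Elasticsearch returns the result I want without using a filter? I need to count the other log types as well (Warn, Info, ...).
OK, you may need to do it differently: gist.github.com/juliendangers/be9c8255a28573e7c2b1 You do a terms aggregation on typeLog and then a sub-aggregation on application. That way you will have 3 buckets, "error, info, warn", and the top 5 applications per typeLog.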
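One further option, not proposed in the thread, is to keep the original application > typelogs shape and order the application buckets by a `filter` sub-aggregation that counts only error documents. The terms aggregation documentation describes ordering by a single-bucket sub-aggregation, in which case the bucket's doc_count is used. The sketch below builds such a request body as a Python dict. The aggregation name `errors` and the function name are hypothetical, and the default term value `"error"` is an assumption based on the lowercase keys in the desired result (typeLog is mapped as a plain analyzed string). Treat it as a starting point to verify against your Elasticsearch version; counts ordered this way may also be approximate across shards.

```python
import json


def top_apps_by_error_query(error_term="error", size=5):
    """Build a request body that orders applications by their error count.

    'errors' is a hypothetical name for a filter sub-aggregation; the terms
    aggregation is ordered by that single-bucket aggregation's doc_count.
    """
    return {
        "size": 0,
        "aggs": {
            "application": {
                "terms": {
                    "field": "application",
                    "size": size,
                    "order": {"errors": "desc"},
                },
                "aggs": {
                    # Counts only the error documents of each application.
                    "errors": {"filter": {"term": {"typeLog": error_term}}},
                    # Still returns every log type (error, info, warn).
                    "typelogs": {
                        "terms": {"field": "typeLog", "order": {"_term": "asc"}}
                    },
                },
            }
        },
    }


body = top_apps_by_error_query()
print(json.dumps(body, indent=2))
```

Because no top-level query filters the documents, the `typelogs` sub-aggregation still reports the Info and Warn counts for each returned application.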