Sum and count aggregations over Elasticsearch fields
Posted: 2018-06-26 19:32:42
Question:
I am new to Elasticsearch and want to run some aggregations over fields in an Elasticsearch 5.x index. The index contains documents with the fields langs (which has a nested structure) and docLang. Both are dynamically mapped fields. Here are some sample documents.
Document 1:
{
  "_index": "A",
  "_type": "document",
  "_id": "1",
  "_source": {
    "text": "This is a test sentence.",
    "langs": {
      "X": {
        "en": 1,
        "es": 2,
        "zh": 3
      },
      "Y": {
        "en": 4,
        "es": 5,
        "zh": 6
      }
    },
    "docLang": "en"
  }
}
Document 2:
{
  "_index": "A",
  "_type": "document",
  "_id": "2",
  "_source": {
    "text": "This is a test sentence.",
    "langs": {
      "X": {
        "en": 1,
        "es": 2
      },
      "Y": {
        "en": 3,
        "es": 4
      }
    },
    "docLang": "es"
  }
}
Document 3:
{
  "_index": "A",
  "_type": "document",
  "_id": "3",
  "_source": {
    "text": "This is a test sentence.",
    "langs": {
      "X": {
        "en": 1
      },
      "Y": {
        "en": 2
      }
    },
    "docLang": "en"
  }
}
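I want to run a sum aggregation over the langs field so that, for every key (X/Y) and every language, I get the total across all documents in the index. In addition, I want to get the number of documents per language from the docLang field.
For example, for the 3 documents above, the sum aggregation over the langs field would look like this:
"langs": {
  "X": {
    "en": 3,
    "es": 4,
    "zh": 3
  },
  "Y": {
    "en": 9,
    "es": 9,
    "zh": 6
  }
}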
The docLang counts would look like this:
"docLang": {
  "en": 2,
  "es": 1
}
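Also, because of some restrictions in our production environment, I cannot use scripts in Elasticsearch. So I would like to know whether this can be done for the fields above using only field-based aggregations.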
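The expected totals stated in the question can be checked on the client side with a few lines of Python. This is only a sanity check of the arithmetic, not an Elasticsearch query. The `text` field is left out because it plays no role in the aggregation, and the name `expected_totals` is made up for this sketch.

```python
from collections import Counter, defaultdict

# The relevant parts of the three sample documents from the question.
docs = [
    {"langs": {"X": {"en": 1, "es": 2, "zh": 3},
               "Y": {"en": 4, "es": 5, "zh": 6}}, "docLang": "en"},
    {"langs": {"X": {"en": 1, "es": 2},
               "Y": {"en": 3, "es": 4}}, "docLang": "es"},
    {"langs": {"X": {"en": 1},
               "Y": {"en": 2}}, "docLang": "en"},
]


def expected_totals(documents):
    """Sum langs.<key>.<language> over all documents and count documents per docLang."""
    sums = defaultdict(Counter)
    doc_counts = Counter()
    for doc in documents:
        for key, per_lang in doc["langs"].items():
            sums[key].update(per_lang)  # Counter.update adds the values per language
        doc_counts[doc["docLang"]] += 1
    return {key: dict(counter) for key, counter in sums.items()}, dict(doc_counts)


lang_sums, doc_lang_counts = expected_totals(docs)
print(lang_sums)
print(doc_lang_counts)
```

Languages that are missing from a document (for example `zh` in documents 2 and 3) simply contribute nothing to the total, which is also how a sum aggregation treats documents without a value for the field.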
Answer 1:
{
  "size": 0,
  "aggs": {
    "X": {
      "nested": {
        "path": "langs.X"
      },
      "aggs": {
        "X_sum_en": {
          "sum": {
            "field": "langs.X.en"
          }
        },
        "X_sum_es": {
          "sum": {
            "field": "langs.X.es"
          }
        },
        "X_sum_zh": {
          "sum": {
            "field": "langs.X.zh"
          }
        }
      }
    },
    "Y": {
      "nested": {
        "path": "langs.Y"
      },
      "aggs": {
        "Y_sum_en": {
          "sum": {
            "field": "langs.Y.en"
          }
        },
        "Y_sum_es": {
          "sum": {
            "field": "langs.Y.es"
          }
        },
        "Y_sum_zh": {
          "sum": {
            "field": "langs.Y.zh"
          }
        }
      }
    },
    "sum_docLang": {
      "terms": {
        "field": "docLang.keyword",
        "size": 10
      }
    }
  }
}
You did not mention it, but I think it matters: for the query above I mapped X and Y as nested fields:
"langs": {
  "properties": {
    "X": {
      "type": "nested",
      "properties": {
        "en": {
          "type": "long"
        },
        "es": {
          "type": "long"
        },
        "zh": {
          "type": "long"
        }
      }
    },
    "Y": {
      "type": "nested",
      "properties": {
        "en": {
          "type": "long"
        },
        "es": {
          "type": "long"
        },
        "zh": {
          "type": "long"
        }
      }
    }
  }
}
However, if your fields are not nested at all (and by that I mean the actual nested field type in Elasticsearch, which dynamic mapping does not produce for plain JSON objects), a simple aggregation like this is enough:
{
  "size": 0,
  "aggs": {
    "X_sum_en": {
      "sum": {
        "field": "langs.X.en"
      }
    },
    "X_sum_es": {
      "sum": {
        "field": "langs.X.es"
      }
    },
    "X_sum_zh": {
      "sum": {
        "field": "langs.X.zh"
      }
    },
    "Y_sum_en": {
      "sum": {
        "field": "langs.Y.en"
      }
    },
    "Y_sum_es": {
      "sum": {
        "field": "langs.Y.es"
      }
    },
    "Y_sum_zh": {
      "sum": {
        "field": "langs.Y.zh"
      }
    },
    "sum_docLang": {
      "terms": {
        "field": "docLang.keyword",
        "size": 10
      }
    }
  }
}
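Both request bodies above repeat the same sum block for every key and language, so they can be generated instead of written by hand, and the response can be folded back into the layout the question asks for. The following is a minimal sketch under a few assumptions: `build_body` and `reshape` are hypothetical helper names, the list of keys and languages has to be known up front (the aggregations name each field explicitly), and `sample` is a hand-written stand-in for the `aggregations` part of a response, not output captured from a real cluster. Sending the request is left out.

```python
import json


def build_body(keys, languages, nested=False):
    """Build the aggregation request body for either mapping variant."""
    aggs = {}
    for key in keys:
        sums = {
            f"{key}_sum_{lang}": {"sum": {"field": f"langs.{key}.{lang}"}}
            for lang in languages
        }
        if nested:
            # nested mapping: wrap the sums in a nested aggregation on the path
            aggs[key] = {"nested": {"path": f"langs.{key}"}, "aggs": sums}
        else:
            # object mapping: the sums sit directly at the top level
            aggs.update(sums)
    aggs["sum_docLang"] = {"terms": {"field": "docLang.keyword", "size": 10}}
    return {"size": 0, "aggs": aggs}


def reshape(aggregations):
    """Convert the "aggregations" part of a response into the requested layout."""
    langs = {}

    def collect(node):
        for name, result in node.items():
            if not isinstance(result, dict):
                continue  # e.g. the doc_count of a nested aggregation
            if "_sum_" in name and "value" in result:
                key, lang = name.split("_sum_", 1)
                # sum results are returned as floats such as 3.0
                langs.setdefault(key, {})[lang] = int(result["value"])
            else:
                collect(result)  # descend into a nested aggregation

    collect(aggregations)
    buckets = aggregations["sum_docLang"]["buckets"]
    return {"langs": langs, "docLang": {b["key"]: b["doc_count"] for b in buckets}}


# Hand-written sample for the three documents in the question (flat variant).
sample = {
    "X_sum_en": {"value": 3.0}, "X_sum_es": {"value": 4.0}, "X_sum_zh": {"value": 3.0},
    "Y_sum_en": {"value": 9.0}, "Y_sum_es": {"value": 9.0}, "Y_sum_zh": {"value": 6.0},
    "sum_docLang": {"buckets": [{"key": "en", "doc_count": 2},
                                {"key": "es", "doc_count": 1}]},
}

body = build_body(["X", "Y"], ["en", "es", "zh"])
result = reshape(sample)
print(json.dumps(body, indent=2))
print(result)
```

Calling `build_body(["X", "Y"], ["en", "es", "zh"], nested=True)` yields the first request body from the answer, and `reshape` accepts either response shape because it descends into the per-key nested results when they are present.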