elasticsearch - Aggregation returns terms in key, but not the complete field, how can I get full field returned?

Posted 2023-03-29

【Title】: elasticsearch - Aggregation returns terms in key, but not the complete field, how can I get full field returned? 【Posted】: 2014-08-29 16:16:45 【Question】:

In my Elasticsearch implementation, I run a few simple aggregations on several fields, as shown below:

 "aggs" : {
    "author" : {
        "terms" : {
          "field" : "author",
          "size": 20,
          "order" : { "_term" : "asc" }
        }
    },
    "title" : {
        "terms" : {
          "field" : "title",
          "size": 20
        }
    },
    "contentType" : {
        "terms" : {
          "field" : "docType",
          "size": 20
        }
    }
 }

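The aggregations work fine and I get the corresponding results. But the keys returned for the title field (or any other multi-word field) are single words, and the counts are aggregated per word. I need the full title returned in the result, not just one word, which does not make much sense. How can I get that?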

Current result (just a snippet):

"title": {
     "buckets": [
        {
           "key": "test",
           "doc_count": 1716
        },
        {
           "key": "pptx",
           "doc_count": 1247
        },
        {
           "key": "and",
           "doc_count": 661
        },
        {
           "key": "for",
           "doc_count": 489
        },
        {
           "key": "mobile",
           "doc_count": 487
        },
        {
           "key": "docx",
           "doc_count": 486
        },
        {
           "key": "pdf",
           "doc_count": 450
        },
        {
           "key": "2012",
           "doc_count": 397
        }
     ]
}

Expected result:

"title": {
         "buckets": [
            {
               "key": "test document for stack overflow ",
               "doc_count": 1716
            },
            {
               "key": "this is a pptx",
               "doc_count": 1247
            },
            {
               "key": "its another document and so on",
               "doc_count": 661
            },
            {
               "key": "for",
               "doc_count": 489
            },
            {
               "key": "mobile",
               "doc_count": 487
            },
            {
               "key": "docx",
               "doc_count": 486
            },
            {
               "key": "pdf",
               "doc_count": 450
            },
            {
               "key": "2012",
               "doc_count": 397
            }
         ]
}

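I went through a lot of documentation, which explains different ways of aggregating results, but I could not find how to get the full text when a field is used as the key in the result. Please advise how I can achieve this.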

【Comments】:

【Answer 1】:

You need an untokenized copy of the terms in the index. Use multi-fields in your mapping:

{
    "test": {
        "mappings": {
            "book": {
                "properties": {
                    "author": {
                        "type": "string",
                        "fields": {
                            "untouched": {
                                "type": "string",
                                "index": "not_analyzed"
                            }
                        }
                    },
                    "title": {
                        "type": "string",
                        "fields": {
                            "untouched": {
                                "type": "string",
                                "index": "not_analyzed"
                            }
                        }
                    },
                    "docType": {
                        "type": "string",
                        "fields": {
                            "untouched": {
                                "type": "string",
                                "index": "not_analyzed"
                            }
                        }
                    }
                }
            }
        }
    }
}

In your aggregation query, reference the untokenized fields:

"aggs" : {
    "author" : {
         "terms" : {
            "field" : "author.untouched",
            "size": 20,
            "order" : { "_term" : "asc" }
         }
    },
    "title" : {
        "terms" : {
          "field" : "title.untouched",
          "size": 20
        }
    },
    "contentType" : {
        "terms" : {
           "field" : "docType.untouched",
           "size": 20
        }
    }
}

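To illustrate why the analyzed field produces single-word keys while the `untouched` copy produces whole titles, here is a minimal Python sketch that imitates a terms aggregation in memory. It is not Elasticsearch code: `analyze` is only a rough stand-in for the standard analyzer (lowercase, split on non-alphanumerics), `terms_agg` is a hypothetical helper, the sample documents are made up, and breaking ties by key is an assumption of this sketch.

```python
import re
from collections import Counter


def analyze(text):
    """Rough stand-in for the standard analyzer (an assumption):
    lowercase the text and split it on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def terms_agg(docs, field, analyzed, size=20):
    """Hypothetical in-memory imitation of a terms aggregation."""
    counts = Counter()
    for doc in docs:
        value = doc[field]
        # A document counts at most once per term, so use a set.
        terms = set(analyze(value)) if analyzed else {value}
        counts.update(terms)
    # Highest doc_count first; ties broken by key (sketch-only choice).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": n} for k, n in ordered[:size]]


# Made-up sample documents for illustration.
docs = [
    {"title": "test document for stack overflow"},
    {"title": "test document for stack overflow"},
    {"title": "this is a pptx"},
]

by_token = terms_agg(docs, "title", analyzed=True)   # like "title"
by_value = terms_agg(docs, "title", analyzed=False)  # like "title.untouched"

print(by_token)
print(by_value)
```

With `analyzed=True` every bucket key is a single token, which mirrors the result in the question; with `analyzed=False` each distinct title becomes one bucket, which is what aggregating on `title.untouched` is meant to achieve.
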
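【Comments】:

Thanks a lot, Dan! I am going to try this, it looks promising! What about a field that I need tokenized for search queries but untokenized for aggregations at the same time?

The mapping above does exactly that: it indexes the field with the default analyzer (the standard analyzer) and also indexes an untokenized version of the field. For example, use the field name title (tokenized) for search, and the field name title.untouched (untokenized) for aggregations.

@DanTuffery Thanks, it's me again. Is it possible to query the indexed field but have the original field returned? More specifically, I apply some asciifolding filtering on the indexed field but need the original value returned. I am stuck on how to do this in one step in the aggregation. I have something like this: "aggregations": { "suggestion": { "terms": { "field": "name", but instead of returning name, I want name.untouched returned.

@ulkas You are on the right track. Your query is separate from your aggregation. Use your indexed field in the query and your untouched field in the aggregation. "query": ..., "aggregation": ...

【Answer 2】: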

It seems that multi_fields, as specified in the post above, is deprecated: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html#_multi_fields

【Comments】:

They have been deprecated, but the way the poster above uses them is the new way according to the link you pasted.

【Answer 3】:

I ran into a similar problem. When I ran the command:

   curl -XGET "localhost:9200/logstash*/_mapping?pretty"

the response contained the following useful part:

   "host" : {
     "type" : "string",
       "norms" : {
         "enabled" : false
       },
       "fields" : {
         "raw" : {
           "type" : "string",
           "index" : "not_analyzed",
           "ignore_above" : 256
         }
       }
     },...

I realized that appending .raw to the field name should change the output and give the desired result.

Something like:

      "aggs": {
        "computes": {
          "terms": {
            "field": "host.raw",
            "size": 0
          }
        }
      }

did the trick for me.

I am new to Elasticsearch, but I see that many string-type fields have a "raw" sub-field that can be used in queries.

It would be great if some experts could shed light on my finding. Correct / partially correct / wrong?!
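
The manual check described in this answer (reading the mapping and looking for a not_analyzed sub-field) can be sketched in Python. This is only an illustration under stated assumptions: `not_analyzed_subfields` is a hypothetical helper, the `properties` dict is typed in by hand to match the `host` fragment shown above rather than fetched from a cluster, and the `message` field is made up to show a field without a raw copy.

```python
def not_analyzed_subfields(properties, prefix=""):
    """Return dotted paths of sub-fields declared with "index": "not_analyzed".

    Hypothetical helper: `properties` is the "properties" object of a mapping
    in the pre-5.x style shown in this thread.
    """
    found = []
    for name, spec in properties.items():
        path = prefix + name
        # Multi-field copies live under "fields".
        for sub_name, sub_spec in spec.get("fields", {}).items():
            if sub_spec.get("index") == "not_analyzed":
                found.append(f"{path}.{sub_name}")
        # Object fields nest further "properties".
        if "properties" in spec:
            found.extend(not_analyzed_subfields(spec["properties"], path + "."))
    return found


# Hand-written copy of the mapping fragment from the answer.
properties = {
    "host": {
        "type": "string",
        "norms": {"enabled": False},
        "fields": {
            "raw": {"type": "string", "index": "not_analyzed", "ignore_above": 256}
        },
    },
    "message": {"type": "string"},  # made-up field with no raw copy
}

raw_fields = not_analyzed_subfields(properties)
print(raw_fields)
```

Any path the helper returns, such as `host.raw` here, is a candidate for the `field` of a terms aggregation when whole values are wanted as bucket keys.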

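【Comments】:

The Logstash output plugin creates a default index template in Elasticsearch that is applied to any index whose name starts with logstash-. With this template, a raw field is created by default for every property of type string. Here is the template: github.com/logstash-plugins/logstash-output-elasticsearch/blob/…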
