ElasticSearch post_filter and filtered aggregations not behaving the same way

Posted 2019-05-22 03:31:20

Question:

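I've spent an entire week trying to solve this problem. I'm following this (rather old) article on e-commerce search and faceted filtering, among others, and so far it works well: search results are good, and aggregations work fine when the filters are applied in the query. I'm using ElasticSearch 6.1.1.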

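But because I want my users to be able to select multiple values within a facet, I moved the filters into the post_filter section. This still works well: it filters the results correctly and shows accurate aggregation counts for the whole document set.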
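A minimal sketch of why this works, written as a plain-Python model rather than Elasticsearch code: the query selects the matching documents, the aggregations are computed over all of them, and post_filter only trims the hits that come back. The function `run_search` and the sample products are hypothetical; each product's list of `(facet_name, facet_value)` pairs stands in for the nested `string_facets` field.

```python
from collections import Counter


# Hypothetical in-memory model (not Elasticsearch code). Each product carries a
# list of (facet_name, facet_value) pairs, standing in for "string_facets".
def run_search(docs, query_pred, post_filter_pred):
    # 1. The query selects the candidate documents.
    matched = [d for d in docs if query_pred(d)]
    # 2. Aggregations are computed over ALL query matches, counting every
    #    nested (facet_name, facet_value) entry, like terms under a nested agg.
    facet_counts = Counter(pair for d in matched for pair in d["facets"])
    # 3. post_filter is applied last and only trims the returned hits.
    hits = [d for d in matched if post_filter_pred(d)]
    return hits, facet_counts


products = [
    {"id": 1, "facets": [("Cover colour", "Green")]},
    {"id": 2, "facets": [("Cover colour", "Black")]},
    {"id": 3, "facets": [("Cover colour", "Green"), ("Cover colour", "Pink")]},
]

hits, counts = run_search(
    products,
    query_pred=lambda d: True,  # stand-in for the multi_match query
    post_filter_pred=lambda d: ("Cover colour", "Green") in d["facets"],
)
print([d["id"] for d in hits])  # [1, 3]
print(counts[("Cover colour", "Black")])  # 1 (Black is counted although no Black hit is returned)
```

In the real request, step 1 corresponds to the multi_match query, step 2 to the nested `agg_string_facets` aggregation, and step 3 to the nested "Cover colour" filter in post_filter.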

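After reading this question on Stack Overflow, I realized I have to perform some crazy acrobatics with "filtered" aggregations and "special" aggregations, trimming the aggregations against each other so that they show the correct counts and still allow multiple filters at the same time. I've asked for some clarification on that question, but have had no response yet (it's an old question).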

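The problem I've been struggling with for a long time is getting a set of filtered aggregations on a nested field, where every facet is filtered by all of the active filters.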

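My plan is to use the regular (unfiltered) aggregations and to keep the aggregation for the selected facet unfiltered (so that I can select multiple entries in it), but to filter all the other aggregations by the currently selected facets, so that I only show the filters that can still be applied.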
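The plan above can be sketched as a request builder. This is a minimal sketch under assumptions: the helpers `facet_clause`, `all_of` and `build_request` are hypothetical (they are not part of any Elasticsearch client), and the aggregation names `facet_<name>`, `nested_facets`, `this_facet` and `values` are made up for illustration. Values within one facet are OR-ed by the `terms` clause, different facets are AND-ed, and each facet's aggregation ignores only that facet's own selection.

```python
import json


# Hypothetical helpers (not an Elasticsearch client API) that build a request
# body for multi-select faceting over the nested "string_facets" field.
def facet_clause(name, values):
    """Products having ONE nested entry with this facet_name and any of these values."""
    return {
        "nested": {
            "path": "string_facets",
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"string_facets.facet_name": name}},
                        {"terms": {"string_facets.facet_value": list(values)}},
                    ]
                }
            },
        }
    }


def all_of(clauses):
    """AND the clauses together; match everything when nothing is selected."""
    return {"bool": {"filter": clauses}} if clauses else {"match_all": {}}


def build_request(text, selected, facet_names):
    """selected maps facet_name -> list of chosen values."""
    aggs = {}
    for facet in facet_names:
        # Filter by every selection EXCEPT this facet's own, so the user can
        # still pick additional values of the same facet.
        others = [facet_clause(n, v) for n, v in selected.items() if n != facet]
        aggs["facet_" + facet] = {
            "filter": all_of(others),
            "aggs": {
                "nested_facets": {
                    "nested": {"path": "string_facets"},
                    "aggs": {
                        # Inside the nested context, keep only this facet's entries.
                        "this_facet": {
                            "filter": {"term": {"string_facets.facet_name": facet}},
                            "aggs": {"values": {"terms": {"field": "string_facets.facet_value"}}},
                        }
                    },
                }
            },
        }
    return {
        "query": {"bool": {"must": [{"multi_match": {
            "fields": ["search_data.full_text_boosted^7", "search_data.full_text^2"],
            "type": "cross_fields",
            "analyzer": "full_text_search_analyzer",
            "query": text,
        }}]}},
        # Hits must satisfy ALL selections; the aggregations above are unaffected.
        "post_filter": all_of([facet_clause(n, v) for n, v in selected.items()]),
        "aggs": aggs,
    }


request = build_request("bible", {"Cover colour": ["Green"]}, ["Cover colour", "Binding"])
print(json.dumps(request, indent=2))
```

With one facet selected, that facet's own aggregation gets `match_all`, while every other facet is narrowed by the selection. The counts for a facet would then be read from `aggregations["facet_<name>"]["nested_facets"]["this_facet"]["values"]["buckets"]` in the response.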

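However, if I take the same filters that work fine on the documents and put them in a filtered aggregation, they don't work as expected: the counts are all wrong. I know the aggregations are computed before the post_filter is applied, which is why I duplicated the filters on the aggregations I want filtered.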

Here is my query:

{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": [
              "search_data.full_text_boosted^7",
              "search_data.full_text^2"
            ],
            "type": "cross_fields",
            "analyzer": "full_text_search_analyzer",
            "query": "some book"
          }
        }
      ]
    }
  }
}

Nothing special here; it works well and returns relevant results.

Here is my filter (in post_filter):

{
  "post_filter" : {
    "bool" : {
      "must" : [
        {
          "nested": {
            "path": "string_facets",
            "query": {
              "bool" : {
                "filter" : [
                  { "term" : { "string_facets.facet_name" : "Cover colour" } },
                  { "terms" : { "string_facets.facet_value" : [ "Green" ] } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

Let me stress this: it works fine. I see the correct results (in this case 13 results are shown, all matching the right facet: 'Cover colour' = 'Green').

Here is my general (unfiltered) aggregation, which returns all facets for all products with the correct counts:

{
  "agg_string_facets": {
    "nested": {
      "path": "string_facets"
    },
    "aggregations": {
      "facet_name": {
        "terms": {
          "field": "string_facets.facet_name"
        },
        "aggregations": {
          "facet_value": {
            "terms": {
              "field": "string_facets.facet_value"
            }
          }
        }
      }
    }
  }
}

This also works perfectly! I see all the aggregations, with accurate facet counts, for all documents matching my query.

Now check this out: I'm creating an aggregation on the same nested field, but filtered, so that I get the aggregations and facets that "survive" my filters:

{
  "agg_all_facets_filtered" : {
    "filter" : {
      "bool" : {
        "must" : [
          {
            "nested": {
              "path": "string_facets",
              "query": {
                "bool" : {
                  "filter" : [
                    { "term" : { "string_facets.facet_name" : "Cover colour" } },
                    { "terms" : { "string_facets.facet_value" : [ "Green" ] } }
                  ]
                }
              }
            }
          }
        ]
      }
    },
    "aggs" : {
      "agg_all_facets_filtered" : {
        "nested": { "path": "string_facets" },
        "aggregations": {
          "facet_name": {
            "terms": { "field": "string_facets.facet_name" },
            "aggregations": {
              "facet_value": {
                "terms": { "field": "string_facets.facet_value" }
              }
            }
          }
        }
      }
    }
  }
}

Note that the filter I use in this aggregation is exactly the same one that filters my results in the first place (in post_filter).

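But for some reason the returned aggregations, i.e. the facet counts, are all wrong. For example, in my search here I get 13 results, but the aggregation returned from 'agg_all_facets_filtered' has only one count, 'Cover colour' = 4:

{
  "key": "Cover colour",
  "doc_count": 4,
  "facet_value": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "Green",
        "doc_count": 4
      }
    ]
  }
}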
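After checking why it was 4, I noticed that 3 of those documents contain the 'Cover colour' facet twice: once as 'Green' and once as another colour. So it looks like my aggregation only counts entries that have that facet name TWICE, or that have something in common with other documents. That's why I think the filter on my aggregation is wrong. I've done a lot of reading on AND vs. OR for match/filter, and I've tried "filter", "should", etc. Nothing fixes this.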

Sorry this is such a long question, but:

How do I write the aggregation filter so that the returned facets have the correct counts, given that the filter on its own works perfectly?

Many thanks.

UPDATE: As requested, here is my full query (note the filter in post_filter and the same filter in the filtered aggregation):

{
  "size" : 0,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": [
              "search_data.full_text_boosted^7",
              "search_data.full_text^2"
            ],
            "type": "cross_fields",
            "analyzer": "full_text_search_analyzer",
            "query": "bible"
          }
        }
      ]
    }
  },

  "post_filter" : {
    "bool" : {
      "must" : [
        {
          "nested": {
            "path": "string_facets",
            "query": {
              "bool" : {
                "filter" : [
                  { "term" : { "string_facets.facet_name" : "Cover colour" } },
                  { "terms" : { "string_facets.facet_value" : [ "Green" ] } }
                ]
              }
            }
          }
        }
      ]
    }
  },

  "aggregations": {
    "agg_string_facets": {
      "nested": {
        "path": "string_facets"
      },
      "aggregations": {
        "facet_name": {
          "terms": {
            "field": "string_facets.facet_name"
          },
          "aggregations": {
            "facet_value": {
              "terms": {
                "field": "string_facets.facet_value"
              }
            }
          }
        }
      }
    },

    "agg_all_facets_filtered" : {
      "filter" : {
        "bool" : {
          "must" : [
            {
              "nested": {
                "path": "string_facets",
                "query": {
                  "bool" : {
                    "filter" : [
                      { "term" : { "string_facets.facet_name" : "Cover colour" } },
                      { "terms" : { "string_facets.facet_value" : [ "Green" ] } }
                    ]
                  }
                }
              }
            }
          ]
        }
      },
      "aggs" : {
        "agg_all_facets_filtered" : {
          "nested": { "path": "string_facets" },
          "aggregations": {
            "facet_name": {
              "terms": { "field": "string_facets.facet_name" },
              "aggregations": {
                "facet_value": {
                  "terms": { "field": "string_facets.facet_value" }
                }
              }
            }
          }
        }
      }
    }
  }
}

The returned results are correct (in terms of documents). Here are the unfiltered aggregations from the response, from 'agg_string_facets' (note that 'Green' shows 13 documents, which is correct):

{
  "key": "Cover colour",
  "doc_count": 483,
  "facet_value": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 111,
    "buckets": [
      { "key": "Black", "doc_count": 87 },
      { "key": "Brown", "doc_count": 75 },
      { "key": "Blue", "doc_count": 45 },
      { "key": "Burgundy", "doc_count": 43 },
      { "key": "Pink", "doc_count": 30 },
      { "key": "Teal", "doc_count": 27 },
      { "key": "Tan", "doc_count": 20 },
      { "key": "White", "doc_count": 18 },
      { "key": "Chocolate", "doc_count": 14 },
      { "key": "Green", "doc_count": 13 }
    ]
  }
}

And here is the aggregation filtered with the same filter (from 'agg_all_facets_filtered' in the same response), showing only 4 for 'Green':

{
  "key": "Cover colour",
  "doc_count": 4,
  "facet_value": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "Green",
        "doc_count": 4
      }
    ]
  }
}

UPDATE 2: Here are some of the sample documents returned by the query:

{
  "hits": {
    "total": 13,
    "max_score": 17.478987,
    "hits": [
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "33107",
        "_score": 17.478987,
        "_source": {
          "type": "product",
          "document_id": 33107,
          "search_data": {
            "full_text": "hcsb compact ultrathin bible mint green leathertouch  holman bible staff leather binding 9781433617751 ",
            "full_text_boosted": "HCSB Compact Ultrathin Bible Mint Green Leathertouch Holman Bible Staff "
          },
          "search_result_data": {
            "name": "HCSB Compact Ultrathin Bible, Mint Green Leathertouch (Leather)",
            "preview_image": "/images/products/medium/0.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=33107"
          },
          "string_facets": [
            { "facet_name": "Binding", "facet_value": "Leather" },
            { "facet_name": "Bible size", "facet_value": "Compact" },
            { "facet_name": "Bible size", "facet_value": "Ultrathin" },
            { "facet_name": "Bible version", "facet_value": "HCSB" },
            { "facet_name": "Cover colour", "facet_value": "Green" }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "17240",
        "_score": 17.416323,
        "_source": {
          "type": "product",
          "document_id": 17240,
          "search_data": {
            "full_text": "kjv thinline bible compact  leather binding 9780310439189 ",
            "full_text_boosted": "KJV Thinline Bible Compact "
          },
          "search_result_data": {
            "name": "KJV Thinline Bible, Compact (Leather)",
            "preview_image": "/images/products/medium/17240.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=17240"
          },
          "string_facets": [
            { "facet_name": "Binding", "facet_value": "Leather" },
            { "facet_name": "Bible size", "facet_value": "Compact" },
            { "facet_name": "Bible size", "facet_value": "Thinline" },
            { "facet_name": "Bible version", "facet_value": "KJV" },
            { "facet_name": "Cover colour", "facet_value": "Green" }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "17243",
        "_score": 17.416323,
        "_source": {
          "type": "product",
          "document_id": 17243,
          "search_data": {
            "full_text": "kjv busy mom's bible  leather binding 9780310439134 ",
            "full_text_boosted": "KJV Busy Mom'S Bible "
          },
          "search_result_data": {
            "name": "KJV Busy Mom's Bible (Leather)",
            "preview_image": "/images/products/medium/17243.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=17243"
          },
          "string_facets": [
            { "facet_name": "Binding", "facet_value": "Leather" },
            { "facet_name": "Bible size", "facet_value": "Pocket" },
            { "facet_name": "Bible size", "facet_value": "Thinline" },
            { "facet_name": "Bible version", "facet_value": "KJV" },
            { "facet_name": "Cover colour", "facet_value": "Pink" },
            { "facet_name": "Cover colour", "facet_value": "Green" }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "33030",
        "_score": 15.674053,
        "_source": {
          "type": "product",
          "document_id": 33030,
          "search_data": {
            "full_text": "apologetics study bible for students grass green leathertou  mcdowell sean; holman bible s leather binding 9781433617720 ",
            "full_text_boosted": "Apologetics Study Bible For Students Grass Green Leathertou Mcdowell Sean; Holman Bible S"
          },
          "search_result_data": {
            "name": "Apologetics Study Bible For Students, Grass Green Leathertou (Leather)",
            "preview_image": "/images/products/medium/33030.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=33030"
          },
          "string_facets": [
            { "facet_name": "Binding", "facet_value": "Leather" },
            { "facet_name": "Bible designation", "facet_value": "Study Bible" },
            { "facet_name": "Bible designation", "facet_value": "Students" },
            { "facet_name": "Bible feature", "facet_value": "Indexed" },
            { "facet_name": "Cover colour", "facet_value": "Green" }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "33497",
        "_score": 15.674053,
        "_source": {
          "type": "product",
          "document_id": 33497,
          "search_data": {
            "full_text": "hcsb life essentials study bible brown / green  getz gene a.; holman bible st imitation leather 9781586400446 ",
            "full_text_boosted": "HCSB Life Essentials Study Bible Brown  Green Getz Gene A ; Holman Bible St"
          },
          "search_result_data": {
            "name": "HCSB Life Essentials Study Bible Brown / Green (Imitation Leather)",
            "preview_image": "/images/products/medium/33497.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=33497"
          },
          "string_facets": [
            { "facet_name": "Binding", "facet_value": "Imitation Leather" },
            { "facet_name": "Bible designation", "facet_value": "Study Bible" },
            { "facet_name": "Bible version", "facet_value": "HCSB" },
            { "facet_name": "Binding", "facet_value": "Imitation leather" },
            { "facet_name": "Cover colour", "facet_value": "Brown" },
            { "facet_name": "Cover colour", "facet_value": "Green" }
          ]
        }
      }
    ]
  }
}
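As a cross-check, here is a plain-Python sketch (not Elasticsearch code; the function name `filtered_nested_counts` is hypothetical) of what agg_all_facets_filtered should return for the five sample hits above. The filter step keeps products that have one nested entry matching both the name and the value; the nested terms step then counts every nested entry of those products. It reproduces the figures reported in the discussion for the same documents: 'Cover colour' 7, 'Green' 5, 'Brown' 1, 'Pink' 1.

```python
from collections import Counter

# string_facets of the five sample hits above, copied as (name, value) pairs.
SAMPLE_HITS = {
    33107: [("Binding", "Leather"), ("Bible size", "Compact"), ("Bible size", "Ultrathin"),
            ("Bible version", "HCSB"), ("Cover colour", "Green")],
    17240: [("Binding", "Leather"), ("Bible size", "Compact"), ("Bible size", "Thinline"),
            ("Bible version", "KJV"), ("Cover colour", "Green")],
    17243: [("Binding", "Leather"), ("Bible size", "Pocket"), ("Bible size", "Thinline"),
            ("Bible version", "KJV"), ("Cover colour", "Pink"), ("Cover colour", "Green")],
    33030: [("Binding", "Leather"), ("Bible designation", "Study Bible"),
            ("Bible designation", "Students"), ("Bible feature", "Indexed"),
            ("Cover colour", "Green")],
    33497: [("Binding", "Imitation Leather"), ("Bible designation", "Study Bible"),
            ("Bible version", "HCSB"), ("Binding", "Imitation leather"),
            ("Cover colour", "Brown"), ("Cover colour", "Green")],
}


def filtered_nested_counts(docs, name, values):
    # "filter" agg: keep products with ONE nested entry matching both name and
    # value (the nested query evaluates term + terms against the same entry).
    kept = [facets for facets in docs.values()
            if any(n == name and v in values for n, v in facets)]
    # "nested" + "terms" aggs: count EVERY nested entry of the kept products,
    # so doc_count is a number of nested entries, not of products.
    name_counts = Counter(n for facets in kept for n, _ in facets)
    value_counts = Counter((n, v) for facets in kept for n, v in facets if n == name)
    return name_counts, value_counts


name_counts, value_counts = filtered_nested_counts(SAMPLE_HITS, "Cover colour", ["Green"])
print(name_counts["Cover colour"])  # 7
print(value_counts[("Cover colour", "Green")])  # 5
```

Since every one of the 13 hits matched the post_filter on 'Cover colour' = 'Green', the same reasoning would predict a 'Green' bucket of 13 for the full result set, matching agg_string_facets, rather than 4. Note also that 'Cover colour' counts 7 for 5 products because a terms aggregation under a nested aggregation counts nested entries; Elasticsearch's reverse_nested aggregation can be placed under the buckets if product counts are wanted instead.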
Comments:

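Can you explain in detail, using sample documents, the current result versus the expected result?

Hi, I've updated the question with examples. Thanks!

I was hoping you would add some of those 13 documents.

Sorry, I thought you wanted to see aggregation samples. I've added some of the returned documents.

I created sample data using the five documents you added to the question, plus one more document that doesn't match the post filter. In the nested aggregation I get the correct result, i.e. {"key":"Cover colour","doc_count":7,"facet_value":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":"Green","doc_count":5},{"key":"Brown","doc_count":1},{"key":"Pink","doc_count":1}]}}

Answer 1: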

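Mystery solved! Thanks for your input. It turns out the version I was using (6.1.1) has a bug. I don't know exactly what the bug is, but I installed ElasticSearch 6.5, re-indexed my data without changing the queries or the mappings, and everything works!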

Now I don't know whether I should file a bug report with ES or just let it go, since it's an old version and they've moved on.

Comments:

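I'm working on the same problem. I went through the Stack Overflow question you mentioned and the related Medium and other articles, and saw that you had commented there asking this question (again on the article) and also on Stack Overflow. Glad to see you found the answer, and thanks for showing me a way to start solving my problem.

How did you get information about the currently selected facet? I checked the requests of some e-commerce sites: none of them indicate which facet is currently selected, they just send all the selected facets. How do we determine which facets need to be kept without a filter?

Not sure I understand your question. You run the filters in post_filter: that still restricts the results to the facets you selected, but also gives you back all the facets. You need to persist the ones you selected yourself.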
