Elastic Query Filters Challenge

Posted 2023-04-17


【Title】Elastic Query Filters Challenge 【Asked】2017-03-10 21:13:53 【Question】:

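I have the following query, which produces the top 100 sellers for a given supplier ID. It runs against the sales index and looks up that supplier's product SKUs in the product_skus index. This works well.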

query = {
  size: 0,
  query: {
    bool: {
      filter: [
        {
          constant_score: {
            filter: {
              terms: {
                sku: {
                  index: "product_skus",
                  type: "product",
                  id: supplier_id,
                  path: "skus"
                }
              }
            }
          }
        }
      ],
      must_not: []
    }
  },
  aggs: {
    unit_sum: {
      terms: {
        field: "sku",
        size: 100,
        order: {
          one: "desc"
        }
      },
      aggs: {
        one: {
          sum: {
            field: "units"
          }
        }
      }
    }
  }
}

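Now I have a scenario where a given user's access must be restricted to a subset of the supplier's SKUs. I'm struggling to find the best way to handle this. I'm leaning towards having another index of the SKUs a user can access and doing a second lookup, but I can't quite work out the query logic.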

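Put simply, for example: if the query above returns products [A,B,C,D,E] for supplier 1, and user John may only see results based on products [A,C,E], how would I write the query to do that? Is it as simple as adding a should clause after the filter inside the bool?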

Thanks in advance!

【Comments】:

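It really depends on how many SKUs a given user can access... if it's a small number, I think you can get away with an additional SHOULD clause. If it's hundreds, you may need another solution.
Yes, it's hundreds, if not thousands.
Have you considered using routing? elastic.co/blog/customizing-your-document-routing
It seems you should add a product-family to each sku document and then restrict user queries to that. Or set up filters for product-families?

【Answer 1】: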

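In this case you probably want routing, since your scenario lets you route by user. Besides organizing the data into separate shards, routing can improve performance when it is used in queries. Why? Because with routing, the request is sent only to the shard that holds the relevant data rather than to every shard on every node in the cluster.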
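To make the "only the relevant shard" point concrete, here is a minimal toy model in Python. It is a sketch, not Elasticsearch code: `shard_for` and `ToyIndex` are hypothetical names, and CRC32 stands in for the hash Elasticsearch applies to the routing value (Murmur3, as far as I know) before taking it modulo the number of primary shards.

```python
import zlib
from typing import List, Optional, Tuple

NUM_SHARDS = 5  # same shard count as the example index settings


def shard_for(routing: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing value to a shard number.

    Assumption: CRC32 is only a stand-in hash for illustration; the real
    hash function and any extra factors are internal to Elasticsearch.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


class ToyIndex:
    """In-memory stand-in for an index that is split into shards."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.shards: List[List[dict]] = [[] for _ in range(num_shards)]

    def index(self, doc: dict, routing: str) -> None:
        # A routed document is stored on exactly one shard.
        self.shards[shard_for(routing, len(self.shards))].append(doc)

    def search(self, routing: Optional[str] = None) -> Tuple[int, List[dict]]:
        """Return (number of shards visited, all documents found on them)."""
        if routing is None:
            targets = self.shards  # no routing: every shard is queried
        else:
            targets = [self.shards[shard_for(routing, len(self.shards))]]
        hits = [doc for shard in targets for doc in shard]
        return len(targets), hits


idx = ToyIndex()
idx.index({"supplierId": 123, "path": "some/path"}, routing="123")
visited, hits = idx.search(routing="123")   # visits a single shard
visited_all, _ = idx.search()               # visits all five shards
```

Note that in this model, as in Elasticsearch, two different routing values can hash to the same shard. Routing narrows where a search runs; it is not an access filter by itself, so a per-user restriction still needs an explicit filter in the query.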

What would that look like in your case? Let's look at a simple mapping and a product that can only be reached with routing value 123:

Mapping for product_skus (modify as needed):

PUT product_skus
{
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "product": {
      "_routing": {
        "required": true
      },
      "properties": {
        "supplierId": {
          "type": "integer"
        },
        "path": {
          "type": "string"
        }
      }
    }
  }
}

Now let's put a product into the index type (note the routing):

POST product_skus/product?routing=123
{
  "supplierId": 123,
  "path": "some/path"
}

Finally, here are two requests that use routing, with their output:

GET product_skus/_search?routing=123
{
  "query": {
    "match_all": {}
  }
}

Output:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "product_skus",
        "_type": "product",
        "_id": "AVrMHzgx28yun46LEMYm",
        "_score": 1,
        "_routing": "123",
        "_source": {
          "supplierId": 123,
          "path": "some/path"
        }
      }
    ]
  }
}

Second query:

GET product_skus/_search?routing=124
{
  "query": {
    "match_all": {}
  }
}

Output:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

This is just a simple example; you may want to check the documentation for more information:

The _routing field
Another routing example
An example of routing with fields of the type

In addition, the following shows that only one shard is used for this routing value:

GET product_skus/_search_shards?routing=123

Output:

{
  "nodes": {
    "1sMKtN6aQ9yyOsTjknWyQA": {
      "name": "1sMKtN6",
      "ephemeral_id": "X-V2QGTwTmqUFQb1B6KIUw",
      "transport_address": "127.0.0.1:9300",
      "attributes": {}
    }
  },
  "shards": [
    [
      {
        "state": "STARTED",
        "primary": true,
        "node": "1sMKtN6aQ9yyOsTjknWyQA",
        "relocating_node": null,
        "shard": 0,
        "index": "product_skus",
        "allocation_id": {
          "id": "1MMkFaALRxm1N-x8J8AGhg"
        }
      }
    ]
  ]
}

For details, see the search shards API.
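Coming back to the question's aggregation, the "second lookup" idea from the question can be sketched as a second terms lookup in the same `filter` array. This is only a sketch under stated assumptions: the `user_skus` index, its `user` type, and its `skus` path are hypothetical names that do not appear in the original post, and `terms_lookup` and `top_sellers_query` are illustrative helpers. The `constant_score` wrapper from the question is left out here because clauses under `filter` are not scored anyway.

```python
import json
from typing import Optional


def terms_lookup(index: str, doc_type: str, doc_id, path: str,
                 routing: Optional[str] = None) -> dict:
    """Build one terms-lookup filter on the `sku` field."""
    lookup = {"index": index, "type": doc_type, "id": doc_id, "path": path}
    if routing is not None:
        # Needed when the lookup document was indexed with a routing value.
        lookup["routing"] = routing
    return {"terms": {"sku": lookup}}


def top_sellers_query(supplier_id, user_id) -> dict:
    """Top 100 SKUs by units, limited to SKUs in BOTH lookup documents."""
    return {
        "size": 0,
        "query": {
            "bool": {
                # Every clause under `filter` must match, so two lookups
                # behave like an intersection of the two SKU lists.
                "filter": [
                    terms_lookup("product_skus", "product", supplier_id, "skus"),
                    # Hypothetical per-user document listing allowed SKUs.
                    terms_lookup("user_skus", "user", user_id, "skus"),
                ]
            }
        },
        "aggs": {
            "unit_sum": {
                "terms": {"field": "sku", "size": 100, "order": {"one": "desc"}},
                "aggs": {"one": {"sum": {"field": "units"}}},
            }
        },
    }


body = json.dumps(top_sellers_query(supplier_id=1, user_id="john"), indent=2)
```

With supplier 1 holding [A,B,C,D,E] and John's document holding [A,C,E], only sales for A, C, and E would pass both filters. As far as I understand, a `should` clause placed next to `filter` is optional by default, so on its own it would likely not restrict the results. Since the comments mention hundreds or thousands of SKUs per user, it is also worth checking the index's limit on the number of terms a lookup may return.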
