How to return the number of unique users per publisher in Elasticsearch?

【Posted】: 2015-04-27 13:26:58 【Question】:

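We need to make a RESTful call to ES that returns the number of unique users per publisher for a given set of search terms.

The problem is: return a list of all unique users who queried the system with any of the following search terms:

rent a car, car rental, rent an auto

In MySQL, the query would look something like this: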

SELECT userId, publisherId
FROM keywordPixel
WHERE (keywordId LIKE '/^(?=.*\brent\b)(?=.*\ba\b)(?=.*\bcar\b).*$/'
    OR keywordId LIKE '/^(?=.*\bcar\b)(?=.*\brental\b).*$/'
    OR keywordId LIKE '/^(?=.*\brent\b)(?=.*\ban\b)(?=.*\bauto\b).*$/')
  AND date >= [start] AND date <= [end]

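This query finds users who have at least one row whose search query matches the words of one of the search terms. For example, if user A searched for "Berlin car to rent", this should match the search term "car rental", and that user should appear in our result.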

SELECT COUNT(DISTINCT(userId)), publisherId
FROM keywordPixel
WHERE (keywordId LIKE '/^(?=.*\brent\b)(?=.*\ba\b)(?=.*\bcar\b).*$/'
    OR keywordId LIKE '/^(?=.*\bcar\b)(?=.*\brental\b).*$/'
    OR keywordId LIKE '/^(?=.*\brent\b)(?=.*\ban\b)(?=.*\bauto\b).*$/')
  AND date >= [start] AND date <= [end]
GROUP BY publisherId

The second query should return the COUNT of distinct users per publisher for the same search.
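To make the intended semantics concrete, here is a minimal Python sketch of what the two SQL queries are meant to compute, run over in-memory rows instead of a database. The sample rows and the helper names (`matches_any_term`, `unique_users_per_publisher`) are illustrative assumptions, not part of the original schema. Matching requires every word of a search term to appear as a whole word, which mirrors the lookahead patterns above; the date range condition is left out for brevity.

```python
import re
from collections import defaultdict

SEARCH_TERMS = ["rent a car", "car rental", "rent an auto"]


def matches_any_term(keyword, terms=SEARCH_TERMS):
    """True if every word of at least one term occurs as a whole word in keyword."""
    words = set(re.findall(r"\w+", keyword.lower()))
    return any(set(term.lower().split()) <= words for term in terms)


def unique_users_per_publisher(rows):
    """Equivalent of COUNT(DISTINCT userId) ... GROUP BY publisherId."""
    users = defaultdict(set)
    for user_id, publisher_id, keyword in rows:
        if matches_any_term(keyword):
            users[publisher_id].add(user_id)
    return {publisher: len(ids) for publisher, ids in users.items()}


# Hypothetical rows: (userId, publisherId, keywordId)
rows = [
    ("u1", "p1", "cheap car rental berlin"),
    ("u1", "p1", "car rental"),           # same user and publisher: counted once
    ("u2", "p1", "where to rent a car"),
    ("u2", "p2", "rent an auto today"),
    ("u3", "p2", "buy a car"),            # no search term is fully matched
]

print(unique_users_per_publisher(rows))
```

The point of the sketch is that a user is counted once per publisher no matter how many matching rows they have, which is what the Elasticsearch request needs to reproduce.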

My attempt looks like this:

curl -XPOST 'localhost:9200/keyword*/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            { "match_phrase": { "keywordId.keywordId_analyzed": "Honda " } },
            { "match_phrase": { "keywordId.keywordId_analyzed": "car rental" } },
            { "match_phrase": { "keywordId.keywordId_analyzed": "rent an auto" } }
          ]
        }
      },
      "filter": {
        "range": {
          "@timestamp": {
            "from": "2015-04-01T12:20:15+00:00",
            "to": "2015-04-25T12:20:15+00:00"
          }
        }
      }
    }
  },
  "aggs": {
    "unique_users": {
      "terms": { "field": "publisherId" }
    }
  },
  "_source": ["userId", "publisherId", "keywordId"]
}
'

But it does not return the number of unique users per publisher. Can anyone help?

【Answer 1】:

The cardinality aggregation is what you are looking for:

curl -XPOST 'localhost:9200/keyword*/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            { "match_phrase": { "keywordId.keywordId_analyzed": "Honda " } },
            { "match_phrase": { "keywordId.keywordId_analyzed": "car rental" } },
            { "match_phrase": { "keywordId.keywordId_analyzed": "rent an auto" } }
          ]
        }
      },
      "filter": {
        "range": {
          "@timestamp": {
            "from": "2015-04-01T12:20:15+00:00",
            "to": "2015-04-25T12:20:15+00:00"
          }
        }
      }
    }
  },
  "aggs": {
    "unique_users": {
      "cardinality": { "field": "publisherId" }
    }
  },
  "_source": ["userId", "publisherId", "keywordId"]
}
'

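【Answer 2】:

Cardinality is indeed what I was looking for, but not used that way: a cardinality aggregation on publisherId returns the number of unique publishers, which is not what I want. I want the number of unique users per publisher, so the cardinality aggregation on userId has to be nested inside a terms aggregation on publisherId. This is the solution: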

curl -XPOST 'localhost:9200/keyword*/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            { "match_phrase": { "keywordId.keywordId_analyzed": "rent a car" } },
            { "match_phrase": { "keywordId.keywordId_analyzed": "car rental" } },
            { "match_phrase": { "keywordId.keywordId_analyzed": "rent an auto" } }
          ]
        }
      },
      "filter": {
        "range": {
          "@timestamp": {
            "from": "2015-04-01T12:20:15+00:00",
            "to": "2015-04-25T12:20:15+00:00"
          }
        }
      }
    }
  },
  "aggs": {
    "users_per_publisher": {
      "terms": { "field": "publisherId" },
      "aggs": {
        "number_of_unique_users": {
          "cardinality": { "field": "userId" }
        }
      }
    }
  },
  "_source": ["userId", "publisherId", "keywordId"]
}
'
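The `filtered` query used in this thread was deprecated in Elasticsearch 2.0 and removed in 5.0, so on newer clusters the same request is usually written as a `bool` query with a `filter` clause. The Python sketch below builds such a body and reads the per-publisher counts out of a response. The helper names (`build_body`, `unique_users_by_publisher`) and the sample response values are made up for illustration; the field and aggregation names come from the thread. Setting `minimum_should_match` to 1 is an assumption that keeps the original behavior, since `should` clauses become optional once a `filter` clause is present.

```python
import json


def build_body(phrases, start, end, field="keywordId.keywordId_analyzed"):
    """Build a search body: any phrase may match, restricted to a time range."""
    return {
        "query": {
            "bool": {
                "should": [{"match_phrase": {field: p}} for p in phrases],
                "minimum_should_match": 1,  # at least one phrase must match
                "filter": [{"range": {"@timestamp": {"gte": start, "lte": end}}}],
            }
        },
        "aggs": {
            "users_per_publisher": {
                "terms": {"field": "publisherId"},
                "aggs": {
                    # Distinct userId values inside each publisher bucket
                    "number_of_unique_users": {"cardinality": {"field": "userId"}}
                },
            }
        },
        "_source": ["userId", "publisherId", "keywordId"],
    }


def unique_users_by_publisher(response):
    """Map each publisher bucket key to its unique user count."""
    buckets = response["aggregations"]["users_per_publisher"]["buckets"]
    return {b["key"]: b["number_of_unique_users"]["value"] for b in buckets}


body = build_body(
    ["rent a car", "car rental", "rent an auto"],
    "2015-04-01T12:20:15+00:00",
    "2015-04-25T12:20:15+00:00",
)
print(json.dumps(body, indent=2))  # JSON to send as the _search request body

# Hypothetical response fragment, only to show the shape being parsed
sample_response = {
    "aggregations": {
        "users_per_publisher": {
            "buckets": [
                {"key": "pub-1", "doc_count": 5, "number_of_unique_users": {"value": 2}},
                {"key": "pub-2", "doc_count": 1, "number_of_unique_users": {"value": 1}},
            ]
        }
    }
}
print(unique_users_by_publisher(sample_response))
```

Two caveats worth keeping in mind: the `terms` aggregation returns only the top 10 buckets by default, so its `size` may need raising when there are more publishers, and `cardinality` counts are approximate, with `precision_threshold` trading memory for accuracy.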
