MongoDB aggregation - sort makes the query very slow

Posted: 2016-06-01 21:35:59

Question:

MongoDB 3.2, installed on CentOS 6, with plenty of RAM and disk. I have a collection of 10K documents with the following structure:

{
  "id": 5752034,
  "score": 7.6,
  "name": "ASUS X551 15.6-inch Laptop",
  "categoryId": "803",
  "positiveAspects": [
    {
      "id": 30030525,
      "name": "price",
      "score": 9.8,
      "frequency": 139,
      "rank": 100098
    },
    {
      "id": 30028399,
      "name": "use",
      "score": 9.9,
      "frequency": 99,
      "rank": 100099
    },
    .
    .
  ]
}

In each document, the nested array positiveAspects has a few hundred elements.

The collection has the following indexes:

{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "positiveAspects.id" : 1.0, "positiveAspects.score" : 1.0 }, "name" : "positiveAspects.id_1_positiveAspects.score_1", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "categoryId" : 1.0, "score" : 1.0 }, "name" : "categoryId_1_score_1", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "rank" : -1.0 }, "name" : "rank_-1", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "positiveAspects.rank" : -1.0 }, "name" : "positiveAspects.rank_-1", "ns" : "proddb.product_trees" }

I want to run the following aggregation, which takes about 40 seconds:

{
  aggregate: "product_trees",
  pipeline: [
    {
      $match: {
        categoryId: "803",
        score: { $gte: 8.0 }
      }
    },
    {
      $unwind: "$positiveAspects"
    },
    {
      $match: {
        "positiveAspects.id": 30030525,
        "positiveAspects.score": { $gte: 9.0 }
      }
    },
    {
      $sort: { "positiveAspects.rank": -1 }
    },
    {
      $project: {
        _id: 0,
        score: 1,
        id: 1,
        name: 1,
        positiveAspects: 1
      }
    },
    {
      $limit: 10
    }
  ]
}

Here is the explain output (query planner log):

2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Beginning planning...
=============================
Options = NO_BLOCKING_SORT INDEX_INTERSECTION
Canonical query:
ns=proddb.product_treesTree: $and
    categoryId == "803"
    score $gte 8.0
Sort: 
Proj: 
=============================
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Index 0 is kp:  _id: 1  unique name: '_id_' io:  v: 1, key:  _id: 1 , name: "_id_", ns: "proddb.product_trees" 
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Index 1 is kp:  positiveAspects.id: 1.0, positiveAspects.score: 1.0  multikey name: 'positiveAspects.id_1_positiveAspects.score_1' io:  v: 1, key:  positiveAspects.id: 1.0, positiveAspects.score: 1.0 , name: "positiveAspects.id_1_positiveAspects.score_1", ns: "proddb.product_trees" 
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Index 2 is kp:  categoryId: 1.0, score: 1.0  name: 'categoryId_1_score_1' io:  v: 1, key:  categoryId: 1.0, score: 1.0 , name: "categoryId_1_score_1", ns: "proddb.product_trees" 
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Index 3 is kp:  rank: -1.0  name: 'rank_-1' io:  v: 1, key:  rank: -1.0 , name: "rank_-1", ns: "proddb.product_trees" 
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Index 4 is kp:  positiveAspects.rank: -1.0  multikey name: 'positiveAspects.rank_-1' io:  v: 1, key:  positiveAspects.rank: -1.0 , name: "positiveAspects.rank_-1", ns: "proddb.product_trees" 
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Predicate over field 'score'
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Predicate over field 'categoryId'
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Relevant index 0 is kp:  categoryId: 1.0, score: 1.0  name: 'categoryId_1_score_1' io:  v: 1, key:  categoryId: 1.0, score: 1.0 , name: "categoryId_1_score_1", ns: "proddb.product_trees" 
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Rated tree:
$and
    categoryId == "803"  || First: 0 notFirst: full path: categoryId
    score $gte 8.0  || First: notFirst: 0 full path: score
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Tagging memoID 1
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Enumerator: memo just before moving:
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] About to build solntree from tagged tree:
$and
    categoryId == "803"  || Selected Index #0 pos 0
    score $gte 8.0  || Selected Index #0 pos 1
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Planner: adding solution:
FETCH
---fetched = 1
---sortedByDiskLoc = 0
---getSort = [ categoryId: 1 ,  categoryId: 1, score: 1 ,  score: 1 , ]
---Child:
------IXSCAN
---------keyPattern =  categoryId: 1.0, score: 1.0 
---------direction = 1
---------bounds = field #0['categoryId']: ["803", "803"], field #1['score']: [8.0, inf.0]
---------fetched = 0
---------sortedByDiskLoc = 0
---------getSort = [ categoryId: 1 ,  categoryId: 1, score: 1 ,  score: 1 , ]
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Planner: outputted 1 indexed solutions.
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Only one plan is available; it will be run but will not be cached. query:  categoryId: "803", score:  $gte: 8.0   sort:  projection: , planSummary: IXSCAN  categoryId: 1.0, score: 1.0 
2016-06-01T16:11:27.170-0500 I COMMAND  [conn47] command proddb.product_trees command: aggregate  aggregate: "product_trees", pipeline: [  $match:  categoryId: "803", score:  $gte: 8.0   ,  $unwind: "$positiveAspects" ,  $match:  positiveAspects.id: 30030525, positiveAspects.score:  $gte: 9.0   ,  $sort:  positiveAspects.rank: -1  ,  $project:  _id: 0, score: 1, id: 1, name: 1, positiveAspects: 1  ,  $limit: 10  ], cursor:   keyUpdates:0 writeConflicts:0 numYields:226 reslen:7459 locks: Global:  acquireCount:  r: 906  , Database:  acquireCount:  r: 453  , Collection:  acquireCount:  r: 453    protocol:op_query 38030ms

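With the $sort removed, the query runs in 2 seconds.

Can you explain why $sort causes such a performance drop, given that there is an index it could use? Is there an index I am missing, and what can be done to fix this?

Thanks!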


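Comments:

Try reordering $project so that $sort is immediately followed by $limit, since there is an optimization you can then use. docs.mongodb.com/manual/core/aggregation-pipeline-optimization/…

I tried that, but it does not affect the response time.

Answer 1: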

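This is because $sort does not use an index unless it appears early in the aggregation pipeline. To take advantage of an index, $sort or $match has to be used at the beginning of the pipeline.

See Pipeline Operators and Indexes.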
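One way to check this against the question's pipeline is to ask the server for the aggregation plan. The sketch below assumes it is pasted into a mongo shell connected to the proddb database; the `typeof db` guard is an addition so that the snippet also loads outside the shell, where it simply does nothing. In the plan, the leading $match should be the only part reported with an index scan, while the $sort on positiveAspects.rank appears as an ordinary pipeline stage.

```javascript
// The pipeline from the question, unchanged.
var pipeline = [
  // First stage: this is the part that can use the categoryId_1_score_1 index.
  { $match: { categoryId: "803", score: { $gte: 8.0 } } },
  { $unwind: "$positiveAspects" },
  { $match: { "positiveAspects.id": 30030525, "positiveAspects.score": { $gte: 9.0 } } },
  // Runs on the unwound documents, not on the collection.
  { $sort: { "positiveAspects.rank": -1 } },
  { $project: { _id: 0, score: 1, id: 1, name: 1, positiveAspects: 1 } },
  { $limit: 10 }
];

// `db` and `printjson` exist only inside the mongo shell.
if (typeof db !== "undefined") {
  printjson(db.product_trees.aggregate(pipeline, { explain: true }));
}
```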

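Comments:

I see, but in many real-world cases the results cannot be sorted at the start of the pipeline. Are such pipelines doomed to be slow? Is there any technique to make them faster?

Unfortunately, there is not much you can do. However, if you have MongoDB 3.2+, try using $filter instead of $unwind. This will reduce your data set and speed up the query.

Another issue is that $sort uses only one core/processor @Seffy

Answer 2: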

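This happens because of the $unwind stage. Your query is slow because no index is considered after the $unwind stage. Check it with explain and you will see. The reason is that after $unwind the documents are no longer the ones stored in the collection: they are new, different documents held in RAM, so the collection's indexes do not apply to them.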
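The $filter suggestion from the comments on Answer 1 fits this explanation: if the array is trimmed before $unwind, fewer in-memory documents reach $sort. The following is a sketch under assumptions, not a tested fix for the asker's data set. It assumes MongoDB 3.2+ and the field names from the question; the $elemMatch condition in the first $match, the helper `countUnwound`, and the sample document are illustrative additions that do not appear in the original thread.

```javascript
// Sketch: trim positiveAspects with $filter before unwinding it.
var filteredPipeline = [
  // Leading $match: the stage that can use a collection index.
  // $elemMatch (an addition) drops products that have no matching aspect at all.
  { $match: {
      categoryId: "803",
      score: { $gte: 8.0 },
      positiveAspects: { $elemMatch: { id: 30030525, score: { $gte: 9.0 } } }
  } },
  // Keep only the matching array elements instead of unwinding all of them.
  { $project: {
      _id: 0, id: 1, score: 1, name: 1,
      positiveAspects: { $filter: {
        input: "$positiveAspects",
        as: "aspect",
        cond: { $and: [
          { $eq: ["$$aspect.id", 30030525] },
          { $gte: ["$$aspect.score", 9.0] }
        ] }
      } }
  } },
  { $unwind: "$positiveAspects" },
  { $sort: { "positiveAspects.rank": -1 } },
  { $limit: 10 }
];

// Plain-JavaScript illustration: how many documents $unwind would emit
// for one (made-up) product, with and without the filter condition.
function countUnwound(doc, keep) {
  return doc.positiveAspects.filter(keep).length;
}

var sample = {
  id: 1,
  positiveAspects: [
    { id: 30030525, score: 9.8, rank: 100098 },
    { id: 30028399, score: 9.9, rank: 100099 },
    { id: 30030525, score: 7.0, rank: 100100 }
  ]
};

var withoutFilter = countUnwound(sample, function () { return true; });
var withFilter = countUnwound(sample, function (a) {
  return a.id === 30030525 && a.score >= 9.0;
});

// `db` and `printjson` exist only inside the mongo shell.
if (typeof db !== "undefined") {
  printjson(db.product_trees.aggregate(filteredPipeline).toArray());
}
```

On the sample document the filter leaves 1 element to unwind instead of 3; with a few hundred aspects per product the reduction would be correspondingly larger. The sort itself still runs in memory on whatever documents remain.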
