MongoDB Aggregation - $lookup performance

Posted 2023-03-11

【Posted】: 2019-03-23 23:55:44 【Question】:

I'm using MongoDB 3.6 aggregation with $lookup to join two collections (users and suscriptionusers).

var UserSchema = mongoose.Schema({
  email: {
    type: String,
    trim: true,
    unique: true
  },
  name: {
    type: String,
    required: true,
    trim: true
  },
  password: String,
  gender: { type: String, enum: ['male', 'female', 'unknown'], default: 'unknown' },
  age_range: { type: String, enum: [12, 16, 18], default: 18 },
  country: { type: String, default: 'co' }
});

var SuscriptionUsersSchema = mongoose.Schema({
  user_id: {
    ref: 'Users',
    type: mongoose.Schema.ObjectId
  },
  channel_id: {
    ref: 'Channels',
    type: mongoose.Schema.ObjectId
  },
  subscribed: { type: Boolean, default: false },
  unsubscribed_at: Date,
  subscribed_at: Date
});

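My goal is to query suscriptionusers and join the users collection, matching on a start and end date, to get some analytics about subscriptions, such as the country, age range, and gender of subscribed users, and display the data in a line chart. This is how I do it: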

db.getCollection('suscriptionusers').aggregate([
  {
    $match: {
      'channel_id': ObjectId('......'),
      'subscribed_at': {
        $gte: new Date('2018-01-01'),
        $lte: new Date('2019-01-01')
      },
      'subscribed': true
    }
  },
  {
    $lookup: {
      from: "users",
      localField: "user_id",
      foreignField: "_id",
      as: "users"
    }
  },
  /* Implementing this form instead of the earlier one (above) makes the process even slower :(
  {
    $lookup: {
      from: "users",
      let: { user_id: "$user_id" },
      pipeline: [
        { $match: { $expr: { $eq: [ "$_id", "$$user_id" ] } } },
        { $project: { age_range: 1, country: 1, gender: 1 } }
      ],
      as: "users"
    }
  },*/
  {
    $unwind: {
      path: "$users",
      preserveNullAndEmptyArrays: false
    }
  },
  {
    $project: {
      'users.age_range': 1,
      'users.country': 1,
      'users.gender': 1,
      '_id': 1,
      'subscribed_at': { $dateToString: { format: "%Y-%m", date: "$subscribed_at" } },
      'unsubscribed_at': { $dateToString: { format: "%Y-%m", date: "$unsubscribed_at" } }
    }
  }
])

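My main concern is performance. For example, with about 150,000 subscribers the query takes roughly 7 to 8 seconds to retrieve the information, and I'm worried about what will happen with a million subscribers, because even if I limit the records (for example, only retrieving data between two months), there could still be hundreds of subscribers in that period.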

I have already tried creating an index on the user_id field of the suscriptionusers collection, but it made no difference.

// ensureIndex is a deprecated alias of createIndex
db.getCollection('suscriptionusers').ensureIndex({ user_id: 1 });
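
One thing worth noting about the index above: user_id is not among the fields the $match stage filters on, and the $lookup matches users by _id, which always has an index. A minimal sketch of a compound index over the $match fields instead, assuming the field names from the schema above. The names `matchIndexSpec` and `createMatchIndex` are hypothetical, and `collection` is assumed to be any object exposing a `createIndex` method, such as a Node.js driver collection:

```javascript
// Hypothetical compound index covering the fields used in the $match stage.
// Equality fields (channel_id, subscribed) come first, the range field (subscribed_at) last.
const matchIndexSpec = { channel_id: 1, subscribed: 1, subscribed_at: 1 };

// `collection` is assumed to expose createIndex(keys), as driver collections do.
function createMatchIndex(collection) {
  return collection.createIndex(matchIndexSpec);
}
```

In the mongo shell the equivalent call would be db.getCollection('suscriptionusers').createIndex({ channel_id: 1, subscribed: 1, subscribed_at: 1 }). This only speeds up the $match stage; it does not remove the per-document cost of the $lookup itself, so the gain may be small.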

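My question is: should I also store the fields (country, age range, and gender) in the suscriptionusers collection? If I run the query without the lookup into the users collection, it is already fast enough.

Or is there a better way to improve performance with my current schema?

Many thanks :)

Edit: Just to note, a user can subscribe to multiple channels, which is why subscriptions are not stored in the users collection.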

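【Comments】:

Have you created an index on the subscribed_at field? Also, use the newer $lookup syntax so you can $project the fields inside the pipeline.

Thanks for your help @AnthonyWinzlet. I have implemented your suggestions (see the update), but the response time is almost the same. I have created indexes on subscribed_at, subscribed, and channel_id, and I even ran reIndex(), but it is still the same. Any other suggestions? :)

【Answer 1】: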

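Well, maybe it's not the best approach, but I simply included the fields I needed from UserSchema in SuscriptionUsersSchema. For analytics purposes this is noticeably faster. I also realized that analytics records should stay immutable over time, preserving the state of the data at the moment it was generated. Used this way, the data stays the same even if the user changes his or her information or deletes the account. If you have any suggestions, feel free to share :)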
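
As a rough illustration of the snapshot idea described above, the user's analytics fields could be copied into the subscription document at the moment of subscribing. This is only a sketch: `buildSubscriptionDoc` is a hypothetical helper, not part of the answer, and it assumes the user object carries the gender, age_range, and country fields from UserSchema.

```javascript
// Hypothetical helper: builds a subscription document that snapshots the
// user's analytics fields at subscription time.
function buildSubscriptionDoc(user, channelId, subscribedAt) {
  return {
    user_id: user._id,
    channel_id: channelId,
    subscribed: true,
    subscribed_at: subscribedAt,
    // Copied by value, so later profile edits or account deletion
    // do not alter this record.
    gender: user.gender,
    age_range: user.age_range,
    country: user.country
  };
}

// Example usage with made-up values:
const exampleUser = { _id: 'u1', gender: 'female', age_range: '18', country: 'co' };
const exampleDoc = buildSubscriptionDoc(exampleUser, 'c1', new Date('2018-06-15'));
```

The resulting object could then be passed to whatever insert call the application already uses for suscriptionusers.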

FYI, my SuscriptionUsersSchema now looks like this:

var SuscriptionUsersSchema = mongoose.Schema({
  user_id: {
    ref: 'Users',
    type: mongoose.Schema.ObjectId
  },
  channel_id: {
    ref: 'Channels',
    type: mongoose.Schema.ObjectId
  },
  subscribed: { type: Boolean, default: false },
  // Fields copied from UserSchema for analytics
  gender: { type: String, enum: ['male', 'female', 'unknown'], default: 'unknown' },
  age_range: { type: String, enum: [12, 16, 18], default: 18 },
  country: { type: String, default: 'co' },
  unsubscribed_at: Date,
  subscribed_at: Date
});
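
With the fields stored on the subscription document, the analytics query no longer needs a $lookup. The following is a sketch, not the answerer's actual query: `buildMonthlyBreakdownPipeline` is a hypothetical helper that returns a pipeline counting subscriptions per month, country, gender, and age range, which is one possible shape for feeding a line chart.

```javascript
// Hypothetical helper: builds an aggregation pipeline over the denormalized
// suscriptionusers collection, with no $lookup stage.
function buildMonthlyBreakdownPipeline(channelId, from, to) {
  return [
    // Same filter as the original query.
    {
      $match: {
        channel_id: channelId,
        subscribed: true,
        subscribed_at: { $gte: from, $lte: to }
      }
    },
    // Count subscriptions per month and per demographic bucket.
    {
      $group: {
        _id: {
          month: { $dateToString: { format: '%Y-%m', date: '$subscribed_at' } },
          country: '$country',
          gender: '$gender',
          age_range: '$age_range'
        },
        count: { $sum: 1 }
      }
    },
    // Order the buckets chronologically.
    { $sort: { '_id.month': 1 } }
  ];
}
```

The returned array would be passed to aggregate() on the suscriptionusers collection, with `channelId` being the channel's ObjectId and `from`/`to` being Date values.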
