Aggregate query: getting a count of the most recent element from nested subdocuments

Posted: 2021-07-21 05:14:39

Question:

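I have a MongoDB database with many users. One of the subdocuments I track is file uploads and their status as they move through a review process. Every uploaded file will eventually have an attachmentStatus. I want to pull some metrics: the total number of uploaded files in each current status. I started building an aggregation query that takes the latest attachmentStatus subdocument from each uploaded file and counts them.

The data is structured as follows: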

{
 "userName": "johnDoe",
 "email": "johnDoe@gmail.com",
 "uploads" : [
                {
                        "_id" : ObjectId("adh12451e0012ce9da0"),
                        "fileName" : "TestDoc.txt",
                        "fileType" : "text/plain",
                        "created" : ISODate("2021-01-06T15:26:14.166Z"),
                        "attachmentStatus" : [ ]
                },
                {
                        "_id" : ObjectId("5ff5d6c066cacc0012ed655a"),
                        "fileName" : "testerABC.txt",
                        "fileType" : "text/plain",
                        "created" : ISODate("2021-01-06T15:26:56.027Z"),
                        "attachmentStatus" : [
                                {
                                        "_id" : ObjectId("60884f733f88bd00129b9ad4"),
                                        "status" : "Uploaded",
                                        "date" : ISODate("2021-04-22T02:23:00Z")
                                },
                                {
                                        "_id" : ObjectId("60884f733f88bd00129b9ad5"),
                                        "status" : "Processing",
                                        "date" : ISODate("2021-04-26T04:54:00Z")
                                }
                        ]
                },
                {
                        "_id" : ObjectId("6075c82a19fdcc0012f81907"),
                        "fileName" : "Test file.docx",
                        "fileType" : "application/word",
                        "created" : ISODate("2021-04-13T16:34:50.955Z"),
                        "attachmentStatus" : [
                                {
                                        "_id" : ObjectId("72844f733f88bd11479b9ad7"),
                                        "status" : "Uploaded",
                                        "date" : ISODate("2021-04-23T03:42:00Z")
                                },
                                {
                                        "_id" : ObjectId("724986d73f88bd00147c9wt8"),
                                        "status" : "Completed",
                                        "date" : ISODate("2021-04-24T01:37:00Z")
                                }
                        ]
                }
        ]
}

{
 "userName": "janeDoe",
 "email": "janeDoe@gmail.com",
 "uploads" : [
                {
                        "_id" : ObjectId("ej9784652h0012ce9da0"),
                        "fileName" : "myResume.txt",
                        "fileType" : "text/plain",
                        "created" : ISODate("2021-02-13T12:36:14.166Z"),
                        "attachmentStatus" : [
                                {
                                        "_id" : ObjectId("15dhdf6f88bd00147c9wt8"),
                                        "status" : "Completed",
                                        "date" : ISODate("2021-04-24T01:37:00Z")
                                }
                        ]
                },
        ]
}

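How can I pull the latest attachmentStatus for each uploaded file and then tally the statuses?

I want something like this:

{ "status" : "Completed", "Count" : 2 }
{ "status" : "Processing", "Count" : 1 }
...

I am very close with the aggregation query below, but it picks up every status instead of only the single most recent status per file (one current status per file).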

db.myDB.aggregate([
  {
    "$match": {
      "uploads.attachmentStatus": {
        "$elemMatch": { "status": { "$exists": true } }
      }
    }
  },
  { $unwind: "$uploads" },
  { $unwind: "$uploads.attachmentStatus" },
  {
    $sortByCount: "$uploads.attachmentStatus.status"
  },
  {
    $project: {
      _id: 0,
      status: "$_id",
      Count: "$count"
    }
  }
]).pretty();

Any suggestions?

Answer 1:

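Demo - https://mongoplayground.net/p/zzOR9qhqny0

{ $sort: { "uploads.attachmentStatus.date": -1 } } sorts the unwound status entries by date in descending order, so the most recent one comes first.

{ $group: { _id: "$uploads._id", status: { $first: "$uploads.attachmentStatus.status" } } } groups the records by uploads._id and takes the first status in each group, which is the latest one after sorting by date.

Query

{ $sort: { "uploads.attachmentStatus.date": -1 } },
{ $group: { _id: "$uploads._id", status: { $first: "$uploads.attachmentStatus.status" } } },

Complete query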

db.collection.aggregate([
  { $match: { "uploads.attachmentStatus": { "$elemMatch": { "status": { "$exists": true } } } } },
  { $unwind: "$uploads" },
  { $unwind: "$uploads.attachmentStatus" },
  { $sort: { "uploads.attachmentStatus.date": -1 } },
  { $group: { _id: "$uploads._id", status: { $first: "$uploads.attachmentStatus.status" } } },
  { $sortByCount: "$status" },
  { $project: { _id: 0, status: "$_id", Count: "$count" } }
])
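
To make the effect of the $sort and $group stages easier to follow, here is a rough in-memory sketch of the same logic in plain JavaScript. It is not MongoDB code and is not part of the original answer. The helper names (unwindStatuses, latestStatusPerUpload, countByStatus) are hypothetical, the sample data is trimmed from the question, and the ids are kept as plain strings purely for illustration.

```javascript
// In-memory sketch of the pipeline's logic (plain JavaScript, no MongoDB needed).
// Sample data trimmed from the question; ids are plain strings here (assumption).
const users = [
  {
    userName: "johnDoe",
    uploads: [
      { _id: "adh12451e0012ce9da0", attachmentStatus: [] },
      {
        _id: "5ff5d6c066cacc0012ed655a",
        attachmentStatus: [
          { status: "Uploaded", date: new Date("2021-04-22T02:23:00Z") },
          { status: "Processing", date: new Date("2021-04-26T04:54:00Z") },
        ],
      },
      {
        _id: "6075c82a19fdcc0012f81907",
        attachmentStatus: [
          { status: "Uploaded", date: new Date("2021-04-23T03:42:00Z") },
          { status: "Completed", date: new Date("2021-04-24T01:37:00Z") },
        ],
      },
    ],
  },
  {
    userName: "janeDoe",
    uploads: [
      {
        _id: "ej9784652h0012ce9da0",
        attachmentStatus: [
          { status: "Completed", date: new Date("2021-04-24T01:37:00Z") },
        ],
      },
    ],
  },
];

// Rough equivalent of the two $unwind stages: one row per (upload, status) pair.
// Uploads with an empty attachmentStatus array produce no rows.
function unwindStatuses(docs) {
  return docs.flatMap((doc) =>
    doc.uploads.flatMap((upload) =>
      upload.attachmentStatus.map((s) => ({ uploadId: upload._id, ...s }))
    )
  );
}

// Rough equivalent of $sort (date descending) followed by $group with $first:
// keep only the first row seen for each upload id.
function latestStatusPerUpload(rows) {
  const sorted = [...rows].sort((a, b) => b.date - a.date);
  const latest = new Map();
  for (const row of sorted) {
    if (!latest.has(row.uploadId)) latest.set(row.uploadId, row.status);
  }
  return latest;
}

// Rough equivalent of $sortByCount plus the final $project.
function countByStatus(statuses) {
  const counts = new Map();
  for (const status of statuses) {
    counts.set(status, (counts.get(status) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([status, Count]) => ({ status, Count }))
    .sort((a, b) => b.Count - a.Count);
}

const rows = unwindStatuses(users);
// What the question's query counts: every status entry of every file.
const allStatuses = countByStatus(rows.map((r) => r.status));
// What the query above counts: only the latest status of each file.
const latestOnly = countByStatus(latestStatusPerUpload(rows).values());
console.log(allStatuses);
console.log(latestOnly);
```

On this sample data, latestOnly contains Completed with a Count of 2 and Processing with a Count of 1, which matches the output requested in the question. allStatuses additionally counts the two earlier "Uploaded" entries, which is the behavior the question describes.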
