mapReduce/Aggregation: Group by a value in a nested document

Posted 2023-03-11


【Title】: mapReduce/Aggregation: Group by a value in a nested document 【Posted】: 2012-09-27 00:49:12 【Question】:

Imagine I have a collection like this:


{
  "_id": "10280",
  "city": "NEW YORK",
  "state": "NY",
  "departments": [
    {"departmentType":"01",
     "departmentHead":"Peter"},
    {"departmentType":"02",
     "departmentHead":"John"}
  ]
},
{
  "_id": "10281",
  "city": "LOS ANGELES",
  "state": "CA",
  "departments": [
    {"departmentType":"02",
     "departmentHead":"Joan"},
    {"departmentType":"03",
     "departmentHead":"Mary"}
  ]
},
{
  "_id": "10284",
  "city": "MIAMI",
  "state": "FL",
  "department": [
  "departments": [
    {"departmentType":"01",
     "departmentHead":"George"},
    {"departmentType":"02",
     "departmentHead":"Harry"}
  ]
}

I would like to get a count for each department type, for example:

[{"departmentType":"01", "dCount":2},
 {"departmentType":"02", "dCount":3},
 {"departmentType":"03", "dCount":1}
]

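I have tried almost everything to get this, but all the examples I found online are simpler ones where the group-by is done on a field at the root level of the document. Here I am trying to group by departmentType instead, and that seems to break everything I have found so far.

Any ideas on how to do this with Mongoose's aggregation implementation or with mapReduce?

Ideally, I would like to exclude all count

Thanks in advance, everyone!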

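【Comments】:

Are you sure the last department is correct? It looks invalid.

【Answer 1】:

You need to $unwind the departments array. This creates one document for each entry in the array, so that you can aggregate them in the pipeline.

Unfortunately, you cannot pre-filter the departmentTypes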

db.runCommand({
    aggregate: "so",
    pipeline: [
        {   // filter out only records with 2 departments
            $match: {
                departments: { $size: 2 }
            }
        },
        // unwind - create a doc for each department in the array
        { $unwind: "$departments" },
        {   // aggregate sum of departments by type
            $group: {
                _id: "$departments.departmentType",
                count: { $sum: 1 },
            }
        },
        {   // filter out departments with <=1
            $match: {
                count: { $gt: 1 },
            }
        },
        {   // rename fields as per example
            $project: {
                _id: 0,
                departmentType: "$_id",
                dCount: "$count",
            }
        }
    ]
});

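Note that I also assumed your earlier JSON example contains a typo and that the "department" key does not actually exist. This code will work as long as all documents have the same schema as the first two.

If you do not care about the actual field names you get back, feel free to remove the first $match and the last $project.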
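To make the $unwind and $group steps more concrete, here is a minimal plain-JavaScript sketch that mimics them in memory on the sample documents from the question. It is an illustration only, not how MongoDB implements these stages: the helpers `unwind` and `groupCount` are hypothetical names invented for this example, and the third document is assumed to use the "departments" key like the other two.

```javascript
// Sample documents from the question.
// Assumption: the third document uses "departments", like the first two.
const docs = [
  { _id: "10280", city: "NEW YORK", state: "NY", departments: [
    { departmentType: "01", departmentHead: "Peter" },
    { departmentType: "02", departmentHead: "John" } ] },
  { _id: "10281", city: "LOS ANGELES", state: "CA", departments: [
    { departmentType: "02", departmentHead: "Joan" },
    { departmentType: "03", departmentHead: "Mary" } ] },
  { _id: "10284", city: "MIAMI", state: "FL", departments: [
    { departmentType: "01", departmentHead: "George" },
    { departmentType: "02", departmentHead: "Harry" } ] }
];

// Hypothetical helper mimicking $unwind:
// emits one copy of the document per entry of the array field.
function unwind(documents, field) {
  return documents.flatMap(doc =>
    (doc[field] || []).map(entry => ({ ...doc, [field]: entry })));
}

// Hypothetical helper mimicking $group with { $sum: 1 }:
// counts how many documents share each key.
function groupCount(documents, keyFn) {
  const counts = new Map();
  for (const doc of documents) {
    const key = keyFn(doc);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([departmentType, dCount]) => ({ departmentType, dCount }));
}

const unwound = unwind(docs, "departments");   // 6 documents, one per department
const result = groupCount(unwound, d => d.departments.departmentType);
console.log(result);
```

Once the array is unwound, the nested departmentType can be read like any other field, which is why the grouping works after that step.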

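【Discussion】:

Hi, OK, I tried this. A few things that may be silly: I rewrote the syntax as mongoose.connection.db.executeDbCommand(... and added , function(err, dbres) { if (err) throw err; else console.log(dbres); }); at the end to catch the results. It then said that aggregate: "so", NO SUCH CMD: "aggregate". Any idea? I did reuse the first $match: $match: { departmentThatExist: { "departments.departmentType": { $exists: true } } } to exclude those without a department type.

Are you using a MongoDB version older than 2.2? The aggregation features were not released until the current latest version.

Given the problem statement, the first $match on $size is not the right filter. The last $match will get rid of department types that occur only once. Given the original requirement, there should also be a $sort at the end.

Your comment about pre-filtering departments with only one type is simply incorrect. There is no way to pre-filter, but not because of the $size of the array; it is because the count for each department type (across all documents) is not known until after the $group stage of the aggregation.

Ah. You're right. That is not the department count he wanted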
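Following the comments above, a revised pipeline might look like the sketch below: it drops the $size filter, keeps the post-$group $match, and adds a $sort. This is only a sketch under assumptions, not the answerer's code. The sort direction (descending by count) is a guess, since the thread does not state an order, and the collection name "so" is taken from the answer.

```javascript
// Sketch only: a pipeline revised along the lines suggested in the comments.
const pipeline = [
  // One document per entry of the departments array.
  { $unwind: "$departments" },
  // Count entries per department type across all documents.
  { $group: { _id: "$departments.departmentType", count: { $sum: 1 } } },
  // Drop types that occur only once; this is only possible after $group.
  { $match: { count: { $gt: 1 } } },
  // Assumption: order by count, highest first (the thread gives no order).
  { $sort: { count: -1 } },
  // Rename fields to match the example output in the question.
  { $project: { _id: 0, departmentType: "$_id", dCount: "$count" } }
];

// In the mongo shell on MongoDB 2.2 or later this would be run as:
//   db.so.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```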
