MongoDB mapReduce

Posted by 张小贱1987

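I took a quick look at mapReduce and tried to produce a group-by result without reading the detailed API. I ran into a lot of problems, which I list here so that anyone who hits the same errors can find the answers by searching.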

  // First, the data in the person collection
  > db.person.find();
  { "_id" : ObjectId("593011c8a92497992cdfac10"), "name" : "xhj", "age" : 30, "address" : DBRef("address", ObjectId("59314b07e693aae7a5eb72ab")) }
  { "_id" : ObjectId("59301270a92497992cdfac11"), "name" : "zzj", "age" : 2 }
  { "_id" : ObjectId("593015fda92497992cdfac12"), "name" : "my second child", "age" : "i do not know" }
  { "_id" : ObjectId("592ffd872108e8e79ea902b0"), "name" : "zjf", "age" : 30, "address" : { "province" : "河南省", "city" : "南阳市", "building" : "桐柏县" } }
  // Use the aggregation framework to do a group by
  > db.person.aggregate({$group : {_id: '$age', count : {$sum : 1}}})
  { "_id" : "i do not know", "count" : 1 }
  { "_id" : 2, "count" : 1 }
  { "_id" : 30, "count" : 2 }
  // Now try to get the same group-by result with map-reduce
  // The logic is simple: define a map function and a reduce function

  > var m = function(){ emit(this.age,1) };
  > var r = function(key,values){
  ... var sum = 0;
  ... values.forEach(function(val){
  ... sum += val;
  ... });
  ... return sum;
  ... }

  // Then run mapReduce on person. This fails: it needs an optionsOrOutString
  > db.person.mapReduce( m, r ).find();
  assert failed : need to supply an optionsOrOutString
  Error: assert failed : need to supply an optionsOrOutString
      at Error (<anonymous>)
      at doassert (src/mongo/shell/assert.js:11:14)
      at assert (src/mongo/shell/assert.js:20:5)
      at DBCollection.mapReduce (src/mongo/shell/collection.js:1343:5)
      at (shell):1:11
  2017-06-03T12:42:06.704+0800 E QUERY Error: assert failed : need to supply an optionsOrOutString
      at Error (<anonymous>)
      at doassert (src/mongo/shell/assert.js:11:14)
      at assert (src/mongo/shell/assert.js:20:5)
      at DBCollection.mapReduce (src/mongo/shell/collection.js:1343:5)
      at (shell):1:11 at src/mongo/shell/assert.js:13
  // With an empty options object, it now says out has to be a string or an object
  > db.person.mapReduce( m, r,{} ).find();
  2017-06-03T12:42:24.726+0800 E QUERY Error: map reduce failed:{
     "errmsg" : "exception: 'out' has to be a string or an object",
     "code" : 13606,
     "ok" : 0
  }
      at Error (<anonymous>)
      at DBCollection.mapReduce (src/mongo/shell/collection.js:1353:15)
      at (shell):1:11 at src/mongo/shell/collection.js:1353
  // I tried passing a declared variable. That fails too: outstr is undefined, so out is still missing
  > var outstr;
  > db.person.mapReduce( m, r,{out:outstr} ).find();
  2017-06-03T12:42:45.502+0800 E QUERY Error: map reduce failed:{
     "errmsg" : "exception: 'out' has to be a string or an object",
     "code" : 13606,
     "ok" : 0
  }
      at Error (<anonymous>)
      at DBCollection.mapReduce (src/mongo/shell/collection.js:1353:15)
      at (shell):1:11 at src/mongo/shell/collection.js:1353
  // Later I learned that out expects a collection, so I passed the string 'outt' as the name of the collection that stores the results

  > db.person.mapReduce( m, r,{out:'outt'} ).find();
  { "_id" : 2, "value" : 1 }
  { "_id" : 30, "value" : 2 }
  { "_id" : "i do not know", "value" : 1 }
  // The results are now stored in outt as well. What I do not understand: without an out parameter, shouldn't I be able to call find directly? Why the extra step?
  > db.outt.find();
  { "_id" : 2, "value" : 1 }
  { "_id" : 30, "value" : 2 }
  { "_id" : "i do not know", "value" : 1 }
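
On the question in the last comment: mapReduce always needs an output target, but the documented `out: { inline: 1 }` option asks the server to return the results in the reply itself instead of writing them to a collection. The following is a minimal sketch of that, reusing `m` and `r` from the session. The helper name `groupByAgeInline` is hypothetical, and the `typeof db` guard is only there so the snippet can also be loaded outside the mongo shell.

```javascript
// Same map and reduce functions as in the session above.
var m = function () { emit(this.age, 1); };
var r = function (key, values) {
  var sum = 0;
  values.forEach(function (val) { sum += val; });
  return sum;
};

// out is still required, but { inline: 1 } returns the results in the
// reply instead of storing them in a collection such as outt.
var inlineOptions = { out: { inline: 1 } };

// Hypothetical helper: runs the job on a collection and returns the rows.
function groupByAgeInline(collection) {
  var res = collection.mapReduce(m, r, inlineOptions);
  return res.results; // a plain array of { _id, value } documents, not a cursor
}

// Only talk to the server when running inside the mongo shell, where db exists.
if (typeof db !== "undefined") {
  printjson(groupByAgeInline(db.person));
}
```

With inline output there is no collection to query, so the rows are read from the `results` field of the reply rather than through `find()`.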

Because I ran into so many problems, I read the MongoDB documentation (https://docs.mongodb.com/manual/reference/method/db.collection.mapReduce/) and sorted things out. The summary is as follows:

Command form:

  db.runCommand(
    {
      mapReduce: <collection>,
      map: <function>,
      reduce: <function>,
      finalize: <function>,
      out: <output>,
      query: <document>,
      sort: <document>,
      limit: <number>,
      scope: <document>,
      jsMode: <boolean>,
      verbose: <boolean>,
      bypassDocumentValidation: <boolean>,
      collation: <document>
    }
  )
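
As a concrete instance of the template above, here is a sketch of the command document for the person example. It fills in only `mapReduce`, `map`, `reduce` and `out`, which are the required fields; the others are optional. The collection names `person` and `outt` come from the session earlier in this post, and the `typeof db` guard is an addition so the snippet does nothing outside the mongo shell.

```javascript
// Map and reduce as in the session: count documents per age value.
var m = function () { emit(this.age, 1); };
var r = function (key, values) {
  var sum = 0;
  values.forEach(function (val) { sum += val; });
  return sum;
};

// Command document with only the required fields filled in.
var cmd = {
  mapReduce: "person", // name of the source collection
  map: m,
  reduce: r,
  out: "outt"          // name of the collection that receives the results
};

// Only run the command inside the mongo shell, where db exists.
if (typeof db !== "undefined") {
  printjson(db.runCommand(cmd)); // prints the command reply
  // The grouped rows themselves can then be read with db.outt.find().
}
```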

Simple form (shell helper):

  db.collection.mapReduce(map, reduce, {<out>, <query>, <sort>, <limit>, <finalize>, <scope>, <jsMode>, <verbose>})
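
To make it clearer what the `map` and `reduce` arguments do in either form, here is a small simulation in plain JavaScript that runs under Node.js without a server. It is only a sketch of the idea, not how MongoDB implements it: `emit` and `simulateMapReduce` are hypothetical stand-ins, and the documents are the four person records from the session, cut down to `name` and `age`.

```javascript
// The four person documents from the session, reduced to name and age.
var people = [
  { name: "xhj", age: 30 },
  { name: "zzj", age: 2 },
  { name: "my second child", age: "i do not know" },
  { name: "zjf", age: 30 }
];

// Same map and reduce functions as in the session.
var m = function () { emit(this.age, 1); };
var r = function (key, values) {
  var sum = 0;
  values.forEach(function (val) { sum += val; });
  return sum;
};

// Hypothetical stand-in for the server side: emit collects values per key.
var emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

function simulateMapReduce(docs, map, reduce) {
  emitted = new Map();
  // Map phase: call map with each document bound to this.
  docs.forEach(function (doc) { map.call(doc); });
  // Reduce phase: one output row per key. MongoDB does not call reduce
  // for a key that has a single value, so that case is passed through.
  var out = [];
  emitted.forEach(function (values, key) {
    var value = values.length === 1 ? values[0] : reduce(key, values);
    out.push({ _id: key, value: value });
  });
  return out;
}

var result = simulateMapReduce(people, m, r);
console.log(JSON.stringify(result));
// prints [{"_id":30,"value":2},{"_id":2,"value":1},{"_id":"i do not know","value":1}]
```

The rows appear here in first-seen key order, whereas the real output in the session is ordered by key; the counts per age are the same.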

 
