MongoDB mapReduce
Posted by 张小贱1987
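Preface: this article was compiled by the editors of 小常识网 (cha138.com) and introduces MongoDB mapReduce; we hope you find it useful.
After a quick look at mapReduce, I tried to reproduce a group-by without reading the detailed API docs and ran into quite a few problems. I have listed them here so that anyone who hits similar errors can find the answers by searching.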
- // First, look at the data in the person collection
- > db.person.find();
- { "_id" : ObjectId("593011c8a92497992cdfac10"), "name" : "xhj", "age" : 30, "address" : DBRef("address", ObjectId("59314b07e693aae7a5eb72ab")) }
- { "_id" : ObjectId("59301270a92497992cdfac11"), "name" : "zzj", "age" : 2 }
- { "_id" : ObjectId("593015fda92497992cdfac12"), "name" : "my second child", "age" : "i do not know" }
- { "_id" : ObjectId("592ffd872108e8e79ea902b0"), "name" : "zjf", "age" : 30, "address" : { "province" : "河南省", "city" : "南阳市", "building" : "桐柏县" } }
- // Use the aggregation framework to do a group by
- > db.person.aggregate({$group : {_id: '$age', count : {$sum : 1}}})
- { "_id" : "i do not know", "count" : 1 }
- { "_id" : 2, "count" : 1 }
- { "_id" : 30, "count" : 2 }
- // Now try to get the same group-by result with mapReduce
- // The logic is simple: define a map function and a reduce function
- > var m = function(){ emit(this.age,1) };
- > var r = function(key,values){
- ... var sum = 0;
- ... values.forEach(function(val){
- ... sum += val;
- ... });
- ... return sum;
- ... }
- // Then run mapReduce on person. This fails: it requires an optionsOrOutString argument
- > db.person.mapReduce( m, r ).find();
- assert failed : need to supply an optionsOrOutString
- Error: assert failed : need to supply an optionsOrOutString
- at Error (<anonymous>)
- at doassert (src/mongo/shell/assert.js:11:14)
- at assert (src/mongo/shell/assert.js:20:5)
- at DBCollection.mapReduce (src/mongo/shell/collection.js:1343:5)
- at (shell):1:11
- 2017-06-03T12:42:06.704+0800 E QUERY Error: assert failed : need to supply an optionsOrOutString
- at Error (<anonymous>)
- at doassert (src/mongo/shell/assert.js:11:14)
- at assert (src/mongo/shell/assert.js:20:5)
- at DBCollection.mapReduce (src/mongo/shell/collection.js:1343:5)
- at (shell):1:11 at src/mongo/shell/assert.js:13
- // Adding an empty options object fails too: now it says the out parameter must be a string or an object
- > db.person.mapReduce( m, r,{} ).find();
- 2017-06-03T12:42:24.726+0800 E QUERY Error: map reduce failed:{
- "errmsg" : "exception: 'out' has to be a string or an object",
- "code" : 13606,
- "ok" : 0
- }
- at Error (<anonymous>)
- at DBCollection.mapReduce (src/mongo/shell/collection.js:1353:15)
- at (shell):1:11 at src/mongo/shell/collection.js:1353
- // Declaring a variable doesn't help either: outstr is never assigned, so out is still undefined
- > var outstr;
- > db.person.mapReduce( m, r,{out:outstr} ).find();
- 2017-06-03T12:42:45.502+0800 E QUERY Error: map reduce failed:{
- "errmsg" : "exception: 'out' has to be a string or an object",
- "code" : 13606,
- "ok" : 0
- }
- at Error (<anonymous>)
- at DBCollection.mapReduce (src/mongo/shell/collection.js:1353:15)
- at (shell):1:11 at src/mongo/shell/collection.js:1353
- // Later I learned that out tells mapReduce where to write the results, so I passed the string 'outt' as the name of the output collection
- > db.person.mapReduce( m, r,{out:'outt'} ).find();
- { "_id" : 2, "value" : 1 }
- { "_id" : 30, "value" : 2 }
- { "_id" : "i do not know", "value" : 1 }
- // The results are now also stored in outt. What I don't understand: without defining out, shouldn't find() work directly? Why the extra step?
- > db.outt.find();
- { "_id" : 2, "value" : 1 }
- { "_id" : 30, "value" : 2 }
- { "_id" : "i do not know", "value" : 1 }
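To make clear what mapReduce does with m and r, here is a minimal in-memory emulation in plain JavaScript (runs under Node.js, no MongoDB needed). It is a sketch under assumptions, not MongoDB's implementation: the helper simulateMapReduce and the module-level emit() are hypothetical stand-ins, and the sample documents copy the person data above without the address fields. It groups every emitted value by key and, as the MongoDB documentation describes, calls reduce only for keys that received more than one value.

```javascript
// Hypothetical in-memory emulation of mapReduce semantics; NOT the MongoDB API.
// The sample documents mirror the person collection above (address omitted).
const people = [
  { name: "xhj", age: 30 },
  { name: "zzj", age: 2 },
  { name: "my second child", age: "i do not know" },
  { name: "zjf", age: 30 },
];

// In MongoDB, map runs with `this` bound to a document and calls a global emit().
// Here emit() forwards each (key, value) pair to the currently active collector.
let collector = null;
function emit(key, value) {
  collector(key, value);
}

// The same map and reduce functions as in the shell session above.
const m = function () { emit(this.age, 1); };
const r = function (key, values) {
  let sum = 0;
  values.forEach(function (val) { sum += val; });
  return sum;
};

// Hypothetical helper: run map over every document, group values by key, reduce each group.
function simulateMapReduce(docs, map, reduce) {
  const groups = new Map(); // a Map keeps 30 (number) and "30" (string) as distinct keys
  collector = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    docs.forEach((doc) => map.call(doc));
  } finally {
    collector = null;
  }
  const results = [];
  for (const [key, values] of groups) {
    // MongoDB does not call reduce for a key with a single emitted value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  }
  return results;
}

console.log(simulateMapReduce(people, m, r));

// reduce may be re-applied to its own partial output, so this must hold:
console.log(r(30, [r(30, [1, 1]), 1]) === r(30, [1, 1, 1])); // true
```

The emulation returns groups in first-seen key order, so the order can differ from the shell output above. The last line shows why summing the values (rather than, say, returning values.length) matters: MongoDB may call reduce several times for the same key and feed an earlier result back in as one of the values.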
Because I ran into so many problems, I read the MongoDB documentation (https://docs.mongodb.com/manual/reference/method/db.collection.mapReduce/) and sorted things out. Summary:
Command form:
- db.runCommand(
- {
- mapReduce: <collection>,
- map: <function>,
- reduce: <function>,
- finalize: <function>,
- out: <output>,
- query: <document>,
- sort: <document>,
- limit: <number>,
- scope: <document>,
- jsMode: <boolean>,
- verbose: <boolean>,
- bypassDocumentValidation: <boolean>,
- collation: <document>
- }
- )
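The out field is what tripped up the session above: it is required, and it must be a string or a document, which is why {} and {out: outstr} (with outstr undefined) were rejected. Per the MongoDB documentation linked above, a string names a collection whose existing contents are replaced; a document can instead pick an action for an existing collection, such as { replace: ... }, { merge: ... } or { reduce: ... }, or use { inline: 1 } to return the results directly without writing any collection, which is the behavior the session was expecting. The sketch below emulates those choices in plain JavaScript; applyOut is a hypothetical helper, collections are modeled as Maps, and options such as db or nonAtomic are ignored.

```javascript
// Hypothetical emulation of the `out` choices; collections are modeled as
// Map(_id -> value). This is NOT the MongoDB API, only the documented idea.
function applyOut(existing, results, out, reduce) {
  const isDoc = out !== null && typeof out === "object";
  if (typeof out !== "string" && !isDoc) {
    // Mirrors the server error seen in the session above.
    throw new Error("'out' has to be a string or an object");
  }
  if (isDoc && "inline" in out) {
    // inline: results are returned directly and no collection is written.
    return { written: null, inline: results };
  }
  let action;
  if (typeof out === "string" || "replace" in out) action = "replace";
  else if ("merge" in out) action = "merge";
  else if ("reduce" in out) action = "reduce";
  else throw new Error("unsupported out document (emulation only)");

  // replace starts from an empty collection; merge and reduce keep existing docs.
  const target = action === "replace" ? new Map() : new Map(existing);
  for (const { _id, value } of results) {
    if (action === "reduce" && target.has(_id)) {
      // reduce: combine the existing and the new value for the same key.
      target.set(_id, reduce(_id, [target.get(_id), value]));
    } else {
      // replace/merge: the new result inserts or overwrites the document.
      target.set(_id, value);
    }
  }
  return { written: target, inline: null };
}

const reduceSum = (key, values) => values.reduce((a, b) => a + b, 0);
// Contents of outt after the first run in the session above.
const previous = new Map([[2, 1], [30, 2], ["i do not know", 1]]);
// Running the same job again produces the same results.
const rerun = [
  { _id: 2, value: 1 },
  { _id: 30, value: 2 },
  { _id: "i do not know", value: 1 },
];

console.log(applyOut(previous, rerun, { reduce: "outt" }, reduceSum).written.get(30)); // 4
```

Running the same job twice with { reduce: ... } doubles the counts (30 goes from 2 to 4), whereas a plain string or { replace: ... } leaves only the fresh results, and { merge: ... } overwrites matching keys while keeping the rest.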
Shorthand form (the options document must include out):
- db.collection.mapReduce(<map>, <reduce>, { out: <output>, query: <document>, sort: <document>, limit: <number>, finalize: <function>, scope: <document>, jsMode: <boolean>, verbose: <boolean> })
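The shorthand is a shell helper that ends up sending the command document shown earlier. The parameter name optionsOrOutString in the first error of the session suggests the third argument may be either an options document or just the output collection name. The function below is a hypothetical sketch of that mapping in plain JavaScript; it only builds an object, never contacts MongoDB, and is an illustration of the idea rather than the shell's actual source.

```javascript
// Hypothetical sketch of how the shorthand form could map onto the command form.
// It only builds a plain object; it does not talk to MongoDB.
function toMapReduceCommand(collectionName, map, reduce, optionsOrOutString) {
  if (optionsOrOutString === undefined) {
    // Mirrors the error from the first failed call in the session above.
    throw new Error("need to supply an optionsOrOutString");
  }
  const cmd = { mapReduce: collectionName, map: map, reduce: reduce };
  if (typeof optionsOrOutString === "string") {
    // A bare string is treated as the output collection name.
    cmd.out = optionsOrOutString;
  } else {
    // Otherwise the options (out, query, sort, limit, finalize, ...) are copied in.
    Object.assign(cmd, optionsOrOutString);
  }
  return cmd;
}

const m = function () { emit(this.age, 1); }; // never called here; emit() exists only inside MongoDB
const r = function (key, values) { return values.reduce((a, b) => a + b, 0); };

console.log(toMapReduceCommand("person", m, r, "outt").out); // outt
console.log(toMapReduceCommand("person", m, r, { out: { inline: 1 }, query: { age: 30 } }));
```

Note that passing {} builds a command with no out field at all; that appears consistent with the second error in the session coming back from the server ("map reduce failed" with code 13606) rather than from the shell's own check.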