MongoDB MapReduce doesn’t always reduce
Today I decided to dive into the MapReduce functionality provided by MongoDB. I have some batch processing jobs involving a fairly large amount of data, and overall, implementing those jobs was a pleasant experience. Mongo uses a JavaScript interpreter to run your map and reduce functions, so as long as you have some JS under your belt, you’re good to go.
While writing unit tests today for some of the new functionality that used MapReduce, I noticed something odd: the output collection produced by my unit tests had a slightly different schema than the one produced when my real server ran the same jobs.
The MapReduce job looked similar to this:
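```javascript
// Emit one { count: 1 } value per game, keyed by team.
var map = function() {
  emit(this.team, { count : 1 });
};

// Add up the counts for each team into a totalgames field.
var reduce = function(key, values) {
  var result = { totalgames: 0 };
  for (var i = 0; i < values.length; i++) {
    result.totalgames += values[i].count;
  }
  return result;
};

db.games.mapReduce(map, reduce, { out : "teamrecords" });
```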
This is a pretty straightforward MapReduce: the map function emits a value with a count of 1 for each game, keyed by team, and the reduce function adds up all the counts. When I ran this against real data on the server, the teamrecords collection correctly had data that looked like this:
{ "_id" : "ari", "value" : { "totalgames" : 11 } }
When I ran my unit tests, the collection had data that looked like this:
{ "_id" : "ari", "value" : { "count" : 1 } }
After thumping my head against the table for about an hour, I figured it out: in my unit test, only one value was emitted for the key “ari”. Since there was only one value for that key, the reduce function never ran, and the emitted value was written to the collection as-is.
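To make the behavior concrete, here is a simplified in-memory model of the grouping step, written in plain JavaScript so it runs outside MongoDB. `runMapReduce` and the sample documents are hypothetical; `emit` is passed to the map function as an argument rather than being a global, and the model ignores MongoDB's batching and re-reduce. It only reproduces the rule that a key with a single emitted value skips reduce.

```javascript
// Hypothetical, simplified model of MapReduce grouping (not MongoDB's implementation).
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values

  // Collects values by key, standing in for MongoDB's global emit().
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // Run map with each document bound to `this`.
  for (const doc of docs) {
    map.call(doc, emit);
  }

  const output = [];
  for (const [key, values] of groups) {
    // Modeled rule: reduce only runs when a key has more than one value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    output.push({ _id: key, value: value });
  }
  return output;
}

// The post's map and reduce, with emit passed in explicitly.
const map = function (emit) {
  emit(this.team, { count: 1 });
};

const reduce = function (key, values) {
  const result = { totalgames: 0 };
  for (const v of values) {
    result.totalgames += v.count;
  }
  return result;
};

// Made-up sample data: three games for "ari", one game for "sea".
const games = [{ team: "ari" }, { team: "ari" }, { team: "ari" }, { team: "sea" }];

console.log(JSON.stringify(runMapReduce(games, map, reduce)));
// [{"_id":"ari","value":{"totalgames":3}},{"_id":"sea","value":{"count":1}}]
```

The key with three games comes out as `totalgames`, while the key with a single game keeps the raw `count` shape from map, which is the same split the unit tests exposed.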
This behavior does make sense: there’s nothing to reduce, so why run the reduction step? That said, I was expecting the reduce function to run, and because it didn’t, a different key was saved in my collection. It’s a minor optimization on MongoDB’s part, but one that can cause serious bugs if you aren’t expecting it.
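One way to avoid the mismatch, sketched below, follows the rule in MongoDB's mapReduce documentation: reduce must return a value with the same shape as the value map emits, because reduce may be skipped for single-value keys or called again on its own earlier output (re-reduce). The original reduce would also break in that second case: re-reducing a `{ totalgames: ... }` result reads `.count` as `undefined` and produces `NaN`. This version keeps `{ count }` throughout and uses the `finalize` option to rename the field. My understanding is that finalize is applied to every key's result, including keys where reduce was skipped, but treat that as an assumption to verify against your MongoDB version.

```javascript
// Map emits { count: 1 } per game; emit is a global inside MongoDB's JS runtime.
var map = function () {
  emit(this.team, { count: 1 });
};

// Reduce returns the SAME shape it receives ({ count }), so its output stays
// valid whether reduce is skipped or fed its own earlier results.
var reduce = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i].count;
  }
  return { count: total };
};

// Finalize converts the reduced value into the shape wanted in the output collection.
var finalize = function (key, reducedValue) {
  return { totalgames: reducedValue.count };
};

// In the mongo shell:
// db.games.mapReduce(map, reduce, { out: "teamrecords", finalize: finalize });
```

With this arrangement, a team with a single game in a test fixture should end up as `{ "totalgames" : 1 }`, the same shape as the production output.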
