ryangerard.net

May 10 2012

MongoDB MapReduce doesn’t always reduce

Today I decided to dive into the mapreduce functionality provided by MongoDB.  I have some batch processing jobs involving a decently large amount of data, and overall it was a pleasant experience implementing those jobs.  Mongo uses a JavaScript interpreter to run your map and reduce functions, so as long as you have some JS under your belt, you’re good to go.

While writing unit tests for some of the new functionality that uses mapreduce, I noticed something odd: the collection that the mapreduce wrote out in my unit tests had a slightly different schema than the collection written out when my real server ran the same jobs.

The map reduce looked similar to this:

var map = function() {
        emit(this.team, { count : 1 } );
}

var reduce = function(key, values) {
        var result = {totalgames: 0};
        for(var item in values) {
                result.totalgames += values[item].count;
        }
        return result;
}

db.games.mapReduce(map, reduce, { out : "teamrecords" });

This is a pretty straightforward map reduce: for each game, I’m emitting a value that records a count of 1 for the team, which acts as the key.  The reduce then adds up all the counts.  When I ran this against real data on the server, the teamrecords collection correctly had data that looked like this:

{ "_id" : "ari", "value" : { "totalgames" : 11 } }

When I ran my unit tests, the collection had data that looked like this:

{ "_id" : "ari", "value" : { "count" : 11 } }

After thumping my head against the table for about an hour, I figured it out: in my unit test, I only had one value emitted for the key “ari”.  Since there was only one value emitted for that key, the reduce function never ran, and the value emitted was put into the collection directly.

This functionality does make sense: there’s nothing to reduce, so why run the reduction step?  That being said, I was expecting the reduce function to run, and having it not run resulted in a different key being saved in my collection.  It seems like a minor optimization made by MongoDB that could potentially result in large errors whenever the reduce function returns a different shape than the map function emits.
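One way to avoid the mismatch is to have the map function emit values in the same shape that the reduce function returns, so the stored document looks the same whether or not reduce runs.  The sketch below is plain JavaScript that can be run outside of MongoDB.  The runMapReduce helper and the sample documents are made up for illustration: the helper only imitates the "skip reduce when a key has a single value" behavior described above, and is not how MongoDB is implemented.

```javascript
// Values collected by emit(), grouped by key.
var emitted = {};

// Stand-in for the emit() function that MongoDB provides to map functions.
function emit(key, value) {
    if (!emitted[key]) {
        emitted[key] = [];
    }
    emitted[key].push(value);
}

// Emit the same shape that reduce returns: { totalgames: n }.
var map = function () {
    emit(this.team, { totalgames: 1 });
};

// The output has the same shape as the input values, so the result is
// consistent even when reduce is skipped for a key.
var reduce = function (key, values) {
    var result = { totalgames: 0 };
    for (var i = 0; i < values.length; i++) {
        result.totalgames += values[i].totalgames;
    }
    return result;
};

// Hypothetical helper: imitates skipping reduce for single-value keys.
function runMapReduce(docs) {
    emitted = {};
    docs.forEach(function (doc) {
        map.call(doc); // map runs with `this` bound to the document
    });
    return Object.keys(emitted).map(function (key) {
        var values = emitted[key];
        var value = values.length === 1 ? values[0] : reduce(key, values);
        return { _id: key, value: value };
    });
}

var results = runMapReduce([{ team: 'ari' }, { team: 'sea' }, { team: 'sea' }]);
console.log(JSON.stringify(results));
```

With this change, the team with one game and the team with two games both end up with a "totalgames" field, so a unit test with a single document per key sees the same schema as the real data.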

I’ve built my career by 20+ years of looking people in the eyes, making promises and then delivering against what I said I would do. You don’t build trust, friendship and human bonds on Skype. - Mark Suster


iPhone Spotlight View

I’ve recently been working on some tutorial screens for a new app, and have created something I think is rather neat.  It’s a view that provides a spotlight effect over whatever aspect of the screen you want to highlight.  

For tutorial screens, a view like this allows you to isolate specific things you want the user to pay attention to, such as login buttons, point boxes, and notification bars.  The label next to the spotlight acts as a button, so that when the user presses the button, the spotlight can animate to the next section of the screen you want to highlight.

The code has all been open sourced on github.  Feel free to email me with questions.  Here is a screenshot of the spotlight overlay on a plain white view.

Spotlight Screenshot


Mar 28 2012

Effective unit testing of node.js with futures

Recently I had the idea to attempt unit testing of my node.js codebase.  There are some decent resources out there that cover this concept, and this blog post is a summary of my own thoughts and findings while implementing unit testing for node.js.

Let’s start with the basic issue: testing asynchronous code is not straightforward. For most unit testing (and testing in general), you assume items will move in a linear fashion: follow steps 1-3, and you should see some expected result. Because node.js forces you to work in a more async fashion, you may not necessarily know when something is complete, which forces us to look for other ways to find out when that something is complete.

I’ve written below a contrived and simple node.js app, using express and mongodb, with a login mechanism.  We want to unit test the login mechanism.  The app may look something like this:

// login.js
var
    express = require('express'),
    app = express.createServer(),

    // Database Config
    mongo = require('mongojs'),
    mongoStore = require('connect-mongodb'),
    db = mongo.connect('dbname',['users']);

// Configuration
app.configure(function(){
    ...
});

app.listen(3000);

app.post('/login', function(req, res){
    var email = req.body.email, password = req.body.password;
    db.users.find({'email':email,'password':password}).forEach(function(err, user) {
        res.send('Found user ' + user);
    });
});

As you can see, it’s pretty straightforward: the handler takes the email address and password from a POST request and searches for a matching user in the mongodb instance.  If that user exists, the user object is returned.

Now, let’s take a step back for a moment and start building a simple unit test for this mechanism.  To do this, I’ve been using nodeunit, which provides a reliable and simple framework for executing tests and reporting results.  Here is an example of how we’d like to write the test:

// login-unit.js
var login = require('./login.js'); // local modules need a relative path
exports.testLogin = function(test){
    test.notEqual(login.loginUser('email', 'pass'), null, "The user was null!");
    test.done();
};

Clearly our application code can’t yet support this unit test.  The login logic needs some refactoring so that it’s externally accessible:

// login.js
var
    express = require('express'),
    app = express.createServer(),

    // Database Config
    mongo = require('mongojs'),
    mongoStore = require('connect-mongodb'),
    db = mongo.connect('dbname',['users']);

// Configuration
app.configure(function(){
    ...
});

app.listen(3000);

function loginUserPrivate(email, pass) {
    db.users.find({'email':email,'password':pass}).forEach(function(err, user) {
        return user;
    });
}

app.post('/login', function(req, res){
    var email = req.body.email, password = req.body.password;

    var user = loginUserPrivate(email, password);
    res.send('Found user ' + user);
});

module.exports = {
    loginUser: function(email, pass) {
        return loginUserPrivate(email, pass);
    }
}

As you can see, we’ve added an export function for the login code, so that the unit test can access it.  However, there is a problem.  Due to the asynchronous nature of node.js, that user value from the private function is not being returned correctly.  The private function will return immediately, and not wait for the db call to finish.  To fix this, we will use futures, which are also commonly called promises, or deferred objects.  To learn more about this concept, this article sums up the concept well.

I’m currently using this futures module, but there are other modules out there, and I encourage you to experiment.  If we refactor the application code to use futures, it will look like this:

// login.js
var
    express = require('express'),
    app = express.createServer(),

    // Future object
    Future = require('future'),

    // Database Config
    mongo = require('mongojs'),
    mongoStore = require('connect-mongodb'),
    db = mongo.connect('dbname',['users']);

// Configuration
app.configure(function(){
    ...
});

app.listen(3000);

function loginUserPrivate(email, pass) {
    var future = new Future();
    db.users.find({'email':email,'password':pass}).forEach(function(err, user) {
        future.deliver(err, user);
    });
    return future;
}

app.post('/login', function(req, res){
    var email = req.body.email, password = req.body.password;

    var future = loginUserPrivate(email, password);

    future.when (function (error, user) {
        res.send('Found user ' + user);
    });
});

module.exports = {
    loginUser: function(email, pass) {
        return loginUserPrivate(email, pass);
    }
}

The changes above show one way to get around the async nature of node.js for testing purposes.  The private login function will return a future object immediately to the caller.  The caller will then essentially subscribe to an event (with future.when()) telling it when the object has data.  The caller can then access this data from the callback function.
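To make the mechanism concrete, here is a minimal sketch of what a future does under the hood.  Everything in it is made up for illustration: SimpleFuture is a hand-rolled stand-in and not the API of the futures module I’m using, and fakeFindUser / flushLookups imitate a database call whose result arrives later.

```javascript
// Minimal stand-in for a future object (hypothetical; not a real module's API).
function SimpleFuture() {
    this.callbacks = [];
    this.delivered = false;
    this.result = null;
}

// Subscribe to the result. If it has already arrived, call back right away.
SimpleFuture.prototype.when = function (callback) {
    if (this.delivered) {
        callback(this.result.error, this.result.value);
    } else {
        this.callbacks.push(callback);
    }
    return this;
};

// Publish the result to every subscriber. Later deliveries are ignored.
SimpleFuture.prototype.deliver = function (error, value) {
    if (this.delivered) {
        return;
    }
    this.delivered = true;
    this.result = { error: error, value: value };
    this.callbacks.forEach(function (callback) {
        callback(error, value);
    });
};

// Hypothetical fake database: lookups are queued and only complete when
// flushLookups() is called, which mimics a result arriving later.
var pendingLookups = [];
function fakeFindUser(email, callback) {
    pendingLookups.push(function () {
        callback(null, { email: email });
    });
}
function flushLookups() {
    while (pendingLookups.length > 0) {
        pendingLookups.shift()();
    }
}

function loginUser(email) {
    var future = new SimpleFuture();
    fakeFindUser(email, function (error, user) {
        future.deliver(error, user);
    });
    return future; // returned before the lookup has completed
}

var seen = [];
var future = loginUser('user@example.com');
future.when(function (error, user) {
    seen.push(user.email);
});

var seenBeforeFlush = seen.length; // the lookup is still pending here
flushLookups();
var seenAfterFlush = seen.length;  // the subscriber has now been called
```

The caller gets the future back immediately, and the callback passed to when() only runs once deliver() is called with the result.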

We can refactor the unit test code to also use futures, and thereby effectively test out the login functionality:

// login-unit.js
var login = require('./login.js'),
    Future = require('future');

exports.testLogin = function(test){
    var future = login.loginUser('email', 'pass');
    future.when (function (error, user) {
        test.notEqual(user, null, "The user was null!");
        test.done();
    });
};

Some of you may wonder why we need to use futures / promises at all.  Indeed, we could just send in a callback function as an extra parameter to the login function that would get called when the async call is finished.  However, there are other benefits to using futures: chaining of objects and default timeouts are just two of the benefits that come with using a futures object.
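For comparison, the plain callback alternative might look something like the sketch below.  The fakeUsers table and the loginUserWithCallback name are made up for illustration, and the callback is invoked immediately only to keep the example short; a real database call would invoke it later.

```javascript
// Hypothetical in-memory user table standing in for the database.
var fakeUsers = {
    'user@example.com': { email: 'user@example.com', password: 'secret' }
};

// Callback-style login: the result is passed to callback(error, user)
// instead of being returned to the caller.
function loginUserWithCallback(email, pass, callback) {
    var user = fakeUsers[email];
    if (user && user.password === pass) {
        callback(null, user);
    } else {
        callback(null, null); // no matching user
    }
}

var found = null;
loginUserWithCallback('user@example.com', 'secret', function (error, user) {
    found = user;
});

var rejected = 'unset';
loginUserWithCallback('user@example.com', 'wrong', function (error, user) {
    rejected = user;
});
```

This works fine for a single call.  The difference shows up when several async results need to be combined or timed out, which is where a future object earns its keep.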

To recap, what I’ve been trying to show today is that futures/promises/deferred objects are an effective way to unit test node.js code, despite its asynchronous nature. The future objects returned from the private login method allow you to essentially subscribe to an event that tells you when the async call is finished, and passes back the result of that async call.  In addition, I believe the above code is more structured and less brittle.

Now that I have a method to unit test my node.js code, I can happily move forward and have some degree of certainty that my code is working correctly.


Mar 15 2012

How to remove and add a persistent store in iOS 4

For the Xobni iPhone app that I’ve been working on, we made a decision that if the user logs out of the app, we should completely remove their data.  Logging out isn’t a common operation, and it gives users a chance to “reset” their data, in case they need to.  To accomplish this, we decided the safest way to clear out the user’s data is to completely remove their persistent store file, and then immediately recreate it.  I realize that you can delete all data from core data using other methods, but this ensures that there isn’t any lingering data in case a core data “delete all” method should fail.  It presents one less variable to think about when debugging data issues that users report.

We found this action to be very useful and suitable in accomplishing our goal of resetting the user.  However, there was one issue: the persistent store wasn’t getting recreated correctly in iOS 4, even though it worked perfectly fine in iOS 5.

I spent a lot of time debugging the issue, and I found that iOS 4 exhibited some peculiar timing issues when we ran this operation.  Here is essentially the order of operations:

  1. Release and nil out the managed object contexts, managed object model, persistent store coordinator, and persistent store
  2. Remove the actual file that the persistent store points to
  3. Recreate the managed object model, persistent store coordinator, and add a persistent store pointing to the same file as before

The act of recreating the persistent store should create the file on disk.  As stated, on iOS 5, this all worked perfectly fine, but for iOS 4, I saw the following happen:

  1. When we removed the actual file on disk that the persistent store points to (step 2 from above), I could see the file disappear when looking in a terminal window
  2. When we recreated the persistent store, I saw the file appear again on disk (step 3 from above)
  3. After the next run loop, the file would disappear again!

I did many searches to verify that the file wasn’t getting deleted due to some extraneous code that would try to delete the file again. No, this deletion was coming from some timing bug within iOS. Essentially what we’re doing is deleting the objects and file on disk, and then recreating them again in one shot. For some reason, iOS executes these instructions, but then executes the file deletion code again on the next run loop.

Solving this was tricky — no errors are getting returned, as the persistent store coordinator, and more importantly the persistent store, were all created successfully. The problem is that the file on disk is deleted a short time after the persistent store is created on iOS 4. What we ended up doing was the following:

  • Whenever you save the managed object context, check the error code
  • If the error code is NSPersistentStoreSaveError, try to remove and re-add the persistent store to the persistent store coordinator.

We’re using the error code NSPersistentStoreSaveError as a signal that this issue has happened. If we see that error code when saving data from a managed object context, we will try to create the persistent store again. This solution works for us, and I hope it works for you too.

Here is some code to help you out, in case you’re running into this as well:

NSError *error = nil;
if (![self.managedObjectContext save:&error]) {
  if([error code] == NSPersistentStoreSaveError) {
    NSLog(@"Persistent store save error");
    [self.persistentStoreCoordinator removePersistentStore:self.persistentStore error:&error];
    self.persistentStore = nil;
    self.persistentStore = [self.persistentStoreCoordinator addPersistentStoreWithType:...];
  }
}


Jan 28 2012

Gotcha when searching substrings in Objective-C

In Objective-C, when you want to look for a substring within a string, it’s very common to use the rangeOfString method of NSString, like so:

NSString *fullStr = @"the quick brown fox";
NSString *subStr = @"quick";
if([fullStr rangeOfString:subStr options:NSCaseInsensitiveSearch].location != NSNotFound) {
    // We found the substring in the string
}

The code snippet above basically does the following:

  1. Searches for the substring “quick” inside the string referenced by fullStr, and returns an NSRange struct; if the substring isn’t found, that range has the values {NSNotFound, 0}
  2. Checks the location field of the NSRange to verify that the string was actually found

Pretty straightforward.  However, I found one gotcha today: if you’re using this pattern to search for substrings, you must verify that your original string isn’t nil!

In the case that the string you’re searching is nil, the message is sent to nil, which returns a zeroed-out struct, so the NSRange returned will have the values {0,0}.  Notice that the location isn’t NSNotFound.  So, if your string happened to be nil, then the if condition would pass as if you had found the substring!

Here is the code you want to use:

NSString *fullStr = @"the quick brown fox";
NSString *subStr = @"quick";
if(fullStr && [fullStr rangeOfString:subStr options:NSCaseInsensitiveSearch].location != NSNotFound) {
    // We found the substring in the string
}


Nov 26 2011

Extracting myself from the world of tech news

I developed the habit years ago of reading up on as much tech news as I could find.  Not just every day, but multiple times a day, my breaks would be filled with reading up on the latest tech gossip.  My tech news sources have moved around over the years, from Digg, to reddit, to Techcrunch, to YC News, back to reddit, and then back to YC News.  I usually have 2-3 major sites I regularly check per day.  The justification for all the reading was sound: learn as much as I can about my industry.  I still believe in this idea, and I do believe that all the reading over the years has helped with this goal.

The tech news can be broken down into a few major categories:

  • Company X raised $Y from {set of rotating VC names}
  • Company X released {product}, or an update to {other product}
  • Company X acquired Company Y
  • Let me give you some advice that you already know

I’m being a little glib here, as there are probably other categories and subcategories, and subcategories of subcategories I’m missing, but these topics probably constitute 90% of what I read online.

Checking the tech news sites has also been a way for me to take mental breaks between programming sessions, allowing me to still be somewhat efficient (i.e., learning about my industry) during my downtime.

That being said, recently I’ve been feeling that this news stream has been wearing on me mentally.  It’s become a habit that I don’t feel is helping me to progress in any discernible way — in fact, it’s become quite distracting.  I love being exposed to new ideas and new ways of thinking, but I don’t feel that my tech news sources are providing that for me any longer.  The tech news certainly has a bias toward the new and sexy (with sprinkles of kittens - like any good news source), so when a new and interesting product / idea / language emerges, one gets inundated with the same types of stories over and over.  If you consider yourself a creative person, or aspire to be a creative type, I don’t think that constant exposure to the same idea is good or healthy for the creative part of your brain.

I’m not trying to make a statement about the state of tech news, lamenting about how the news used to be pure and wholesome, riding on unicorns and rainbows for as far as the eye can see.  I can still see younger programmers getting a lot out of all this news.  For me, however, I think I need to take a break from the news.  I need to break some of my mental patterns to see what emerges.  I want to come up with new and interesting products, and it’s very hard for me to do that when I keep reading about social games, photo-sharing apps, apps to find your friends in bars, and reading about how node.js is going to save humanity.  The slightly ironic aspect to this is that by extracting myself from the tech news, I hope to come up with something worthy of that circle of tech news sites.  

I have a short and simple plan to develop new ideas. Step one is to start becoming ignorant of what is going on around me.  Step two (to be run in parallel with step one) is to start learning about a wider variety of topics, including but not limited to: chess strategy, the history of the Samurai, how to solve a Rubik’s cube, and how to play piano again.  Step three is to meditate more.


Nov 05 2011

I don’t want to work on just anything

What is the worst thing you can tell an employee?

“You can work on anything you want.”

As Dan Gilbert says, we manufacture our own happiness, but unbounded freedom of choice is the enemy of happiness.  In his social experiments, he shows that when we have the freedom to choose anything, we aren’t as happy with that choice afterward, compared to when we have limited choices (or no choice).  Barry Schwartz writes about this as well in The Paradox of Choice.  

I don’t want to work on anything.  I want to work on something that matters.  I want to work on something that will make a difference: either from the perspective of the team, the company, or the world at large.  For that to happen, I need direction, and vision.  I need a narrower focus than anything.

So, my advice: reduce your choices, and narrow your focus.


Sep 08 2011

Favorite Quora Questions

I have found myself using Quora more and more.  It’s certainly a useful site, and people are actively engaged on it.  I’ve been impressed with the answers to my own questions.  In any case, I wanted to list out my favorite Quora questions to date.  These are generally questions that captured my interest in some way, shape, or form.

What are your favorite questions?


Sep 07 2011