For the geeks: GitHub data viz!

I got to work on this awesome project for the last couple of weeks. I had a ton of fun data mining and playing with D3. Check it out:

gitviz.jit.su

Also, check out some of the questions I ran into and the solutions I came up with.

  • How do I avoid callback hell while making multiple API calls?
  • How do I mine a large amount of data and then return it to a client?

Solutions:

Using Node’s EventEmitter library

GitViz uses EventEmitter, from Node’s built-in events module, for two main purposes:

  • Preventing callback messes.
  • Streaming data incrementally to the client in a clean and simple way.

Here is how I did it:

Require the library:

EventEmitter = require('events').EventEmitter

Create a unique instance of the event emitter. I built an ‘initialize’ function that wraps the creation process, and I export it to my Express server file.

init = ->
  # This 'reset' method re-initializes the global constants used in the script.
  reset()
  eventMaker = new EventEmitter
  eventMaker.get = get
  return eventMaker

When I run init() in the Express server, it gives me both the “get” function, which makes the necessary API calls, and the ability to emit events and register listeners for them.
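Here is a minimal sketch of the same factory idea in plain JavaScript, so it runs under Node without a compile step. The ‘get’ below is a hypothetical stand-in that emits one fake commit; the real worker makes the API calls. The point is only that every call to init() hands back a separate emitter.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the worker's real `get`: it emits one fake
// commit and then 'end'. `this` is the emitter the function is attached to.
function get(user, repo) {
  this.emit('commit', JSON.stringify({ repo: user + '/' + repo }));
  this.emit('end', 'done!');
}

// Build a brand-new emitter on every call so listeners are never shared.
function init() {
  const eventMaker = new EventEmitter();
  eventMaker.get = get;
  return eventMaker;
}

const first = init();
const second = init();
const seen = [];
first.on('commit', (commit) => seen.push(commit));

second.get('octocat', 'hello'); // nobody listens on `second`, so nothing is recorded
first.get('octocat', 'hello');  // reaches the listener registered on `first`
console.log(seen.length); // 1
```

Only the commit emitted through ‘first’ is recorded, because ‘second’ is a different object with its own (empty) listener list.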

Here is the relevant server code:

app.get '/query/:user/repo/:repo', (req, res) ->
  repoRoute = req.params.user + '/' + req.params.repo
  db.Commit.findOne { 'repo': repoRoute }, 'commits', (err, commitList) ->
    throw err if err
    if commitList
      res.send commitList.commits
    else
      res.write '['
      # getCommits.init() is the same as init() above.
      commitStream = getCommits.init()
      commitStream.on 'commit', (commit) ->
        res.write commit
      commitStream.on 'end', (string) ->
        res.end ']'
      commitStream.get req.params.user, req.params.repo

After the event emitter has been initialized, I can register listeners with “commitStream.on”.

These listeners fire whenever I emit a ‘commit’ event from inside the worker, which happens once I have the latitude and longitude for a given commit:

currentRequest.emit 'commit', commitLocation
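To make the worker side concrete, here is a rough sketch in plain JavaScript. It is not GitViz’s actual worker: ‘lookupLocation’, ‘streamLocations’ and the ‘pending’ counter are hypothetical names, and the lookup is faked with setImmediate instead of real API requests. It shows one way to emit a ‘commit’ as each lookup finishes and an ‘end’ after the last one, without nesting callbacks.

```javascript
const { EventEmitter } = require('events');

// Hypothetical async lookup standing in for the author and geocoding requests.
function lookupLocation(author, callback) {
  setImmediate(() => callback(null, { author: author, lat: 0, lng: 0 }));
}

// Start every lookup at once; emit 'commit' as each one finishes and
// 'end' once the last one has reported back.
function streamLocations(currentRequest, authors) {
  let pending = authors.length;
  if (pending === 0) {
    // Defer so the caller still has a chance to attach listeners.
    setImmediate(() => currentRequest.emit('end', 'done!'));
    return;
  }
  authors.forEach((author) => {
    lookupLocation(author, (err, commitLocation) => {
      if (!err) currentRequest.emit('commit', JSON.stringify(commitLocation));
      pending -= 1;
      if (pending === 0) currentRequest.emit('end', 'done!');
    });
  });
}

const currentRequest = new EventEmitter();
const received = [];
currentRequest.on('commit', (commit) => received.push(commit));
currentRequest.on('end', () => console.log('commits received:', received.length));
streamLocations(currentRequest, ['alice', 'bob', 'carol']);
```

The listeners stay flat no matter how many requests are in flight; the counter is just one simple way to know when to emit ‘end’.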

This event emitter was easy to implement, but I got snagged temporarily by not having an initialize function wrapping around it.

Before I wrapped the instantiation of the event emitter (‘new EventEmitter’) in an initialize function, I was adding listeners to the same event emitter on every GET request from the client to the server. Because the old listeners stayed attached to that shared emitter, events fired prematurely on every request after the first one.

Be careful of premature event emission!
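The pitfall is easy to reproduce. This is a small sketch in plain JavaScript with a hypothetical ‘handleRequest’ helper and fake response objects (no real server involved): with one shared emitter, an ‘end’ meant for one request also ends the other, while one emitter per request keeps them apart.

```javascript
const { EventEmitter } = require('events');

// Simulate a request handler: register an 'end' listener and return a
// fake response object that records whether it has been ended.
function handleRequest(emitter, label) {
  const response = { label: label, ended: false };
  emitter.on('end', () => { response.ended = true; });
  return response;
}

// Shared emitter: both requests' listeners pile up on one object.
const shared = new EventEmitter();
const a = handleRequest(shared, 'A');
const b = handleRequest(shared, 'B');
shared.emit('end'); // meant for request A only
console.log(a.ended, b.ended); // true true (B was ended prematurely)
console.log(shared.listenerCount('end')); // 2

// One emitter per request: each 'end' reaches only its own listener.
const emitterC = new EventEmitter();
const emitterD = new EventEmitter();
const c = handleRequest(emitterC, 'C');
const d = handleRequest(emitterD, 'D');
emitterC.emit('end');
console.log(c.ended, d.ended); // true false
```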

Using res.write and res.end

Grabbing all the commits on a large repository can take more than two minutes. When deciding between sockets and GET requests, it felt intuitive to use a socket: emit an event to the client when all the commits were ready, and then send them all at once.

Instead of implementing sockets, I stuck with Node’s native ‘write’ and ‘end’ functions, combined with the event emitter.

Every time this line ran on the Node worker:

currentRequest.emit 'commit', commitLocation

it triggered this listener on the server:

# This line opens an array to place commits into.
res.write '['
# This listener writes each commit into the array.
commitStream.on 'commit', (commit) ->
    res.write commit

By using res.write instead of res.send or res.end, I keep the response open and let the client receive the data a little at a time. Because data keeps arriving, the browser does not shut down what would otherwise be a hanging GET request.

When the last commit is ready, the worker emits this event:

currentRequest.emit 'end', 'done!'

This ends the response and closes the array on the client:

commitStream.on 'end', (string) ->
    res.end ']'

res.end signals to the client that the response is complete.
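Putting the pieces together, here is a self-contained sketch using Node’s built-in http module in plain JavaScript. The ‘streamJsonArray’ helper and the fake worker are hypothetical. I am also assuming each emitted commit is a JSON string with no separator of its own, so the helper writes a comma before every item except the first; otherwise the client could not parse the result as a JSON array.

```javascript
const http = require('http');
const { EventEmitter } = require('events');

// Wire an emitter to a response: open the array, write each commit with a
// comma between items, and close the array when 'end' arrives.
function streamJsonArray(commitStream, res) {
  let first = true;
  res.write('[');
  commitStream.on('commit', (commit) => {
    res.write((first ? '' : ',') + commit);
    first = false;
  });
  commitStream.on('end', () => res.end(']'));
}

const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  const commitStream = new EventEmitter(); // fresh emitter per request
  streamJsonArray(commitStream, res);
  // Hypothetical worker: emits two fake commits, then 'end'.
  setImmediate(() => {
    commitStream.emit('commit', JSON.stringify({ lat: 1, lng: 2 }));
    commitStream.emit('commit', JSON.stringify({ lat: 3, lng: 4 }));
    commitStream.emit('end', 'done!');
  });
});

// server.listen(3000); // uncomment to try it with: curl localhost:3000
```

With the listen line uncommented, the response body arrives in pieces but adds up to a valid JSON array of the two fake commits.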

The Dwarven Node Miner

I had to write a bio for myself recently, and was asked this question… It was also April Fools’ Day.

What is the most exciting piece of code you’ve written?

The console logs “saved” and my eyes light up with joy. I am a bearded dwarf, my Node worker is my pickaxe, and I am deep in the mines of Nested API Calls. I open my MongoDB shell and look into my collections. Precious JSON jewels everywhere!

After hours of toil and sweat dripping from my long dwarf-beard, I have finally found it.

I…

  • pulled every commit, ever, on a large open-source repo on GitHub,
  • plucked the author of every commit,
  • sent a second request for more details on each author,
  • got their latitude and longitude from the Google Maps API,
  • sent one commit with geo-data at a time to the client using events,

…then ended the response after the last shiny JSON object was sent up the mineshaft.

I stare lovingly at my trusty Node pickaxe. It is adorned with events to keep nested callback goblins at bay, and crystal-clear method names and responsibilities gild its edges. The shiniest axe I have ever crafted.

I do a dwarven ditty and song to celebrate.