The DataHero Blog

Node.js – Preserving Data Across Async Callbacks

May 22nd, 2014

At DataHero we use the Node.js coupled with express as our server-side framework. One of the strengths of Node.js is that it is asynchronous. However this is a double edged sword in that it can make debugging hard. If you’ve ever printed a stack trace in Node.js, you’ve seen that the call stack only goes until the event loop.

Likewise, if you’ve ever set up logging in an express application, you might’ve wished to include a transaction id (a unique identifier for each request) in each log message to show which request generated the log.

In this post I will show you how we use two new technologies: the AsyncListener API and a module called continuation-local-storage to solve this problem and make logging and debugging easier.

The Problem

The ability to track a single request by a transaction id and the fact that call stacks stop at the event loop both arise because after control goes back to the Node.js event loop, there is no guarantee which asynchronous function will run next.

In the diagram below func1 is trying to set the current transaction id (tid) on the global state. However, because we don’t know if func2 will be run immediately after, we can’t guarantee tid will still be 10 when func2 runs. If you wanted to have data preserved, you’d have to manually make it available to each callback either as an argument or via a closure.


 
Recently a push from Trevor Norris, Tim Caswell, and Forrest Norvell introduced the AsyncListener API and continuation-local-storage.

With the AsyncListener API you can set functions to be executed when a callback is instantiated, before the callback is executed, and after the callback is executed. This provides the necessary building block to solve the problem illustrated in the diagram above, we can associate some data or state with func2 when we call process.nextTick(func2). When func2 is run, we can look it up again. However, how the AsyncListener API works and how it’s used by continuation-local-storage is enough for it’s own post.

The continuation-local-storage module builds on the AsyncListener API to make associating arbitrary data with a callback very easy. For example you can associate the current request or stack trace from when a callback was instantiated, and look it up when the callback is executed. This new ability makes debugging easier by having more informative logs. In the rest of this post we’ll dive into the nitty-gritty of how to use the continuation-local-storage module. In a future post I’ll explain how the continuation-local-storage module, and the AsyncListener API works.

As a backdrop to this post, our goal will be to:

  1. Generate a transaction id for every request our express app receives.
  2. Have that transaction id available to our logger so we can know which log statements are related.

So with that, let’s get started.

Prep Work

First we need to install continuation-local-storage:

[javascript]
npm install –save continuation-local-storage
[/javascript]

Then we’ll create a middleware that generates a unique identifier, we’re going to use the uuid module to do this:

[javascript]
var uuid = require(‘node-uuid’);

// create a transaction id for each request
app.use(function(req, res, next) {
var tid = uuid.v4();
next();
});
[/javascript]

Dive into Continuation-Local-Storage

The next step is to create a namespace for your application. continuation-local-storage uses namespaces to ensure that data you store with it does not get mixed with data another module may be storing.

To do this, add this to the top of the main entry point in your node app (replacing com.datahero with your own namespace):

[javascript]
var createNamespace = require(‘continuation-local-storage’).createNamespace,
namespace = createNamespace(‘com.datahero’);
[/javascript]

Note that you should only create the namespace once, otherwise it will clear any values you may have stored.

After creating the namespace, you can now use it to store values. We’re going to outfit our middleware to store the transaction id on the namespace. Then we’re going to run all downstream middleware within the scope of the namespace. This will make values stored in the namespace available for later functions in the call stack.

There’s a lot going on here, but it’s broken down below.

[javascript]
var getNamespace = require(‘continuation-local-storage’).getNamespace,
namespace = getNamespace(‘com.datahero’),
uuid = require(‘node-uuid’);

// create a transaction id for each request
app.use(function(req, res, next) {
var tid = uuid.v4();

// wrap the events from request and response
namespace.bindEmitter(req);
namespace.bindEmitter(res);

// run following middleware in the scope of
// the namespace we created
namespace.run(function() {

// set tid on the namespace, makes it
// available for all continuations
namespace.set(‘tid’, tid);
next();
});
});
[/javascript]

The above code is doing the following things:

  1. getNamespace loads the namespace we previously created. If you didn’t call createNamespace yet getNamespace will return undefined.
  2. namespace.bindEmitter helps to ensure that event listeners will be within the scope of the namespace. This is needed because EventEmitter isn’t patched into the node core, like the AsyncListener API is.
  3. namespace.run makes all asynchronous functions run in the scope of the namespace. continuation-local-storage uses a stack to know what the current context is, so it pushes the first context onto the stack.
  4. namespace.set is how you store information on the namespace. The tid set will be available in the next callback, and all functions that are put on the call stack after next.

Extract the Transaction Id

In the code above we ran next within the scope of the namespace. This makes all functions called after next have access to data stored in the namespace. In particular this allows us to easily get the current transaction id in a custom logger.

This code shows how our Log.js file (custom logging code) would extract the transaction id:

[javascript]
var getNamespace = require(‘continuation-local-storage’).getNamespace,
namespace = getNamespace(‘com.datahero’);

Logger.debug = function(message) {
console.log(‘Debug: ‘ + message + ‘ with transaction id: ‘ + namespace.get(‘tid’));
};
[/javascript]

You don’t have to worry about passing tid down to each log message. Instead, the tid set in the current call stack will be available on the namespace.

Never lose context

Sometimes continuation-local-storage will lose track of the current context. This happens when the AsyncListener implementation can’t tell if a function will be used asynchronously.This is explained succinctly by Forrest in this comment.

Practically though, this means one of two things:

  1. Patch the npm module that loses context using a cls-* module. The cls stands for continuation-local-storage. A cls-* module wraps callbacks passed to an npm module with one that binds the callback. For example, for node-redis there is cls-redis and for q use cls-q.
  2. If there is no cls-* module, and you’re losing context, you can manually bind you callbacks to the namespace.

[javascript]
callback = namespace.bind(callback);
[/javascript]

After you’ve added a transaction id, you can follow the same pattern to add a user id, request path, or anything else to your logs. For long stack traces, check out the trace module which uses the AsyncListener API to keep track of where an error came from.

At DataHero, adding the transaction id to our log messages has made our logs easier to parse and has helped us debug several issues by showing when multiple requests were happening.

Do you have any questions about continuation-local-storage or how to use it? Leave a comment and we’ll continue the discussion.
 

If you’d like to learn more about developing for data, check out our meetup.

 

By Islam Sharabash

Create my Free DataHero Account

Get the fastest, easiest way to understand your data today.