Adventures in Async Calls and bcrypt

We use bcrypt to hash the passwords we store at Luma Health. In the NodeJS world there are two common libraries for this: bcryptjs, which is a pure JavaScript implementation, and bcrypt, which is a wrapper around a C++ library.

We had originally used the pure JS version because it has no linked native binary. With a natively linked library, every NodeJS upgrade meant a different binary version, which caused long recompile and npm install times.

Our original implementation (mostly out of laziness) used the synchronous version of the JS library. Since bcrypt is a cost-of-compute type of algorithm, any time we came under high login load (e.g. the start of the day), we started to run CPU hot and then started to time out connections as the cost of doing password hashes starved out other work happening on our REST servers.

We diagnosed this using Clinic and generating flamegraphs of the system under standard load patterns and it became clear that the bcrypt work was sucking up all the oxygen in the NodeJS process.

Our first fix was to just move to the async methods of the bcryptjs library, which would mean we wouldn’t block the main event loop doing bcrypt’s hashes. Unfortunately, this didn’t lead to as much of a performance improvement as we were hoping to get. After digging into the implementation of bcryptjs (<3 open source), it turns out the main difference between the async version and the sync version is that the async version does one round of Blowfish per callback. It was definitely an improvement but it still consumed a lot of time within the main NodeJS process.

We then looked at moving to the C++ based module and again it had two different functions, one sync and another async. The sync one would run the hashing functions in C++, which is faster, but even better, the async version would run the hashing function in a Nan::AsyncWorker, which does the work on a background thread instead of on the event loop.
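To make the sync versus async difference concrete, here is a minimal sketch that counts how many timer ticks get to run while a hash is in flight. The bcrypt module isn't part of Node's standard library, so this uses the built-in crypto.pbkdf2 as a stand-in for "an expensive hash that can run off the main thread". The helper names, the password, the salt and the iteration count are all made up for illustration.

```javascript
const crypto = require('crypto');
const { promisify } = require('util');

const pbkdf2Async = promisify(crypto.pbkdf2);

// Stand-in for an expensive password hash (assumed values, not our real settings)
const hashArgs = ['hunter2', 'some-salt', 300000, 64, 'sha512'];

// Count how many 5ms timer ticks get to run while `work` is in flight
function countTicksDuring(work) {
	return new Promise((resolve, reject) => {
		let ticks = 0;
		const timer = setInterval(() => { ticks++; }, 5);
		work().then(() => {
			clearInterval(timer);
			resolve(ticks);
		}, err => {
			clearInterval(timer);
			reject(err);
		});
	});
}

async function compareBlocking() {
	// sync: the hash runs on the main thread, so no timer can fire until it finishes
	const syncTicks = await countTicksDuring(async () => {
		crypto.pbkdf2Sync(...hashArgs);
	});
	// async: the hash runs on a worker thread, so timers keep firing in the meantime
	const asyncTicks = await countTicksDuring(() => pbkdf2Async(...hashArgs));
	return { syncTicks, asyncTicks };
}

const comparison = compareBlocking();
comparison.then(({ syncTicks, asyncTicks }) => {
	console.log(`ticks during sync hash: ${syncTicks}, during async hash: ${asyncTicks}`);
});
```

Every tick that fails to fire during the sync hash is a stand-in for a request that had to wait. That is roughly the starvation we were seeing on our REST servers.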

Result? About a 30% improvement in response times through our front ends and a lot smoother load management under high load. Moral of the story? Always use async, even when it’s tempting / easy / lazy to use a sync version of a function in NodeJS. The graph from the very beginning shows the improvements right after we deployed the new code to production.

Performance Implications When Comparing Types in Node.js

Like in any language that is weakly typed, you can’t avoid the fact that performing comparisons across types will cost you CPU cycles.

Consider the following code which does a .filter on an array of 5M entries, all of which are Numbers:

let arrOfNumbers = Array(5000000).fill(1);
console.time('eqeq-number')
arrOfNumbers.filter(a => a == 1)
console.timeEnd('eqeq-number')
console.time('eqeqeq-number')
arrOfNumbers.filter(a => a === 1)
console.timeEnd('eqeqeq-number')

On my Mac, they’re roughly equivalent, with a marginal difference in the performance in the eqeq and eqeqeq case:

eqeq-number: 219.409ms
eqeqeq-number: 225.197ms

I would have assumed that the eqeqeq would have been faster given there’s no possibility of data type coercion, but it’s possible the VM knew everything was a number in the array and the test value, so, meh, about the same.

Now, for the worst case scenario, consider the following code: the same .filter, but the array is now full of 5M strings of the value “1”:

let arrOfStrings = Array(5000000).fill('1');
console.time('eqeq-string')
arrOfStrings.filter(a => a == 1)
console.timeEnd('eqeq-string')
console.time('eqeqeq-string')
arrOfStrings.filter(a => a === 1)
console.timeEnd('eqeqeq-string')

The eqeq costs about the same as the original example with the weakly typed Number to Number comparison, but now the eqeqeq is significantly faster:

eqeq-string: 258.572ms
eqeqeq-string: 72.275ms

In this case it’s clear that the eqeqeq case doesn’t have to do any data coercion: since the types don’t match, the evaluation is automatically false without having to muck the String to a Number. If you were to continue to mess around and have the .filters compare eqeq and eqeqeq to a String ‘1’, the results are again the same as in the first few tests.
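One thing worth keeping in mind about the string benchmark: the two filters don’t return the same thing. The eqeq version keeps every entry and the eqeqeq version keeps none, so the timings compare different amounts of work as well as different operators. Here’s a small sketch of the semantics, using made-up sample data, plus one way to get strict comparisons on mixed input by normalizing the types once up front:

```javascript
// Values as they might arrive from a mixed-type source (hypothetical data)
const mixed = [1, '1', 2, '2', 1];

// Loose equality coerces the String '1' to a Number before comparing
const loose = mixed.filter(a => a == 1);

// Strict equality returns false as soon as the types differ
const strict = mixed.filter(a => a === 1);

// Normalizing once up front lets every later comparison stay strict
const normalized = mixed.map(Number);
const strictAfterNormalizing = normalized.filter(a => a === 1);

console.log(loose.length, strict.length, strictAfterNormalizing.length); // 3 2 3
```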

Conclusion? Save the VM work if you can. This is a really contrived example as the eqeqeq can quickly shortcut the comparison to “false” since the types don’t match, but anywhere you can save effort when working on large data sets, it’s helpful to do so, and typing is an easy win when you can take it.

Optimizing Array Lookups in Node.js

Following up on last week’s post, one of the areas where we see (saw!) our integration service running CPU hot was the core part of what it does: diffing the list of data we receive from an EHR integration against our own knowledge of the data (aka a sync process). When the data set was in the 1000s of records, the diff calculations took effectively a couple of milliseconds, but as the data sets reached 10k+ records, we often saw in production that the diffs could take 50 to 60+ seconds.

Our original implementation of this diff algorithm was pretty simple. It took the inbound list and did an Array filter against it, with a nested Array filter on the other list to see if there were matches. Here’s a snippet of the code:

const onlyInInbound = inboundList.filter(currentInbound => {
	return lumaList.filter(currentLuma => {
		return currentLuma.externalId.value == currentInbound.id;
	}).length === 0;
});

The operation was basically O(n*m). In one customer’s account, that implementation took an average of 54,844ms to run. Not good. In synthetic tests we’d see the function run faster over time as the JIT caught up to the work, but it was pathetically slow.

Our first pass at optimizing this was to use a technique similar to fast.js’s array methods, which is to not use the built-in Array functional operators and switch to more vanilla for loops. From reading a bunch, the built-in iterators have to deal with things like sparse arrays, so you end up spending a lot of time in edge case checking. We know for sure what the input data sets look like, so we eventually moved to an implementation that looked like this:

function filter (subject, fn) {
	let result = [];
	for (var i = 0; i < subject.length; i++) {
		if (fn(subject[i])) {
			result.push(subject[i]);
		}
	}
	return result;
}

const onlyInInbound = filter(inboundList, currentInbound => {
	return filter(lumaList, currentLuma => {
		return currentLuma.externalId.value == currentInbound.id;
	}).length === 0;
});

This implementation was much much faster, and brought the operation in that same customer account down to 20,316ms on average. Not amazing by any stretch, but far faster than before. As we kept writing synthetic tests, one of the big things we noticed was the JIT wasn’t able to fully lower these functions if the comparisons weren’t on the same data type. If the comparisons were mixed representations of the same value (e.g. comparing ‘1’ to 1), we’d get no JIT benefit (on Node 10). Unfortunately, due to the dirty nature of the data we ingest from all the EHRs we integrate with, we have to assume a level of variable typing in our data pipeline, so the JIT could only save us so much.

The final implementation we made (which is what is running in production now) was to do the classic tradeoff of memory versus CPU. It iterated through both lists and converted them to objects so we could do direct lookups instead of iterations of the data. Here’s a snippet of the final implementation:

const newInboundList = {};
for (var i = 0; i < inboundList.length; i++){
	newInboundList[inboundList[i].id] = inboundList[i];
}
const newLumaList = {};
for (var i = 0; i < lumaList.length; i++){
	newLumaList[lumaList[i].externalId.value] = lumaList[i];
}
const onlyInInbound = [];

for(const inbound in newInboundList) {
	if (!newLumaList[inbound]) {
		onlyInInbound.push(newInboundList[inbound]);
	}
}

As you can see, we trade a little bit of time to do the setup (by creating two object-based representations of the data) and then do an O(n) iteration through the list of comparison data. And voilà! The final implementation went to 72.5ms, roughly a 756x improvement over the original implementation.
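For comparison, the same lookup-table idea can be sketched with a Set instead of plain objects. This is not what we run in production, just an alternative shape for the same tradeoff. The sample lists are made up, and the String() calls are an assumption that mirrors the string coercion object keys perform, so ‘1’ and 1 still count as the same ID:

```javascript
// Hypothetical sample data shaped like the lists in the post
const inboundList = [{ id: 1 }, { id: '2' }, { id: 3 }];
const lumaList = [{ externalId: { value: '1' } }, { externalId: { value: 2 } }];

// Build the lookup once; String() mirrors the coercion object keys do,
// so '1' and 1 are treated as the same ID
const lumaIds = new Set();
for (const luma of lumaList) {
	lumaIds.add(String(luma.externalId.value));
}

// One pass over the inbound list with constant-time membership checks
const onlyInInbound = [];
for (const inbound of inboundList) {
	if (!lumaIds.has(String(inbound.id))) {
		onlyInInbound.push(inbound);
	}
}

console.log(onlyInInbound); // [ { id: 3 } ]
```

One behavioral difference to be aware of: the object-based version collapses inbound records that share an ID, while this sketch keeps duplicates and preserves the inbound order.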

Monitoring the Node.js Event Loop with InfluxDB

One of our services (our integration engine) at Luma Health has recently been encountering odd timeouts when making outbound connections to another service it depends on. The receiving service has plenty of resources to spare, so we’ve been working through the theory that the event loop in Node might be starved before the callback and timer phases of the loop are able to run.

To test this, we’ve been playing with monitoring timer performance and putting the data into InfluxDB in order to aggregate and monitor it. To do that, we simply set up a setInterval and use a high resolution timer to measure the delta between when we expected the interval to be called and when it was actually called.

const Influx = require('influx');
// snapshot the package's name
const packageName = require(process.cwd() + '/package.json').name;
const measurement = 'event_loop_interval_delay';
const fs = require('fs');
const influx = new Influx.InfluxDB(process.env.INFLUXDB);
const { exec } = require('child_process');

let serviceVersion = null;

// snap out the gitsha
exec('git rev-parse HEAD', (err, version) => {
	// leave serviceVersion as null if git isn't available
	if (!err) {
		serviceVersion = version.toString().trim();
	}
});

// and the docker container ID
const hostname = fs.existsSync('/etc/hostname') ?
	fs.readFileSync('/etc/hostname').toString().trim() :
	'localhost';

let startAt = process.hrtime();
const intervalDelay = 500;

// set up an interval to run every 500ms
setInterval(() => {
	const calledAt = process.hrtime(startAt);
	const nanoseconds = calledAt[0] * 1e9 + calledAt[1];
	const milliseconds = nanoseconds / 1e6;
	influx
		.writePoints([{
			measurement,
			tags: {
				service: packageName,
				serviceVersion,
				hostname
			},
			fields: {
				// toFixed returns a string; convert back to a number
				delayTime: Number((milliseconds - intervalDelay).toFixed(4))
			},
		}])
		.then(() => {})
		.catch(() => {});
	startAt = process.hrtime();
}, intervalDelay);

I thought it’d be fun to share how we’re using Influx to monitor Node internals. We’ve been monitoring the data and generally seeing Node able to keep up but there are times when the integration engine is under high load and the intervals come anywhere from 500ms to multiple seconds (!!!) late.
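If you want to try the same measurement without an InfluxDB instance, here’s a minimal sketch of just the timing part, under a few assumptions: the helper names are made up, the busy loop is a stand-in for real CPU-bound work, and samples are collected in an array instead of being written to a database.

```javascript
// Report how late each setInterval tick fires, in milliseconds
function watchEventLoopDelay(intervalMs, onSample) {
	let last = process.hrtime.bigint();
	const timer = setInterval(() => {
		const now = process.hrtime.bigint();
		const elapsedMs = Number(now - last) / 1e6;
		last = now;
		onSample(elapsedMs - intervalMs);
	}, intervalMs);
	// return a function that stops the monitoring
	return () => clearInterval(timer);
}

// Hypothetical CPU-bound work that holds the event loop for `ms` milliseconds
function blockFor(ms) {
	const end = Date.now() + ms;
	while (Date.now() < end) {
		// busy wait
	}
}

const samples = [];
const stop = watchEventLoopDelay(50, delay => samples.push(delay));

// starve the loop for 300ms shortly after the first tick
setTimeout(() => blockFor(300), 75);

const finished = new Promise(resolve => {
	setTimeout(() => {
		stop();
		resolve(samples);
	}, 600);
});

finished.then(all => {
	console.log('worst delay (ms):', Math.max(...all).toFixed(1));
});
```

The tick that was due while blockFor was running shows up a couple of hundred milliseconds late. It is the same signature we see in production, just at a smaller scale.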

Static Site Hosting on Heroku with Node.js

I’ve been moving a lot of my web content off of a personal server, which has been kept in my apartment, to various hosting services while on break this year. I used to host sites like Ask An Asian Person and other small inside jokes on a Windows 2003 Server with IIS on a Dell machine that ran in my closet. That setup is/was so very, well, 2003. In addition, it’s always a good move to reduce and remove any ingress points to my home network.

So for a bunch of the silly small sites I have, I’ve moved them over to one-dyno free hosting on Heroku. To do that, I made up a little template to use called static-heroku-node. It’s a tiny 10 line Node.js + Express application that serves static files out of the /public/ folder in the app. Quick and easy to use; I managed to move a few sites over in short order.
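To give a feel for how little is involved, here’s a rough sketch of the same idea using only Node’s built-in http module. This is not the template’s code (that one uses Express); createStaticServer is a made-up name, and the sketch skips niceties like content types and URL decoding that a real static middleware handles for you.

```javascript
const http = require('http');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: build a server that serves files from `rootDir`
function createStaticServer(rootDir) {
	const root = path.resolve(rootDir);
	return http.createServer((req, res) => {
		// drop the query string and map "/" to index.html
		const urlPath = req.url.split('?')[0];
		const filePath = path.join(root, urlPath === '/' ? 'index.html' : urlPath);
		// refuse anything that resolves outside the public folder
		if (!filePath.startsWith(root + path.sep)) {
			res.writeHead(403);
			res.end('Forbidden');
			return;
		}
		fs.readFile(filePath, (err, body) => {
			if (err) {
				res.writeHead(404);
				res.end('Not found');
				return;
			}
			res.writeHead(200);
			res.end(body);
		});
	});
}

// Usage: Heroku passes the port to bind to in the PORT environment variable
// createStaticServer(path.join(__dirname, 'public')).listen(process.env.PORT || 3000);
```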

As an aside, I moved my blog over to DreamHost. I looked at Heroku for hosting WordPress — there are a bunch of options on how to do it, but any production setup (e.g. > 1 dyno and any of their production level Postgres databases) would cost something like $25-$50 per month which is a bit rich for just a blog. DreamHost’s 1-click WordPress setup is much cheaper and more flexible than trying to scaffold the same thing up on Heroku/Dotcloud/etc.

Introducing Quick Group Chat

I’ve been working on a little side project the last few weeks called Quick Group Chat. The site/service came from some IM conversations I was having with friends. I was talking to one friend on Gtalk and the other was on AIM and we were trying to plan a trip. There was no way for us to get into a ‘quick group chat’ that let us talk with each other ad hoc.

I’d been looking for an excuse to play with nodejs and jquery, so I forked the node_chat example app. Since I started with the node_chat demo, I had a good structure to get started with. The app is basically built out of four files: server.js, client.js, style.css and index.html. To start off I added a layer called ‘rooms’ to the server that groups sessions, each of which represents a user. When a user joins, if they aren’t trying to join a room or if the server can’t find the room they were trying to join, a new room is created and an ID is given to it. The user’s session is assigned to that room and the chat buffer is kept for that room. The ID is passed back to the client, and the client updates the location.hash with the room’s ID.

From then on, anybody who has the URL with the location.hash can join the room. When the user navigates to that URL, the client passes the room ID to the server, and the server joins the user to the room. Since the chat buffer is kept for the room, every time a user joins the room they get to see the prior chat log. When the last user leaves a room, the room is destroyed along with all the data. Hence, quick group chat. Once you’re in the room, the URL is the key to inviting other people in. It’s all pretty simple.
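The room lifecycle described above can be sketched roughly like this. It’s an illustration rather than the app’s actual server.js: the function names, the shape of the room object and the way IDs are generated are all assumptions.

```javascript
const crypto = require('crypto');

// roomId -> { sessions: Set of session objects, buffer: array of messages }
const rooms = new Map();

// Join the requested room, or create a fresh one if the ID is missing or unknown
function joinRoom(session, roomId) {
	let id = roomId;
	if (!id || !rooms.has(id)) {
		id = crypto.randomBytes(4).toString('hex');
		rooms.set(id, { sessions: new Set(), buffer: [] });
	}
	const room = rooms.get(id);
	room.sessions.add(session);
	// the client would put `id` in location.hash and replay `buffer`
	return { id, buffer: room.buffer };
}

// Remove the session and destroy the room once nobody is left in it
function leaveRoom(session, roomId) {
	const room = rooms.get(roomId);
	if (!room) {
		return;
	}
	room.sessions.delete(session);
	if (room.sessions.size === 0) {
		rooms.delete(roomId);
	}
}

// Usage: the first user creates a room, the second joins it via the shared ID
const alice = { nick: 'alice' };
const bob = { nick: 'bob' };
const created = joinRoom(alice);
created.buffer.push({ nick: 'alice', text: 'hi' });
const joined = joinRoom(bob, created.id);
console.log(joined.buffer.length); // 1, bob sees the prior chat log
leaveRoom(alice, created.id);
leaveRoom(bob, created.id);
console.log(rooms.has(created.id)); // false, the room is gone with its data
```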

I also added in some styling to make it a little more fun and easier to read. Each user is assigned their own color based on a color wheel. The formula is kind of fun and easy. For each user’s nickname, the code goes over the nick and sums up the ASCII code for each letter in the name, then mods it against the total number of colors in the color wheel, then assigns a CSS class for that color to the message. That way every time “Aditya” logs in, the user’s color is the same. There’s one exception, and that’s the “me” user. Your own typing always looks the same to you, while the way other people see “Aditya” is always a consistent color. I may change that later but it makes other people’s chat pop and your own words a bit more subdued.
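The color formula is small enough to sketch in a few lines. The palette size and the CSS class naming here are assumptions for illustration; the real app’s values may differ.

```javascript
// Hypothetical palette size; the real color wheel may have a different count
const COLOR_COUNT = 8;

// Sum the character codes of the nick and wrap the total around the color wheel
function colorClassFor(nick, colorCount = COLOR_COUNT) {
	let sum = 0;
	for (let i = 0; i < nick.length; i++) {
		// charCodeAt matches the ASCII code for plain ASCII letters
		sum += nick.charCodeAt(i);
	}
	return 'color-' + (sum % colorCount);
}

// The same nick always maps to the same class
console.log(colorClassFor('Aditya') === colorClassFor('Aditya')); // true
```

Because the class depends only on the nickname, nothing about a user’s color has to be stored on the server.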

Lastly, I hosted it on duostack. This was one of the coolest parts. Once I set up some SSH keys, I added duostack as a remote to my git repository. Every time I do a “git push duostack”, their remote repository is updated and the app recycles. It’s crazy easy to deploy to their service and it runs really well there. I’ve been amazed at how easy it is and really happy with the service.

Check it out, use it yourself, or fork it and let me know how it goes!