Performance Implications When Comparing Types in Node.js

Like in any language that is weakly typed, you can’t avoid the fact that performing comparisons across types will cost you CPU cycles.

Consider the following code which does a .filter on an array of 5M entries, all of which are Numbers:

let arrOfNumbers = Array(5000000).fill(1);
console.time('eqeq-number')
arrOfNumbers.filter(a => a == 1)
console.timeEnd('eqeq-number')
console.time('eqeqeq-number')
arrOfNumbers.filter(a => a === 1)
console.timeEnd('eqeqeq-number')

On my Mac, they’re roughly equivalent, with only a marginal difference between the eqeq (==) and eqeqeq (===) cases:

eqeq-number: 219.409ms
eqeqeq-number: 225.197ms

I would have assumed that eqeqeq would be faster, given there’s no possibility of type coercion, but it’s possible the VM knew that every value in the array, as well as the test value, was a number, so, meh, about the same.

Now, for the worst-case scenario, consider the following code: the same .filter, but the array is now full of 5M strings with the value “1”:

let arrOfStrings = Array(5000000).fill('1');
console.time('eqeq-string')
arrOfStrings.filter(a => a == 1)
console.timeEnd('eqeq-string')
console.time('eqeqeq-string')
arrOfStrings.filter(a => a === 1)
console.timeEnd('eqeqeq-string')

The eqeq costs about the same as the original example with the weakly typed Number to Number comparison, but now the eqeqeq is significantly faster:

eqeq-string: 258.572ms
eqeqeq-string: 72.275ms

In this case it’s clear that the eqeqeq case doesn’t have to do any type coercion: since the types don’t match, the evaluation is immediately false without having to convert the String to a Number. If you continue to mess around and have the .filters compare eqeq and eqeqeq against a String ‘1’, the results are again the same as in the first few tests.
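To make that asymmetry concrete, here is a simplified JavaScript model of just the String-versus-Number branch of the spec’s loose equality algorithm (IsLooselyEqual, called Abstract Equality Comparison in older editions): the string goes through ToNumber, which is what `Number(str)` does, and the two numbers are then compared. `looseEqualsStrNum` and `strictEquals` are hypothetical names, and the sketch deliberately ignores every other type combination the real algorithm handles.

```javascript
// Simplified model of the String-vs-Number case of loose equality (==).
// Only covers (string, number) and (number, string) pairs; the real
// algorithm also handles booleans, null/undefined, objects, BigInt, etc.
function looseEqualsStrNum(a, b) {
	if (typeof a === 'string' && typeof b === 'number') {
		// Coercion step: the string is converted to a number first.
		return Number(a) === b;
	}
	if (typeof a === 'number' && typeof b === 'string') {
		return a === Number(b);
	}
	throw new TypeError('model only covers string/number pairs');
}

// Strict equality never coerces: different types are immediately false.
function strictEquals(a, b) {
	if (typeof a !== typeof b) return false;
	return a === b;
}

console.log(looseEqualsStrNum('1', 1)); // true, same as '1' == 1
console.log(strictEquals('1', 1));      // false, same as '1' === 1
```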

Conclusion? Save the VM work if you can. This is a really contrived example, as the eqeqeq can quickly short-circuit the comparison to “false” since the types don’t match, but anywhere you can save effort when working on large data sets, it’s helpful to do so, and typing is an easy win when you can take it.
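One way to apply that advice when the data’s types are mixed is to pay for the conversion once, up front, so the hot comparison inside `.filter` is a same-type `===`. This is a minimal sketch assuming the values are numbers or numeric strings; `normalizeToNumbers` is a hypothetical helper, and whether it actually wins depends on how often the normalized array is reused.

```javascript
// Convert every entry to a Number once, so later comparisons never coerce.
// Assumes the inputs are numbers or numeric strings such as '1'.
function normalizeToNumbers(values) {
	const out = [];
	for (let i = 0; i < values.length; i++) {
		out.push(Number(values[i]));
	}
	return out;
}

const mixed = ['1', 1, '2', 2, '1'];
const numbers = normalizeToNumbers(mixed);

// Both sides are now Numbers, so === gives the answer == would have,
// without coercing on every comparison.
const ones = numbers.filter(a => a === 1);
console.log(ones.length); // 3
```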

Optimizing Array Lookups in Node.js

Following up on last week’s post, one of the places we see (saw!) our integration service running CPU hot was in the core of what it does: diffing the list of data we receive from an EHR integration against our own knowledge of that data (aka a sync process). When the data set was in the 1000s of records, the diff took a couple of milliseconds, but as the data sets reached 10k+ records, we often saw diffs in production take 50 to 60+ seconds.

Our original implementation of this diff algorithm was pretty simple. It ran an Array filter over the inbound list and, for each inbound record, another filter over our own list to see whether there was a match. Here’s a snippet of the code:

const onlyInInbound = inboundList.filter(currentInbound => {
	return lumaList.filter(currentLuma => {
		return currentLuma.externalId.value == currentInbound.id;
	}).length === 0;
});

The operation was basically O(n*m). In one customer’s account, that implementation took 54,844ms on average. Not good. In synthetic tests we’d see the function run faster over time as the JIT caught up with the work, but it was still pathetically slow.
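To see where the O(n*m) comes from: the inner `.filter` walks all of `lumaList` for every inbound record, so the inner callback runs n*m times; if both lists hold about 10,000 records, that is 100,000,000 comparisons. The sketch below counts those calls on small synthetic lists. The record shapes mirror the snippet above, but the data and the `countNestedComparisons` name are made up for illustration.

```javascript
// Count how many times the inner comparison runs in the nested-filter diff.
function countNestedComparisons(inboundList, lumaList) {
	let comparisons = 0;
	const onlyInInbound = inboundList.filter(currentInbound => {
		return lumaList.filter(currentLuma => {
			comparisons++;
			return currentLuma.externalId.value == currentInbound.id;
		}).length === 0;
	});
	return { onlyInInbound, comparisons };
}

// Synthetic data: 300 inbound records (numeric ids), 200 known records (string ids).
const inbound = Array.from({ length: 300 }, (_, i) => ({ id: i }));
const luma = Array.from({ length: 200 }, (_, i) => ({ externalId: { value: String(i) } }));

const { onlyInInbound, comparisons } = countNestedComparisons(inbound, luma);
console.log(comparisons);          // 60000, i.e. 300 * 200
console.log(onlyInInbound.length); // 100 (ids 200..299)
```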

Our first pass at optimizing this was to use a technique similar to fast.js‘s array methods: skip the built-in Array functional operators and switch to plain for loops. From what I’ve read, the built-in iterators have to deal with things like sparse arrays, so you end up spending a lot of time on edge-case checks. We know exactly what our input data sets look like, so we eventually moved to an implementation that looked like this:

function filter (subject, fn) {
	let result = [];
	for (var i = 0; i < subject.length; i++) {
		if (fn(subject[i])) {
			result.push(subject[i]);
		}
	}
	return result;
}

const onlyInInbound = filter(inboundList, currentInbound => {
	return filter(lumaList, currentLuma => {
		return currentLuma.externalId.value == currentInbound.id;
	}).length === 0;
});

This implementation was much, much faster, and brought the operation in that same customer account down to 20,316ms on average. Not amazing by any stretch, but far faster than before. As we kept writing synthetic tests, one of the big things we noticed was that the JIT couldn’t fully optimize these functions if the comparisons weren’t on the same data type. If the comparisons were mixed representations of the same value (e.g., comparing ‘1’ to 1), we’d get no JIT benefit (on Node 10). Unfortunately, due to the dirty nature of the data we ingest from all the EHRs we integrate with, we have to assume a level of variable typing in our data pipeline, so the JIT could only save us so much.

The final implementation (which is what is running in production now) makes the classic tradeoff of memory versus CPU. It iterates through both lists and converts them to objects keyed by ID, so we can do direct lookups instead of iterating over the data. Here’s a snippet of the final implementation:

const newInboundList = {};
for (var i = 0; i < inboundList.length; i++){
	newInboundList[inboundList[i].id] = inboundList[i];
}
const newLumaList = {};
for (var i = 0; i < lumaList.length; i++){
	newLumaList[lumaList[i].externalId.value] = lumaList[i];
}
const onlyInInbound = [];

for(const inbound in newInboundList) {
	if (!newLumaList[inbound]) {
		onlyInInbound.push(newInboundList[inbound]);
	}
}

As you can see, we trade a little bit of time on setup (creating two object-based representations of the data) and then do a single O(n) pass over the inbound data, with each lookup into the other object taking constant time, so the whole diff is O(n + m) rather than O(n*m). And voilà! The final implementation went to 72.5ms, a 761x improvement over the original implementation.
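A detail worth calling out: plain-object keys are always strings, so `newLumaList['1']` and `newLumaList[1]` hit the same entry, which is part of why this version copes with the mixed ‘1’/1 data described above. A `Set` or `Map` compares keys with SameValueZero (no coercion), so if you prefer one of those you have to normalize to a single type yourself. Below is a sketch of the same diff with a `Set` of stringified IDs, assuming the record shapes from the snippets above; `diffOnlyInInbound` is a hypothetical name. Stringifying matches the object-keyed version’s behavior, not `==` (for example, ‘1.0’ and 1 do not match), and iterating `inboundList` directly keeps duplicate inbound IDs the way the original filter did.

```javascript
// Same diff as above, using a Set for constant-time membership checks.
// IDs are normalized with String() because Set/Map keys are not coerced:
// a Set containing '1' does not contain the number 1.
function diffOnlyInInbound(inboundList, lumaList) {
	const knownIds = new Set();
	for (let i = 0; i < lumaList.length; i++) {
		knownIds.add(String(lumaList[i].externalId.value));
	}
	const onlyInInbound = [];
	for (let i = 0; i < inboundList.length; i++) {
		if (!knownIds.has(String(inboundList[i].id))) {
			onlyInInbound.push(inboundList[i]);
		}
	}
	return onlyInInbound;
}

const inbound = [{ id: 1 }, { id: '2' }, { id: 3 }];
const luma = [{ externalId: { value: '1' } }, { externalId: { value: 2 } }];
console.log(diffOnlyInInbound(inbound, luma)); // [ { id: 3 } ]
```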

Monitoring the Node.js Event Loop with InfluxDB

One of our services at Luma Health (our integration engine) has recently been encountering odd timeouts when making outbound connections to another service it depends on. The receiving service has plenty of resources to spare, so we’ve been working on the theory that Node’s event loop is being starved, so the timer and callback phases don’t get to run on time.

To test this, we’ve been monitoring timer performance and putting the data into InfluxDB to aggregate and monitor it. To do that, we set up a setInterval, use a high-resolution timer (process.hrtime) to see when it actually fires, and write the delta between when we expected to be called and when the interval was actually called.

const Influx = require('influx');
const fs = require('fs');
const path = require('path');
const { exec } = require('child_process');

// snapshot the package's name
const packageName = require(path.join(process.cwd(), 'package.json')).name;
const measurement = 'event_loop_interval_delay';
const influx = new Influx.InfluxDB(process.env.INFLUXDB);

// capture the git SHA; points are tagged 'unknown' until it resolves (or if git fails)
let serviceVersion = 'unknown';
exec('git rev-parse HEAD', (err, stdout) => {
	if (!err) {
		serviceVersion = stdout.toString().trim();
	}
});

// and the Docker container ID (Docker defaults the hostname to the short container ID)
const hostname = fs.existsSync('/etc/hostname') ?
	fs.readFileSync('/etc/hostname').toString().trim() :
	'localhost';

let startAt = process.hrtime();
const intervalDelay = 500;

// set up an interval to run every 500ms and record how late it fired
setInterval(() => {
	const calledAt = process.hrtime(startAt);
	const nanoseconds = calledAt[0] * 1e9 + calledAt[1];
	const milliseconds = nanoseconds / 1e6;
	influx
		.writePoints([{
			measurement,
			tags: {
				service: packageName,
				serviceVersion,
				hostname
			},
			fields: {
				// keep the field numeric (not a string) so Influx can aggregate it
				delayTime: Number((milliseconds - intervalDelay).toFixed(4))
			},
		}])
		.catch(() => {}); // metrics are best effort; never crash the service over them
	startAt = process.hrtime();
}, intervalDelay);

I thought it’d be fun to share how we’re using Influx to monitor Node internals. We’ve been monitoring the data and generally seeing Node able to keep up but there are times when the integration engine is under high load and the intervals come anywhere from 500ms to multiple seconds (!!!) late.
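On Node 11.10 and later (newer than the Node 10 mentioned in the previous post), `perf_hooks.monitorEventLoopDelay()` records the same kind of timer lateness in a histogram, which avoids hand-rolling the `hrtime` arithmetic. This is a minimal sketch, assuming you would send the summary to Influx the same way as above; histogram values are in nanoseconds, and the 20ms resolution and 5-second reporting window are arbitrary choices.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Sample event loop delay into a histogram; values are in nanoseconds.
// The 20ms resolution is an arbitrary choice for this sketch.
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

// Summarize the current window in milliseconds, then start a fresh window.
function report() {
	const toMs = ns => ns / 1e6;
	const summary = {
		meanMs: toMs(histogram.mean),
		maxMs: toMs(histogram.max),
		p99Ms: toMs(histogram.percentile(99)),
	};
	histogram.reset();
	return summary;
}

// Log a summary every 5 seconds; unref() so this timer alone
// does not keep the process running.
setInterval(() => console.log(report()), 5000).unref();
```

Resetting after each report means every data point describes one window, which maps naturally onto one Influx point per interval.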

Swift: println isn’t NSLog

After banging my head against this the last few days, I thought I’d share a little insight as I delve deeper into Swift. As tempting as it is to believe otherwise, println IS NOT NSLog.

If you’re looking to use the Devices function in the iOS Simulator to test an application launching via a push notification, or perhaps via a location update, you’re stuck either racing to attach the debugger or, better yet, relying on log statements and watching them.

The magic trick here is to remember that println output does not show up in the iOS Simulator console, but NSLog output does. NSLog also works on a device, so you can play with the app while it’s disconnected from your dev machine, then plug it back in and pull the logs via the Devices tool in Xcode.

Some additional details are here, which I sadly found long after figuring this out myself.

Loading Shapefiles into MongoDB

I’ve been playing a bit recently with a small geospatial/location-based app. After wrestling with a bunch of tools and MongoDB, here are a few tips on importing a set of ESRI Shapefiles into MongoDB. I’m looking at SF Street Sweeping data, but you can use any Shapefiles you wish.

Get the shapefiles

For example, grab the SF Street Sweeping data, then download and unzip it.

Convert Shapefiles to WGS84 Projection

The SF shapefiles are in a Northern California-specific projection (EPSG:2227), while lat/long coordinates use the WGS84 projection (EPSG:4326). Download the GDAL tools to get the ogr2ogr tool, which can convert between them directly (its -t_srs EPSG:4326 option reprojects the output to WGS84). Or use QGIS to load the shapefile as a vector layer, then export it using the WGS84 projection.

Convert Shapefile to GeoJSON

ogr2ogr -f geoJSON sweeping.json sfsweeproutes_in_wgs84.shp

Clean up the resulting GeoJSON

mongoimport doesn’t like the GeoJSON that ogr2ogr generates. Remove the first two lines:

{ "type": "FeatureCollection",

and the last line:

}

and save that to `sweeping_clean.json`
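Rather than trimming the file by hand, a small Node.js script can pull the `features` array out of the FeatureCollection and write one feature per line, the newline-delimited form mongoimport reads by default (alternatively, mongoimport’s `--jsonArray` flag accepts a single top-level JSON array). This is a sketch assuming the file fits in memory; the file names follow the ones used in this post, while `clean.js` and `featureCollectionToLines` are hypothetical names.

```javascript
const fs = require('fs');

// Turn a GeoJSON FeatureCollection (given as text) into newline-delimited
// JSON: one Feature per line, which mongoimport reads by default.
function featureCollectionToLines(geojsonText) {
	const collection = JSON.parse(geojsonText);
	if (collection.type !== 'FeatureCollection' || !Array.isArray(collection.features)) {
		throw new Error('expected a GeoJSON FeatureCollection');
	}
	return collection.features.map(feature => JSON.stringify(feature)).join('\n') + '\n';
}

// Usage: node clean.js sweeping.json sweeping_clean.json
// (reads the whole file into memory, so this suits modest file sizes)
const [inputPath, outputPath] = process.argv.slice(2);
if (inputPath && outputPath) {
	const input = fs.readFileSync(inputPath, 'utf8');
	fs.writeFileSync(outputPath, featureCollectionToLines(input));
}
```

Each line then becomes one document in the collection, with the GeoJSON `geometry` object intact for the spatial index.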

Import the data to Mongo

mongoimport --db sfstreets --collection streets < sweeping_clean.json

Create a 2dsphere spatial index

Mongo needs a geospatial index for proximity queries such as $near ($geoWithin and $geoIntersects can run without one, just more slowly). To create it from the mongo shell:

db.streets.createIndex({"geometry": "2dsphere"})

Where streets is your collection name and `geometry` is the object in your document that contains the GeoJSON location data.

That’s it!
You now have a streets collection in the sfstreets database. I’ll follow up in the next post on how to query this data.

Static Site Hosting on Heroku with Node.js

I’ve been moving a lot of my web content off a personal server I kept in my apartment to various hosting services while on break this year. Sites like Ask An Asian Person and other small inside jokes I used to host on Windows Server 2003 with IIS on a Dell machine that ran in my closet. That setup is/was so very, well, 2003. In addition, it’s always a good move to reduce and remove any ingress points to my home network.

So for a bunch of the silly small sites I have, I’ve moved them over to one-dyno free hosting on Heroku. To do that, I made a little template called static-heroku-node. It’s a tiny 10-line Node.js + Express application that serves static files out of the app’s /public/ folder. It’s quick and easy to use; I managed to move a few sites over in short order.

As an aside, I moved my blog over to DreamHost. I looked at Heroku for hosting WordPress — there are a bunch of options on how to do it, but any production setup (e.g. > 1 dyno and any of their production level Postgres databases) would cost something like $25-$50 per month which is a bit rich for just a blog. DreamHost’s 1-click WordPress setup is much cheaper and more flexible than trying to scaffold the same thing up on Heroku/Dotcloud/etc.

Getting Maven to Work with Android 17

I just suffered through this for a good part of an afternoon, so figured I’d blog it for anybody else. Apparently a bunch of tools moved into build-tools between Android SDK versions, so make symbolic links from:
./sdk/tools/aapt to ./sdk/build-tools/17.0.0/aapt
./sdk/tools/dx to ./sdk/build-tools/17.0.0/dx
./sdk/tools/lib/dx.jar to ./sdk/build-tools/17.0.0/lib/dx.jar

So, if you’re in your ./sdk/tools directory, you can run (ln -s takes the target first, then the link name):
ln -s ../build-tools/17.0.0/aapt aapt
ln -s ../build-tools/17.0.0/dx dx
ln -s ../../build-tools/17.0.0/lib/dx.jar lib/dx.jar

iOS6 Frustrations

I upgraded to iOS 6 yesterday and so far have been far from impressed. Here’s what’s irking me:

Passbook is unusable. I installed the iOS 6-compatible version of the United Airlines app, which clearly states Passbook support, and yet Passbook only shows me the splash screen. On top of that, the App Store shows the Cannot Connect to iTunes error. Fixes aside, even with the United app I can’t get it to work.
Turn-by-turn navigation in Maps is rubbish. I had to drop off some stuff across the city last night, so I thought I’d use Maps to get directions from my house to Embarcadero. It took me via the freeway, which was odd to start, but as I drove city streets at like 15/20mph, it kept getting my location wrong and rerouting me. On top of that, it often jumped the gun and gave me the next turn before I had arrived at the current one.
I’m not a fan of the new multi-colored status bar. It changes color with each app, and the Mail and system apps seem jarring with the new colors. This one is just personal preference though, nothing functional.
I miss the YouTube app. I downloaded the Google one and it’s just not the same.

Okay, those are only four complaints, but the Maps one is really exceptionally frustrating.

Sencha Architect 2

The very first thing my boss at Sencha asked me to work on when I joined was: “figure out what we’re doing with Designer”. Knowing nothing then, and with only a skeleton team, we started to sketch out what would become Sencha Architect 2. That was a year ago, and thankfully we’ve been lucky enough to be joined at Sencha by a great team, from engineering to product management to UX, that made Sencha Architect 2 happen. It’s crazy to look back at all the amazing products we’ve released in the last year, such as Sencha Touch 2, the preview of Sencha.io, and now Architect 2. It’s such a huge step forward for us and for the web. I’ve been on the road demoing it at conferences, and people go crazy when they see how easy it is to build an HTML5 app, which is a testament to the work the team has put in to make the product easy and productive to use.

Thoughts on SenchaCon 2011

I’m back at home after an exciting time at SenchaCon 2011 in Austin, Texas. What an event. A great bonding experience for the company, and an even better opportunity for us to spend time with customers and the community. I don’t blog often enough about Sencha, but this was so much fun and so exciting that I had to toss out a few of my favorite things from the show:

The Sencha Platform: we articulated our platform for the web using Sencha technologies. Built on tools, frameworks and the cloud, the Platform gives developers all they need to get up and running building world-class web apps. So stoked about building out the Platform over the next year.
Sencha.io, Ext JS 4.1, Sencha Touch 2, and Designer 2. Awesome launches for all of our products. Congrats to the teams for making it happen.
Multi-device, Shared, and Enriched: our vision on how the future is going to shape web apps and content in general. Abe did a fantastic job articulating how the web will evolve over the next few years.
BBQ at Stubbs! So much fun hanging out and having a drink with folks from the community, and an awesome live performance by Black Joe Lewis and the Honeybears. They’ll be at the Fillmore next month if you want to catch them in San Francisco.
Hackathon! I’m always amazed at what people can put together at a hackathon, and I’ll brag just a bit: the teams at our hackathon produced some of the best apps I’ve ever seen at a one-day hackathon. Hats off to the community for kicking ass hacking on our products.

SenchaCon was easily one of the best tech conferences I’ve been to, and I’m proud to have been a part of it. Now, time to sleep for the next few days. Can’t wait for SourceDevCon in London next year — hope to see you there if you didn’t make it to SenchaCon Austin.