Unbounded Range Queries in Mongo

At Luma Health, we often have to query collections with 50, 60, or 100M+ records. As you’d expect, well-thought-through queries and good indexes are the building blocks for querying collections like these.

In today’s example, we had a collection in production with a date field that we have to run range queries against (e.g. date > something, date < something). We started to notice that a large number of the underlying API calls hitting that collection were being logged to our slow execution monitoring. In one service specifically, we were seeing about 500 slow logs per 5-minute interval.

We spent some time in the Mongo query planner, digging into the DB queries the API calls were making, and found a few examples like this:

    {
      date: { $lte: endDate },
      endDate: { $gte: startDate }
    }

Both the date and endDate fields were in a proper compound index that the Mongo query planner was using, but the execution stats showed the planner filling in each open end of the ranges with an infinite bound, e.g. date: { $gte: -Infinity, $lte: endDate }. Yikes! All the hard work in indexing, query design, etc. went out the window: when the query executed, Mongo had to pull two entire ranges and then intersect them in memory rather than through the index.
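To make the cost concrete, here is a toy illustration (not MongoDB’s planner code). The toBounds and keysScanned helpers are hypothetical, and the “index” is just a sorted array of day numbers standing in for dates. An open-ended $lte predicate becomes the interval [-Infinity, endDate], which covers nearly the whole index, while a two-sided range only covers a narrow window.

```javascript
// Hypothetical illustration, NOT MongoDB's planner code.
// Turn a single-field range predicate into [low, high] bounds,
// filling any missing side with an infinite bound.
function toBounds(pred) {
  const low = pred.$gte !== undefined ? pred.$gte : -Infinity;
  const high = pred.$lte !== undefined ? pred.$lte : Infinity;
  return [low, high];
}

// Count how many keys of a sorted "index" fall within [low, high],
// i.e. roughly how many entries a scan over those bounds would touch.
function keysScanned(sortedKeys, [low, high]) {
  return sortedKeys.filter(k => k >= low && k <= high).length;
}

// A toy index of 1,000 days, stored as day numbers 0..999 (assumed stand-in for dates).
const index = Array.from({ length: 1000 }, (_, i) => i);

const unbounded = toBounds({ $lte: 900 });          // [-Infinity, 900]
const bounded = toBounds({ $gte: 880, $lte: 900 }); // [880, 900]

console.log(keysScanned(index, unbounded)); // 901
console.log(keysScanned(index, bounded));   // 21
```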

We quickly fixed the queries and, as you’d expect, production monitoring got much happier. In the graph below, you can see the daily load start to pick up around 6am, and the slow logs pick up in frequency along with it. We deployed the change at about 9:15am and, like magic, the slow logs dropped back to zero.
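The post doesn’t show the corrected queries. One way to bound an overlap query like the one above, sketched under the assumption that every record’s interval has a known maximum length, is to derive the missing bounds from that maximum. MAX_DURATION_MS, buildOverlapFilter, and matches are hypothetical names; the filter itself is an ordinary filter document you could pass to something like collection.find(...) in the Node.js driver, and how tightly the planner can use both bounds will still depend on the index’s field order.

```javascript
// Hypothetical sketch: the post doesn't say exactly how its queries were rewritten.
// ASSUMPTION: every record satisfies endDate - date <= MAX_DURATION_MS, which lets us
// put a finite bound on both ends of both ranges without dropping any overlapping record.
const MAX_DURATION_MS = 24 * 60 * 60 * 1000; // assumed: records span at most one day

// Filter for records whose [date, endDate] interval overlaps [startDate, endDate].
function buildOverlapFilter(startDate, endDate, maxDurationMs = MAX_DURATION_MS) {
  return {
    date: { $gte: new Date(startDate.getTime() - maxDurationMs), $lte: endDate },
    endDate: { $gte: startDate, $lte: new Date(endDate.getTime() + maxDurationMs) },
  };
}

// Tiny in-memory evaluator for this filter shape only ($gte/$lte on Date fields),
// used to check that the extra bounds don't change which records match.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) =>
    (cond.$gte === undefined || doc[field] >= cond.$gte) &&
    (cond.$lte === undefined || doc[field] <= cond.$lte));
}

const startDate = new Date('2024-01-10T00:00:00Z');
const endDate = new Date('2024-01-11T00:00:00Z');
const unbounded = { date: { $lte: endDate }, endDate: { $gte: startDate } };
const bounded = buildOverlapFilter(startDate, endDate);

const docs = [
  { date: new Date('2024-01-09T18:00:00Z'), endDate: new Date('2024-01-10T06:00:00Z') }, // overlaps
  { date: new Date('2024-01-01T00:00:00Z'), endDate: new Date('2024-01-01T02:00:00Z') }, // ends too early
];
for (const doc of docs) {
  console.log(matches(doc, unbounded), matches(doc, bounded)); // both columns agree
}
```

The trade-off is that the extra bounds are only safe while the maximum-duration assumption holds: a record longer than MAX_DURATION_MS would silently be excluded.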

Moral of the story: unbounded range queries can lead to very unintended performance consequences.


Performance Implications When Comparing Types in Node.js

As in any weakly typed language, you can’t avoid the fact that comparisons across types cost CPU cycles.

Consider the following code, which runs a .filter over an array of 5M entries, all of them Numbers:

    // 5M Numbers, all equal to 1
    const arrOfNumbers = Array(5000000).fill(1);

    // Loose equality (==)
    console.time('eqeq-number');
    arrOfNumbers.filter(a => a == 1);
    console.timeEnd('eqeq-number');

    // Strict equality (===)
    console.time('eqeqeq-number');
    arrOfNumbers.filter(a => a === 1);
    console.timeEnd('eqeqeq-number');

On my Mac they’re roughly equivalent, with only a marginal difference between the eqeq (==) and eqeqeq (===) cases:

    eqeq-number: 219.409ms
    eqeqeq-number: 225.197ms

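I would have assumed eqeqeq would be faster, given there’s no possibility of type coercion, but it’s possible the VM knew that every element in the array and the test value were Numbers, so, meh, about the same. (It’s also worth noting that when both operands are already the same type, the spec defines == to behave exactly like ===, so there’s no coercion to skip here.)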
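Single console.time runs like these are easily skewed by JIT warm-up and garbage collection. A minimal sketch of a slightly more careful harness, assuming Node’s perf_hooks module (medianMs is a hypothetical helper name), warms each function up and then reports the median of several runs:

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper: call fn `warmup` times to let the JIT settle, then time
// `runs` more calls and return the median duration in milliseconds.
function medianMs(fn, runs = 5, warmup = 1) {
  for (let i = 0; i < warmup; i++) fn();
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(times.length / 2)];
}

const arrOfNumbers = Array(5000000).fill(1);
console.log('eqeq-number:', medianMs(() => arrOfNumbers.filter(a => a == 1)).toFixed(1), 'ms');
console.log('eqeqeq-number:', medianMs(() => arrOfNumbers.filter(a => a === 1)).toFixed(1), 'ms');
```

Absolute numbers will vary by machine and Node version; the useful signal is the relative gap between the two cases on the same machine.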

Now consider the worst-case scenario: the same .filter, but over an array of 5M strings with the value “1”:

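    // 5M Strings, all equal to '1'
    const arrOfStrings = Array(5000000).fill('1');

    // Loose equality (==): each '1' is coerced to a Number before comparing
    console.time('eqeq-string');
    arrOfStrings.filter(a => a == 1);
    console.timeEnd('eqeq-string');

    // Strict equality (===): String vs Number, so the result is false with no coercion
    console.time('eqeqeq-string');
    arrOfStrings.filter(a => a === 1);
    console.timeEnd('eqeqeq-string');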

The eqeq case costs roughly the same as the original loose (==) Number-to-Number comparison, but now eqeqeq is significantly faster:

    eqeq-string: 258.572ms
    eqeqeq-string: 72.275ms

Here the eqeqeq case doesn’t have to do any coercion: since the types don’t match, the comparison is immediately false, without having to convert the String to a Number. If you keep experimenting and have both .filters compare against the String ‘1’ instead, the results are again about the same as in the first pair of tests.
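It’s worth keeping in mind that the two filters in the string benchmark don’t return the same thing: == coerces ‘1’ to the Number 1 and keeps every element, while === keeps none. Part of the timing gap may therefore also come from the strict filter building an empty result array instead of a 5M-element one. A small sample (the sample variable is just for illustration) makes the difference visible:

```javascript
// Same shape as the string benchmark, shrunk to 5 elements for illustration.
const sample = Array(5).fill('1');

const loose = sample.filter(a => a == 1);   // '1' == 1 coerces '1' to 1, so true
const strict = sample.filter(a => a === 1); // '1' === 1 is String vs Number, so false

console.log(loose.length);  // 5
console.log(strict.length); // 0

// Comparing against the String '1' keeps the types equal, so == and === agree again.
console.log(sample.filter(a => a == '1').length);  // 5
console.log(sample.filter(a => a === '1').length); // 5
```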

Conclusion? Save the VM work if you can. This is a fairly contrived example, since eqeqeq can shortcut the comparison to “false” as soon as it sees the types don’t match, but anywhere you can save effort when working on large data sets, it’s worth doing, and consistent typing is an easy win when you can take it.
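One way to take that win, sketched under the assumption that values arrive as Strings (say, parsed from CSV or a query string) but are compared against Numbers many times, is to pay for the conversion once and compare strictly afterwards. The raw, values, and ones names are hypothetical. This only pays off when the data is compared more than once, since the up-front map is itself a full coercion pass.

```javascript
// Hypothetical input: numeric data that arrived as Strings.
const raw = Array(5).fill('1');

// Pay the String -> Number conversion once, up front...
const values = raw.map(Number);

// ...so every later comparison is a same-type === with no coercion.
const ones = values.filter(v => v === 1);
console.log(typeof values[0], ones.length); // number 5
```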
