At Openlane, VIN decoding sits on the hot path of every vehicle intake. Every inspection, every inventory record, every workflow event that involves a vehicle starts with a VIN decode. When that path is slow, everything downstream feels it.
Our p95 was sitting at a number that made the team uncomfortable. The instinct — as it almost always is — was to look at the database first. That instinct was wrong. Here's what actually happened.
Start with measurement, not guesses
Before touching anything, I instrumented the decode path to emit timing spans for every external call: the Redis cache check, the vendor API call, and the Postgres reference-data lookup. I let it run for 24 hours and looked at the breakdown for the slow requests.
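To make that concrete, here's a minimal sketch of the shape of that instrumentation in Python. Everything below is illustrative, not the production code: the stub dependencies, span names, and emit_span sink are assumptions standing in for whatever metrics pipeline the real service uses.

```python
import time
from contextlib import contextmanager

# --- Hypothetical stubs standing in for the real dependencies ---

def redis_get(vin):
    return None  # simulate a cache miss

def call_vendor_api(vin):
    time.sleep(0.2)  # simulate vendor latency
    return {"vin": vin, "make": "example", "model": "example"}

def lookup_reference_data(result):
    time.sleep(0.002)  # simulate a fast Postgres lookup

def emit_span(name, elapsed_ms):
    # Stand-in for the real metrics sink (StatsD, OpenTelemetry, logs).
    print(f"span={name} elapsed_ms={elapsed_ms:.1f}")

@contextmanager
def timed_span(name):
    """Emit one timing span around a single external call."""
    start = time.perf_counter()
    try:
        yield
    finally:
        emit_span(name, (time.perf_counter() - start) * 1000)

def decode_vin(vin):
    # One span per external dependency, so a slow request can be
    # broken down by exactly where the time went.
    with timed_span("redis_cache_check"):
        cached = redis_get(vin)
    if cached is not None:
        return cached
    with timed_span("vendor_api_call"):
        result = call_vendor_api(vin)
    with timed_span("postgres_reference_lookup"):
        lookup_reference_data(result)
    return result

decode_vin("1HGCM82633A004352")
```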
The result was clarifying. For a slow decode request:
- Redis check: ~2ms (cache miss)
- Vendor API call: the vast majority of elapsed time
- Postgres: fast, never the bottleneck
The database was fine. It had been fine the whole time. The bottleneck was that on cache misses, we were making a synchronous call to an external vendor API — and inheriting that vendor's tail latency directly into our p95.
This is the part I want to emphasize: two weeks of index optimization on Postgres would have moved our p95 by zero milliseconds. The discipline of measuring before doing is the difference between useful work and confident motion in the wrong direction.
The fix: layered caching with intentional TTLs
Vehicle specification data is largely immutable. A VIN decoded today returns the same make, model, trim, and engine configuration it returned six months ago. That property makes it an ideal caching target — the main design question is TTL strategy, not cache invalidation complexity.
I designed two complementary mechanisms:
Redis with TTL-based invalidation. The primary cache layer. Decode results are stored in Redis, keyed by VIN. TTLs are differentiated by data type: core spec fields (make, model, year, trim) get long TTLs, since they never change post-manufacture; reference data that can update (recall status, title flags) gets shorter TTLs. This keeps the cache useful while bounding stale-data risk.
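A sketch of how that split can look, using the redis-py client. The key names, field lists, and TTL values here are illustrative assumptions, not our production configuration.

```python
import json
import redis  # assumes redis-py and a reachable Redis instance

r = redis.Redis()

# Illustrative TTLs; the real values were tuned from observed hit-rate data.
SPEC_TTL = 30 * 24 * 3600  # core spec fields: effectively immutable
VOLATILE_TTL = 6 * 3600    # recall status, title flags: can change

def cache_decode_result(vin: str, result: dict) -> None:
    # Split one decode result into a long-lived spec record and a
    # shorter-lived volatile record, each under its own key and TTL.
    spec = {k: result[k] for k in ("make", "model", "year", "trim") if k in result}
    volatile = {k: result[k] for k in ("recall_status", "title_flags") if k in result}
    r.set(f"vin:spec:{vin}", json.dumps(spec), ex=SPEC_TTL)
    r.set(f"vin:volatile:{vin}", json.dumps(volatile), ex=VOLATILE_TTL)

def get_cached_decode(vin: str):
    spec = r.get(f"vin:spec:{vin}")
    volatile = r.get(f"vin:volatile:{vin}")
    if spec is None or volatile is None:
        return None  # treat a partial hit as a miss, for simplicity
    return {**json.loads(spec), **json.loads(volatile)}
```

Treating a partial hit as a full miss keeps the read path simple; a refinement would refresh only the expired half instead of re-fetching everything.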
Request coalescing for concurrent misses. Under load, the same VIN can arrive in multiple concurrent requests before the first one populates the cache. Without coalescing, each of those requests makes a separate vendor call. With it, the first request owns the vendor call and the rest wait on its result. This flattened the spike behavior we saw during batch intake events.
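Here's one way that pattern can look with asyncio. The Coalescer class below is a hypothetical sketch of the singleflight technique, not our production implementation.

```python
import asyncio

class Coalescer:
    """Concurrent requests for the same key share one in-flight
    vendor call instead of each making their own."""

    def __init__(self, fetch):
        self._fetch = fetch  # the underlying (slow) vendor call
        self._inflight: dict[str, asyncio.Future] = {}

    async def get(self, key: str):
        existing = self._inflight.get(key)
        if existing is not None:
            # Another request owns the vendor call; wait on its result.
            return await existing
        fut = asyncio.get_running_loop().create_future()
        self._inflight[key] = fut
        try:
            result = await self._fetch(key)
            fut.set_result(result)
            return result
        except Exception as exc:
            fut.set_exception(exc)
            raise
        finally:
            # Always clear the slot so a later miss triggers a fresh call.
            del self._inflight[key]

async def fetch_from_vendor(vin: str) -> dict:
    await asyncio.sleep(0.2)  # stand-in for vendor API latency
    return {"vin": vin, "decoded": True}

async def main():
    coalescer = Coalescer(fetch_from_vendor)
    # Ten concurrent decodes of the same VIN produce one vendor call.
    results = await asyncio.gather(
        *(coalescer.get("1HGCM82633A004352") for _ in range(10))
    )
    print(f"{len(results)} callers served by a single vendor call")

asyncio.run(main())
```

One detail that matters: the in-flight slot is cleared in a finally block, so a failed vendor call can't leave behind a poisoned entry that every later request for that VIN waits on.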
Rolling it out one change at a time
I deployed each change independently and measured after every step. Stacking changes and then measuring is how you end up arguing about which change actually helped.
- Redis caching alone: large p95 improvement from eliminating most vendor calls
- Adding request coalescing: further reduction, primarily visible during peak batch intake
- Tuning TTLs based on observed hit-rate data: incremental gains
End result: p95 improved by 70%, vendor API call volume dropped substantially, and the decode path stopped being the thing that showed up in every latency conversation.
The broader lesson
Every latency problem I've worked on has followed the same shape: the team has a consensus about where the bottleneck is, the consensus is wrong, and measurement reveals something more specific and more fixable than the assumed root cause.
Profile first. Form hypotheses after the data, not before. And when you find the real problem, fix it one change at a time — because that's the only way to know which change actually worked.