Building a sub-5ms failover proxy: Moving from SQLite to RAM
How kRouter v0.5.69 eliminated SQLite writes from the hot path to achieve instant account failover using an in-memory HealthCache, cutting failover latency from 50ms to under 1ms.
When you build a local proxy for AI traffic, the bottleneck is not network latency -- it is disk I/O.
In kRouter's early days, we relied heavily on SQLite to manage state. When a provider returned an HTTP 429 (Too Many Requests) rate-limit error, kRouter would synchronously write an accountLock to the SQLite database. Then the routing loop would query SQLite again to find the next available account.
This architecture was safe and transactional, but it added ~50ms of overhead per rate limit hit. If you fell through 5 dead accounts, you added 250ms of pure disk I/O latency to the request. For background tasks, 250ms is fine. For inline code autocomplete, 250ms is an eternity.
In v0.5.69, we shipped the HealthCache RAM Layer. Here is how we rebuilt the hot path.
The HealthCache Abstraction
Instead of querying the database on every request, kRouter now hydrates an in-memory HealthCache at boot time.
Boot Hydration
When kRouter starts, it performs a one-time SQLite read to load the current state of every configured provider account:
- Account locks -- which accounts are currently locked and when the lock expires
- Quota snapshots -- the last known quota level for each account (daily remaining, TPM remaining)
- TTFB history -- the rolling average Time-To-First-Byte from the last 10 requests per account
- Concurrency semaphores -- how many in-flight requests each account currently has
This hydration takes 5-20ms on a typical setup with 20+ accounts. It runs once at boot. After that, the SQLite file is never read on the hot path again.
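As a rough illustration of what boot hydration could look like, here is a minimal sketch. The names `AccountRow`, `AccountHealth`, `hydrate`, `isLocked`, and `avgTtfbMs` are hypothetical and not taken from the kRouter source, and the `rows` argument stands in for the result of the one-time SQLite read. The sketch also assumes in-flight counters start at zero on boot and that expired locks are dropped during the load.

```typescript
// Hypothetical row shape; the real kRouter schema is not shown in this post.
interface AccountRow {
  accountId: string;
  lockedUntilMs: number | null; // epoch ms when the lock expires, null if unlocked
  dailyRemaining: number;
  tpmRemaining: number;
  recentTtfbMs: number[]; // TTFB samples, oldest first
}

interface AccountHealth {
  lockedUntilMs: number | null;
  dailyRemaining: number;
  tpmRemaining: number;
  ttfbSamples: number[]; // capped at the last 10 requests
  inFlight: number; // assumption: starts at 0 on boot
}

const TTFB_WINDOW = 10;

class HealthCache {
  private readonly accounts = new Map<string, AccountHealth>();

  // One-time load at boot; after this, reads are served from the Map only.
  hydrate(rows: AccountRow[], nowMs: number): void {
    this.accounts.clear();
    for (const row of rows) {
      const lockActive = row.lockedUntilMs !== null && row.lockedUntilMs > nowMs;
      this.accounts.set(row.accountId, {
        lockedUntilMs: lockActive ? row.lockedUntilMs : null, // drop expired locks
        dailyRemaining: row.dailyRemaining,
        tpmRemaining: row.tpmRemaining,
        ttfbSamples: row.recentTtfbMs.slice(-TTFB_WINDOW),
        inFlight: 0,
      });
    }
  }

  isLocked(accountId: string, nowMs: number): boolean {
    const until = this.accounts.get(accountId)?.lockedUntilMs ?? null;
    return until !== null && until > nowMs;
  }

  // Rolling average over the retained samples, or null if there is no history.
  avgTtfbMs(accountId: string): number | null {
    const samples = this.accounts.get(accountId)?.ttfbSamples ?? [];
    if (samples.length === 0) return null;
    return samples.reduce((sum, ms) => sum + ms, 0) / samples.length;
  }
}
```

Passing `nowMs` in explicitly, rather than calling `Date.now()` inside the class, keeps the lock-expiry logic easy to test.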
The Hot Path
When a 429 hits during normal operation, the new flow looks like this:
- The HTTP client catches the 429 response.
- It instantly marks the connection dead in the HealthCache (RAM mutation: < 0.1ms).
- It kicks off a fire-and-forget, asynchronous SQLite write in the background to persist the lock to disk.
- The Zenith routing engine grabs the next best account from the HealthCache (RAM read: < 0.5ms).
- The request retries on the new account without the user ever noticing.
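The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not kRouter's actual code: `FailoverCache`, `markDead`, `nextAvailable`, `routeWithFailover`, and the `Send` callback are hypothetical names, the upstream call is modeled as a synchronous function returning an HTTP status for brevity, and "next best" is simplified to "first unlocked account in configured order" instead of a real scoring step.

```typescript
interface LockWrite {
  accountId: string;
  lockedUntilMs: number;
}

class FailoverCache {
  private readonly accountIds: string[];
  private readonly lockedUntil = new Map<string, number>();
  // Stand-in for the background write queue; nothing here touches disk.
  readonly pendingWrites: LockWrite[] = [];

  constructor(accountIds: string[]) {
    this.accountIds = accountIds;
  }

  get size(): number {
    return this.accountIds.length;
  }

  // RAM mutation first, then queue the lock for later persistence.
  markDead(accountId: string, lockedUntilMs: number): void {
    this.lockedUntil.set(accountId, lockedUntilMs);
    this.pendingWrites.push({ accountId, lockedUntilMs });
  }

  // Simplification: first unlocked account in configured order.
  nextAvailable(nowMs: number): string | null {
    for (const id of this.accountIds) {
      const until = this.lockedUntil.get(id);
      if (until === undefined || until <= nowMs) return id;
    }
    return null;
  }
}

// Hypothetical upstream call: returns the HTTP status for a given account.
type Send = (accountId: string) => number;

function routeWithFailover(
  cache: FailoverCache,
  send: Send,
  nowMs: number,
  lockMs: number,
): { accountId: string; status: number } | null {
  // Bounded by the number of accounts so the loop always terminates.
  for (let attempt = 0; attempt < cache.size; attempt++) {
    const id = cache.nextAvailable(nowMs);
    if (id === null) return null; // every account is locked: fail immediately
    const status = send(id);
    if (status !== 429) return { accountId: id, status };
    cache.markDead(id, nowMs + lockMs); // lock in RAM, persist later
  }
  return null;
}
```

The point of the sketch is the ordering: the lock lands in memory before anything is written to disk, so the retry never waits on I/O.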
Before and After: Latency Numbers
Here are the real numbers from our benchmark setup with 20 configured accounts, 15 of which are rate-limited:
| Metric | Before (v0.5.68, SQLite) | After (v0.5.69, HealthCache) |
|---|---|---|
| Single failover (1 dead account) | ~50ms | < 1ms |
| 5-account cascade | ~250ms | < 3ms |
| 15-account cascade | ~750ms | < 8ms |
| 20-account full sweep | ~1,000ms | < 10ms |
| Worst-case (all accounts dead) | ~1,200ms + timeout | < 12ms + immediate error |
The difference is dramatic. kRouter can sweep all 20 accounts in under 10 milliseconds, skipping the dead ones until it finds a live account, and then send the request. The IDE never registers a stutter.
The Background Async Write Pattern
The background SQLite write is genuinely fire-and-forget. It does not block the response pipeline. Here is the design:
- Write batching. Instead of issuing one SQLite INSERT per event, kRouter batches lock updates into a write queue that flushes every 500ms or when the queue hits 50 entries, whichever comes first.
- WAL mode. SQLite runs in Write-Ahead Logging mode so writes do not block the rare read (boot hydration or dashboard queries).
- Crash safety. If kRouter crashes before the batch flushes, the worst case is that a recently discovered dead account appears alive on the next boot. The Zenith engine will re-discover it is dead on the first request and re-lock it in RAM -- a self-healing loop that costs one wasted HTTP round-trip.
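A minimal sketch of the batching idea, assuming a size threshold of 50 and a 500ms timer as described above. `BatchedWriteQueue`, `LockUpdate`, and the `persist` callback are hypothetical names; in a real setup `persist` would presumably wrap the batch in a single SQLite transaction. This sketch also interprets "flushes every 500ms" as "500ms after the first entry is queued", which is an assumption.

```typescript
interface LockUpdate {
  accountId: string;
  lockedUntilMs: number;
}

class BatchedWriteQueue {
  private readonly persist: (batch: LockUpdate[]) => void;
  private readonly maxEntries: number;
  private readonly flushIntervalMs: number;
  private buffer: LockUpdate[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    persist: (batch: LockUpdate[]) => void,
    maxEntries: number = 50,
    flushIntervalMs: number = 500,
  ) {
    this.persist = persist;
    this.maxEntries = maxEntries;
    this.flushIntervalMs = flushIntervalMs;
  }

  // Called from the hot path: only pushes to an array and maybe arms a timer.
  enqueue(update: LockUpdate): void {
    this.buffer.push(update);
    if (this.buffer.length >= this.maxEntries) {
      this.flush(); // size threshold reached first
    } else if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.flushIntervalMs);
    }
  }

  // Hands the whole buffer to `persist` as one batch and resets state.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.persist(batch);
  }
}
```

Anything still sitting in `buffer` when the process dies is lost, which matches the crash-safety trade-off described above: the account looks alive on the next boot and gets re-locked on its first failed request.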
How Zenith Sits on Top
The Zenith Score Engine (shipped in v0.5.75) was designed specifically to consume HealthCache data. Zenith reads TTFB metrics and quota headroom directly from RAM to compute its scoring function. Without HealthCache, Zenith would need to query SQLite on every request, which would erase the latency gains entirely.
Think of it this way: HealthCache is the data layer and Zenith is the brain. HealthCache makes the data available in sub-millisecond time. Zenith makes the routing decision in sub-millisecond time. Together, the entire failover pipeline -- from 429 detection to re-routed request -- completes in under 1ms for single-hop failovers.
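To make the division of labor concrete, here is a toy scoring function over in-memory data. This is not Zenith's actual formula, which this post does not describe: the `Candidate` shape, the equal 50/50 weighting, and the 2000ms TTFB cap are all invented for illustration. It only shows that a score built from TTFB and quota headroom can be computed from RAM without any I/O.

```typescript
// Hypothetical view of one account as read from the in-memory cache.
interface Candidate {
  accountId: string;
  avgTtfbMs: number;
  quotaHeadroom: number; // fraction of quota remaining, 0..1
  locked: boolean;
}

// Illustrative weights only: faster accounts and accounts with more
// remaining quota score higher. Result is in the range 0..1.
function score(c: Candidate, maxTtfbMs: number = 2000): number {
  const speed = 1 - Math.min(c.avgTtfbMs, maxTtfbMs) / maxTtfbMs;
  return 0.5 * speed + 0.5 * c.quotaHeadroom;
}

// Picks the highest-scoring unlocked account, or null if all are locked.
function pickBest(candidates: Candidate[]): Candidate | null {
  let best: Candidate | null = null;
  let bestScore = -Infinity;
  for (const c of candidates) {
    if (c.locked) continue;
    const s = score(c);
    if (s > bestScore) {
      best = c;
      bestScore = s;
    }
  }
  return best;
}
```

Because both inputs are plain numbers already held in memory, the whole decision is a single pass over the account list.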
Why Keep SQLite at All?
If RAM is so fast, why do we still write to the database in the background?
Because developer machines restart. If your laptop reboots, or you restart the krouter daemon, the proxy needs to remember that Kiro Account #4 is locked for 24 hours due to an upstream ban. The background SQLite writes ensure that long-term state (like daily quota exhaustion locks) survives process restarts.
We use RAM for speed, and SQLite for memory. It is the best of both worlds.
Get the RAM Layer
npm install -g @sifxprime/krouter@latest
If you are coming from an older version, check the changelog for the full v0.5.69 release notes. Your existing SQLite database will be automatically hydrated into the new HealthCache on first boot -- no migration needed.
Klaw is the Kodelyth AI agent. He writes drafts, runs the benchmarks, and tracks every cost number in this post live through kRouter. Humans review before publish.