Introducing kRouter — a hardened AI router by Kodelyth
kRouter is an MIT-licensed, self-hosted AI router that connects 88+ providers behind one local OpenAI-compatible endpoint. Zenith Score Engine, HealthCache RAM Layer, RTK compression, and 3-tier auto-fallback — here is why we built it and what it changes for you.
The AI coding stack got really weird in the last twelve months. Every IDE wants a subscription. Every model provider wants a separate key. Every "agentic" workflow somehow ends up eating tokens twice for the same task. We watched our team's monthly LLM bill cross four figures, and almost none of it bought us more code shipped.
So we built kRouter -- an MIT-licensed, self-hosted router that sits between your IDE and every provider you already pay for. One local endpoint, every model, predictable cost.
What kRouter does in one paragraph
You install kRouter on your laptop. It exposes an OpenAI-compatible API at localhost:20128. You point Claude Code, Cursor, Antigravity, Copilot, Codex, Kiro -- any client -- at that endpoint. Behind the scenes, kRouter routes each request to the provider that costs the least: your subscription quota first, then a cheap pay-per-token tier, then a free fallback. When one runs out, it transparently switches to the next. Your IDE never sees an error. Your bill never surprises you.
88+ providers across five auth tiers
kRouter connects to providers across subscription, API-key, free-credit, free, and browser-cookie auth tiers:
- Subscription tier: Claude Code, Codex CLI, Gemini CLI (180K tokens/month free), GitHub Copilot, Antigravity -- you already pay for these, so kRouter burns their quota first.
- Cheap tier: GLM-4.7 at $0.60/1M tokens, MiniMax M2.1 at $0.20/1M, Kimi K2 at $9/month flat rate -- pennies per million tokens when subscriptions run dry.
- Free tier: iFlow (8 models), Qwen (3 models), Kiro (Claude Sonnet 4.5, Haiku 4.5 -- free) -- the safety net that catches every quota-out event so your IDE never errors.
The exact provider count is 88+ and growing. Each provider is configured with its auth method, rate limits, and model mappings. kRouter manages the full lifecycle.
Why a fork instead of joining 9router?
kRouter is forked from decolua/9router -- credit where it is due. We needed harder edges:
- Antigravity verify-account ban fix. Bootstrap was sending string OAuth enums while the live API expected numeric. Google's anti-abuse layer flagged accounts on first call. We fixed it with byte-exact parity (v0.5.46).
- NGHTTP2 stream recovery. MITM stream errors were surfacing as cryptic
Truncated event messageerrors. We added bounds-checked frame parsing and a clean HTTP/1.1 fallback. - Atomic backoff under concurrency. Cooldown increments raced under high traffic. SQLite transactions fixed it.
- Per-IP brute-force lockout, SSRF guards, timing-safe token compare. All the boring security details that make a router safe to expose over Tailscale.
The full diff is on /compare.
The features that actually save money
Three things change your bill the most:
- RTK Token Saver. Detects tool-output payloads (git diff, ls, tree, grep) and compresses them losslessly before the request leaves your machine. Saves 20-40% input tokens per request, by default, every request.
- Caveman Mode. An optional terse-style prompt injection. Lite trims filler words. Full goes harder. Wenyan strips natural-language prose to telegram-style microcopy. Up to 65% output token savings on factual queries.
- 3-Tier Auto-Fallback. Subscription quota -> cheap pay-per-token -> free tier. The free tier is the safety net that catches every quota-out event so your IDE never errors out.
The dashboard shows you both your real spend and the hypothetical spend you would have had on flat-rate paid APIs. The delta is what kRouter saved you.
Zenith Score Engine (v0.5.75)
Starting with v0.5.75, kRouter uses the Zenith Score Engine to mathematically pre-rank provider accounts before each request. The score combines two live signals:
- TTFB latency -- measured in real time per account. Accounts with faster first-token response get higher scores.
- Quota headroom -- how much capacity remains on this account before it hits a rate limit or daily cap.
The result: kRouter does not just pick the cheapest provider. It picks the cheapest provider that will actually respond fast. Accounts near their quota ceiling get deprioritized automatically, so you never waste a request on a provider that is about to 429 you.
HealthCache RAM Layer (v0.5.69)
Provider health checks used to hit SQLite on every request. That added 3-8ms of latency per routing decision. The HealthCache RAM Layer (v0.5.69) moves health state into an in-memory cache with sub-1ms reads. SQLite is still the persistence layer, but hot-path reads never touch disk. Failover decisions now happen in under 5ms total.
What changed since launch (v0.5.45 -- v0.5.81)
The router has shipped 36 point releases since the initial public version. Key highlights:
| Version | What shipped |
|---|---|
| v0.5.81 | Cloudflare array syntax fix for Workers AI |
| v0.5.80 | Dynamic model fetching for Cloudflare Workers AI -- no more hardcoded model lists |
| v0.5.75-77 | Zenith Score Engine wiring, live TTFB + quota scoring |
| v0.5.74 | Kiro MITM passthrough fix, tool ID sanitization |
| v0.5.69 | HealthCache RAM Layer -- sub-5ms failover |
| v0.5.65 | Kiro IDE first-class support with persona injection |
| v0.5.57 | Preserve thinking intent across Antigravity blacklist |
| v0.5.49 | TPM vs daily quota disambiguation |
| v0.5.47 | Permanent ban flag wiring |
| v0.5.46 | Root-cause fix for Antigravity "verify your account" ban |
The full history is on /changelog.
What you actually get out of the box
- One local OpenAI-compatible endpoint at
localhost:20128/v1 - 88+ providers across OAuth, free, free-credit, API-key, and browser-cookie auth tiers
- Zenith Score Engine for live latency + quota ranking
- HealthCache RAM Layer for sub-1ms failover reads
- Format translation across OpenAI / Claude / Gemini / Cursor / Kiro / Vertex / Antigravity / Ollama / Responses shapes
- RTK Token Saver and Caveman Mode for 20-65% token compression
- Real-time quota dashboard pulled live from each provider's backend
- Audit-logged request history with payload-level debug mode
- 8 pre-built agent skill files for AI coding agents
- TTS, STT, image, embeddings, and search routing -- full multimodal
- MIT licensed. No telemetry. Prompts never leave your machine.
Install in 30 seconds
npm install -g @sifxprime/krouter
krouter -tOpen http://localhost:20128/dashboard, connect a free provider (Kiro is the easiest first step), and point your IDE at the endpoint. That is the entire setup.
For Docker, source builds, and self-hosted VPS deploys, the /install page has the full guide.
Where this is going
kRouter is one piece of a larger thesis: AI coding is going to be cheap because the unit economics of the underlying providers are cheap. The friction is everywhere else -- fragmented APIs, surprise rate limits, vendor lock-in to IDEs. We are building toward a stack where you pick the best model for each request automatically and the bill stays predictable.
Star the repo, install the tool, and send us a PR if something does not work. The roadmap is on /changelog.
-- The Kodelyth team
Klaw is the Kodelyth AI agent. He writes drafts, runs the benchmarks, and tracks every cost number in this post live through kRouter. Humans review before publish.
Install kRouter