Core concepts
Providers, combos, tiers, RTK, MITM mode, quota tracking — the five ideas that make kRouter work.
Eight ideas. Read this first if you want to understand what kRouter is actually doing.
1. Providers
A provider is one upstream AI service: Claude Code, Antigravity, GLM, Kiro, OpenRouter, etc. Each has its own auth model (OAuth, free, API key, or browser cookie) and its own native API shape. kRouter abstracts all of it behind one OpenAI-compatible endpoint.
We classify into five tiers:
- OAuth — sign in once with your subscription (Claude Code, Codex, GitHub Copilot, Cursor, Antigravity)
- Free unlimited — Kiro, OpenCode Free
- Free credits — Vertex AI ($300), NVIDIA NIM (developer program), BytePlus, Cloudflare Workers AI
- Pay-per-token — OpenAI, Anthropic, OpenRouter, GLM, MiniMax, Kimi, and the rest
- Browser cookie — Grok Web, Perplexity Web (paste your session cookie)
2. Combos & fallback
A combo is an ordered list of provider/model IDs. kRouter tries each in order and returns the first that succeeds. This is how you stack subscription → cheap → free into a single endpoint.
Combo: "always-on"
1. cc/claude-opus-4-7 ← subscription primary
2. glm/glm-5.1 ← cheap backup
3. kr/claude-sonnet-4.5 ← free emergency fallbackWhen cc/claude-opus-4-7 hits its 5-hour quota window, kRouter auto-routes the next request to glm/glm-5.1. When GLM hits a hard ceiling, it falls to Kiro. The IDE never sees an error.
3. RTK — the token saver
LLM tool calls leak. A git diff output, a grep result, a tree listing — they each eat 5K-30K tokens of context. RTK ("Rust Token Killer") inspects the content of every tool result, recognizes the format, and compresses losslessly:
- Strips terminal control codes
- Dedups repeated stack frames
- Truncates massive file dumps to head + tail
- Replaces
find /path/with/200-filesoutput with a structural summary
Net effect: 20–40% fewer input tokens on the average coding request. RTK is on by default and can be toggled per request.
4. MITM mode
Some IDEs (Cursor, Kiro, Antigravity, Claude Code in some configs) talk to their backend over a fixed endpoint that you can't change. To still route them through kRouter, we install a local root CA and resolve those specific hostnames to 127.0.0.1.
This is MITM (Man-in-the-middle) in the technical sense — kRouter terminates TLS, inspects the request, routes it, and returns the response signed with our own cert. The IDE doesn't know the difference. Toggle MITM per provider in Dashboard → Profile.
Requires admin rights (binds :443 and edits /etc/hosts).
5. Quota tracking
Every provider has a different way of reporting "how much quota you have left". Some give you a daily token budget, some give you a 5-hour rolling window, some give you a percentage-remaining number, some give you nothing. kRouter normalizes all of them and shows:
- Real-time used / remaining per model, per account
- Reset countdown with the actual reset window
- TPM vs daily-quota disambiguation — 90s cooldown for TPM bursts, 30m for daily exhaustion
If a quota reports 0% remaining vs reports nothing at all, kRouter renders them differently — amber "Exhausted • resets in X" vs red "Out of quota."
6. Routing strategies
When a provider has multiple accounts, kRouter can pick between them using a strategy:
- fill-first — use the highest-priority account until it fails, then fall back
- round-robin — rotate evenly, with a sticky limit so one account isn't ping-ponged
- p2c — power-of-2 choices: pick the two healthiest accounts and route to the one with more remaining quota
- random — uniform sampler, mostly useful for testing
Strategies are per-provider and survive restarts.
7. Response cache
Repeated non-streaming requests (warmup probes, title generation, structured-output retries) are cached in memory. If the same {model, body} pair is requested again, kRouter returns the cached response immediately. The cache is keyed by model and request body and is toggled via responseCacheEnabled.
8. Tunnel / Tailscale
By default the dashboard listens on localhost. You can expose it over a public tunnel or Tailscale so other devices or team members can reach it. Dashboard login via tunnel is blocked unless tunnelDashboardAccess is explicitly enabled — a safety guard against accidental public exposure.
Where to go next
- Getting Started — install + first provider
- API Reference — the OpenAI-compatible endpoints
- Providers — per-provider setup
- Compare — kRouter vs upstream vs OmniRoute