Core concepts

Providers, combos, tiers, RTK, MITM mode, quota tracking — the five ideas that make kRouter work.

Eight ideas. Read this first if you want to understand what kRouter is actually doing.

1. Providers

A provider is one upstream AI service: Claude Code, Antigravity, GLM, Kiro, OpenRouter, etc. Each has its own auth model (OAuth, free, API key, or browser cookie) and its own native API shape. kRouter abstracts all of it behind one OpenAI-compatible endpoint.

We classify into five tiers:

OAuth — sign in once with your subscription (Claude Code, Codex, GitHub Copilot, Cursor, Antigravity)
Free unlimited — Kiro, OpenCode Free
Free credits — Vertex AI ($300), NVIDIA NIM (developer program), BytePlus, Cloudflare Workers AI
Pay-per-token — OpenAI, Anthropic, OpenRouter, GLM, MiniMax, Kimi, and the rest
Browser cookie — Grok Web, Perplexity Web (paste your session cookie)

2. Combos & fallback

A combo is an ordered list of provider/model IDs. kRouter tries each in order and returns the first that succeeds. This is how you stack subscription → cheap → free into a single endpoint.

Combo: "always-on"
  1. cc/claude-opus-4-7        ← subscription primary
  2. glm/glm-5.1               ← cheap backup
  3. kr/claude-sonnet-4.5      ← free emergency fallback

When cc/claude-opus-4-7 hits its 5-hour quota window, kRouter auto-routes the next request to glm/glm-5.1. When GLM hits a hard ceiling, it falls to Kiro. The IDE never sees an error.

3. RTK — the token saver

LLM tool calls leak. A git diff output, a grep result, a tree listing — they each eat 5K-30K tokens of context. RTK ("Rust Token Killer") inspects the content of every tool result, recognizes the format, and compresses losslessly:

Strips terminal control codes
Dedups repeated stack frames
Truncates massive file dumps to head + tail
Replaces find /path/with/200-files output with a structural summary

Net effect: 20–40% fewer input tokens on the average coding request. RTK is on by default and can be toggled per request.

4. MITM mode

Some IDEs (Cursor, Kiro, Antigravity, Claude Code in some configs) talk to their backend over a fixed endpoint that you can't change. To still route them through kRouter, we install a local root CA and resolve those specific hostnames to 127.0.0.1.

This is MITM (Man-in-the-middle) in the technical sense — kRouter terminates TLS, inspects the request, routes it, and returns the response signed with our own cert. The IDE doesn't know the difference. Toggle MITM per provider in Dashboard → Profile.

Requires admin rights (binds :443 and edits /etc/hosts).

5. Quota tracking

Every provider has a different way of reporting "how much quota you have left". Some give you a daily token budget, some give you a 5-hour rolling window, some give you a percentage-remaining number, some give you nothing. kRouter normalizes all of them and shows:

Real-time used / remaining per model, per account
Reset countdown with the actual reset window
TPM vs daily-quota disambiguation — 90s cooldown for TPM bursts, 30m for daily exhaustion

If a quota reports 0% remaining vs reports nothing at all, kRouter renders them differently — amber "Exhausted • resets in X" vs red "Out of quota."

6. Routing strategies

When a provider has multiple accounts, kRouter can pick between them using a strategy:

fill-first — use the highest-priority account until it fails, then fall back
round-robin — rotate evenly, with a sticky limit so one account isn't ping-ponged
p2c — power-of-2 choices: pick the two healthiest accounts and route to the one with more remaining quota
random — uniform sampler, mostly useful for testing

Strategies are per-provider and survive restarts.

7. Response cache

Repeated non-streaming requests (warmup probes, title generation, structured-output retries) are cached in memory. If the same {model, body} pair is requested again, kRouter returns the cached response immediately. The cache is keyed by model and request body and is toggled via responseCacheEnabled.

8. Tunnel / Tailscale

By default the dashboard listens on localhost. You can expose it over a public tunnel or Tailscale so other devices or team members can reach it. Dashboard login via tunnel is blocked unless tunnelDashboardAccess is explicitly enabled — a safety guard against accidental public exposure.

Where to go next

Getting Started — install + first provider
API Reference — the OpenAI-compatible endpoints
Providers — per-provider setup
Compare — kRouter vs upstream vs OmniRoute