RTK Token Saver
Compresses tool_result content (git diff, grep, ls, tree) inline before sending. Saves 20–40% input tokens per request.
The AI infrastructure layer beneath your favourite IDE. Route Claude Code, Cursor, and 89+ providers through a single OpenAI-compatible endpoint on your machine.
How it works
kRouter runs on your machine and exposes a single OpenAI-compatible endpoint at localhost:20128. Point any IDE at it. Behind the scenes it routes to your subscription first, falls back to free tiers, and only touches paid keys when nothing else works.
Your IDE
kRouter
:20128
89+ providers
one OpenAI-compatible API · subscription → free → paid fallback · RTK compression · format translation
Pick NPM if you just want it to work. Docker if you self-host. Git if you want to hack on the code.
npm install -g @sifxprime/krouterkrouter -tDashboard: http://localhost:20128/dashboard
Every feature ships with audited unit tests, end-to-end verification, and visible-in-the-dashboard observability.
Compresses tool_result content (git diff, grep, ls, tree) inline before sending. Saves 20–40% input tokens per request.
Optional terse-response prompt injection (Lite / Full / Wenyan) to cut output tokens up to 65% without losing technical substance.
Subscription → cheap → free. Combos auto-rotate when one quota hits zero. No more rate-limit interruptions.
Per-provider account selection: fill-first, round-robin (sticky), p2c, or random. Atomic backoff keeps concurrency safe.
In-memory cache for repeated non-streaming requests. Warmup probes and title generation skip redundant upstream calls.
Live token counts, reset countdowns, and per-model spend. Quota tracker matches Google's own backend numbers.
Intercept Antigravity, Kiro, Copilot, and Cursor IDE traffic locally. Bounds-checked frames, NGHTTP2 stream recovery.
Expose the dashboard over a public tunnel or Tailscale. Configurable access controls so it stays safe when shared.
OAuth, free credits, API key, and browser-cookie providers. Stack them into combos that auto-fall-through.
OpenAI ↔ Claude ↔ Gemini ↔ Cursor ↔ Kiro. Use any tool with any model. The translation layer handles the rest.
OAuth tokens refresh before expiration. Concurrent-safe — no stale-401 cascades when traffic spikes.
MIT licensed, self-hosted, never charges. The dashboard 'cost' is a savings tracker — you only pay providers directly.
OAuth subscription providers, genuinely-free tiers, and pay-per-token APIs. Stack them into combos so requests auto-fall-through.
All three projects share a common ancestor (CLIProxyAPI in Go). Each takes the idea in a different direction. Pick by fit, not loyalty.
| Concern | kRouter | 9router | OmniRoute |
|---|---|---|---|
| Verify-your-account ban fix | Numeric enums (matches binary) | String enums (triggers ban) | Permanent-ban classifier |
| Exhausted Claude quota display | Amber 'Exhausted • resets in X' | Fake red 100%-used bar | Similar to kRouter |
| Combo retry on busy IDE | ~5s with per-provider concurrency | ~25s flat 30s timeout cascade | Tunable, similar |
| MITM stream error recovery | NGHTTP2 → HTTP/1.1 fallback | "Truncated event message" | Different MITM stack |
| Thinking config passthrough | Translates Claude/OpenAI shape | Blacklist strips, never runs | Translates at converter |
The new floor for AI coding
kRouter is the layer beneath your favourite IDE — Claude Code, Cursor, Antigravity, Copilot, Codex, Kiro. Connect any model. Switch providers on quota. Keep every prompt on your machine.
MIT licensed · v0.5.81 · Built in the open.
What you actually get