Combos & fallback
Stack subscription → cheap → free into a single endpoint. When one provider hits a quota wall, the next takes over. No interruption.
A combo is an ordered list of provider/model IDs that kRouter treats as a single endpoint. When you call the combo, kRouter tries the first model. If it fails or hits a quota, kRouter falls through to the next. Your IDE never sees an error.
Combos turn three separate accounts into one reliable model. Or three free providers into infinite uptime. Or one paid subscription plus a cheap backup into a never-stop-coding setup.
The three-tier pattern
A typical three-tier combo has one model from each tier:
Combo: "always-on"
1. cc/claude-opus-4-7 ← subscription (you already pay)
2. glm/glm-5.1 ← cheap backup ($0.60/M tokens)
3. kr/claude-sonnet-4.5 ← free emergency (Kiro)
Behavior:
- First request goes to Claude Code (your subscription).
- When the 5-hour quota window runs out (kRouter sees the 429), the very next request falls through to GLM.
- When GLM hits its daily ceiling, kRouter falls through to Kiro.
- When the subscription window reopens 5 hours later, kRouter routes back to Claude. The IDE never sees a single error.
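The behavior above can be sketched as a priority list with per-step cooldowns. This is a toy model in Python, not kRouter's implementation: the `ComboRouter` class, its method names, and the 5-hour cooldown applied to the subscription step are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ComboRouter:
    """Toy model of priority fallback (hypothetical, not kRouter's code)."""
    steps: List[str]  # ordered provider/model IDs, primary first
    cooldown_until: Dict[str, float] = field(default_factory=dict)

    def pick(self, now: float) -> Optional[str]:
        # Walk the list in priority order and skip steps that are cooling down.
        for step in self.steps:
            if self.cooldown_until.get(step, 0.0) <= now:
                return step
        return None  # every step is cooling down

    def report_quota_hit(self, step: str, now: float, cooldown_s: float) -> None:
        # Called when an upstream 429 is observed for this step.
        self.cooldown_until[step] = now + cooldown_s


router = ComboRouter(["cc/claude-opus-4-7", "glm/glm-5.1", "kr/claude-sonnet-4.5"])
FIVE_HOURS = 5 * 3600  # assumed cooldown matching the subscription window

first = router.pick(now=0)                       # subscription is tried first
router.report_quota_hit(first, now=0, cooldown_s=FIVE_HOURS)
during_window = router.pick(now=60)              # falls through to the cheap tier
after_window = router.pick(now=FIVE_HOURS + 1)   # back on the subscription
```

Because `pick` always scans from the top of the list, the primary is preferred again as soon as its cooldown expires, which matches the "routes back to Claude" step.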
Creating a combo
In the dashboard:
- Open Combos
- Click Create new
- Name it something memorable (always-on, cheap-coding, free-only)
- Add models in order; the first one is your primary
- Save
Now use the combo name as a model ID in your IDE:
http://localhost:20128/v1
Authorization: Bearer sk-krouter-XXXX
Model: always-on
Fallback triggers
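kRouter reacts to upstream failures as follows. Most triggers fall through to the next provider; an expired token is retried on the same provider, and a mid-stream error is cancelled: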
| Trigger | Action |
|---|---|
| HTTP 429 (quota) | Cool the account down, try next provider |
| HTTP 401 (token expired) | Refresh the token, retry once on the same provider |
| HTTP 5xx (upstream error) | Try next provider after a 250 ms backoff |
| Streaming error mid-response | Cancel, surface the partial response if any |
| TPM limit (per-minute) | Cool down 90 seconds, try next provider |
| Daily quota | Cool down 30 minutes, try next provider |
| "Verify your account" 403 | Lock the account for 24h, try next provider |
The 30-minute and 24-hour locks are persistent — kRouter remembers them across restarts so you don't accidentally hit a banned account twice.
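One way to picture the cooldowns and the persistent locks is a small lock store that mirrors long locks to disk. This is a sketch under stated assumptions: the trigger labels, the `LockStore` class, the JSON file, and the choice to keep the 90-second cooldown in memory only are all hypothetical. Only the durations come from the table above.

```python
import json
import os
import tempfile
import time
from typing import Dict, Optional

# Cooldown durations from the trigger table, in seconds (keys are hypothetical labels).
COOLDOWN_S = {
    "tpm_limit": 90,
    "daily_quota": 30 * 60,
    "verify_account_403": 24 * 3600,
}
# Only the 30-minute and 24-hour locks are described as surviving a restart.
PERSISTENT = {"daily_quota", "verify_account_403"}


class LockStore:
    """Tracks account locks; persistent ones are written to a JSON file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.volatile: Dict[str, float] = {}    # account -> unlock time, memory only
        self.persistent: Dict[str, float] = {}  # account -> unlock time, saved to disk
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.persistent = json.load(fh)

    def lock(self, account: str, trigger: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        until = now + COOLDOWN_S[trigger]
        if trigger in PERSISTENT:
            self.persistent[account] = until
            with open(self.path, "w", encoding="utf-8") as fh:
                json.dump(self.persistent, fh)
        else:
            self.volatile[account] = until

    def is_locked(self, account: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        until = max(self.persistent.get(account, 0.0), self.volatile.get(account, 0.0))
        return until > now


path = os.path.join(tempfile.mkdtemp(), "locks.json")
store = LockStore(path)
store.lock("glm-account-1", "daily_quota", now=1000.0)
store.lock("cc-account-1", "tpm_limit", now=1000.0)

restarted = LockStore(path)  # simulates a process restart reading the same file
```

After the simulated restart, the daily-quota lock is still in force while the short per-minute cooldown is gone, which is the distinction the paragraph above draws.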
Combo strategies
Beyond simple priority order, combos support multiple account-picker strategies inside each provider tier:
- Zenith (default): an AI-driven scoring engine that evaluates live health data (TTFB latency, success rate) and quota headroom, pre-ranks the accounts, and picks the highest-scoring one, so requests are not wasted on accounts that are about to stall on a rate limit.
- Round-robin: rotate evenly across all accounts.
- P2C (power-of-2 choices): pick two accounts at random and route to whichever has more remaining quota.
Zenith is strongly recommended. Because it operates on a sub-5ms RAM layer, it can hot-swap a dead account for a live one in under a millisecond.
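The two simpler strategies are standard techniques and can be sketched briefly. The function names, account names, and quota numbers below are made up for illustration. Zenith is not sketched because its scoring formula is not described here.

```python
import itertools
import random
from typing import Dict, Iterator, List


def round_robin(accounts: List[str]) -> Iterator[str]:
    # Rotate evenly: a, b, c, a, b, c, ...
    return itertools.cycle(accounts)


def p2c(remaining_quota: Dict[str, int], rng: random.Random) -> str:
    # Power-of-two-choices: sample two distinct accounts at random
    # and keep the one with more remaining quota.
    names = list(remaining_quota)
    if len(names) == 1:
        return names[0]  # nothing to compare against
    a, b = rng.sample(names, 2)
    return a if remaining_quota[a] >= remaining_quota[b] else b


rr = round_robin(["acct-a", "acct-b", "acct-c"])
rotation = [next(rr) for _ in range(4)]

quota = {"acct-a": 10, "acct-b": 500, "acct-c": 0}  # hypothetical headroom
picks = [p2c(quota, random.Random(seed)) for seed in range(200)]
```

With P2C the account with the least headroom (`acct-c` here) loses every pairwise comparison, so it is never selected while other accounts have more quota left. The strategy needs only two lookups per request rather than a full ranking.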
Per-model overrides
A combo can pin specific models on each provider:
Combo: "thinking-stack"
1. cc/claude-opus-4-7-thinking ← deep work
2. cc/claude-sonnet-4-6 ← fast work, same account
3. glm/glm-5.1 ← cheap backup
4. kr/claude-sonnet-4.5 ← free
This is how you mix thinking and non-thinking models from the same provider into one logical endpoint.
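To see which steps of a combo share a provider, the step IDs can be split on the first slash. This is a minimal sketch that relies only on the provider/model ID format described earlier. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_step(step_id: str) -> Tuple[str, str]:
    # "cc/claude-opus-4-7-thinking" -> ("cc", "claude-opus-4-7-thinking")
    provider, _, model = step_id.partition("/")
    return provider, model


def models_by_provider(steps: List[str]) -> Dict[str, List[str]]:
    # Group models under their provider prefix, preserving combo order.
    grouped: Dict[str, List[str]] = defaultdict(list)
    for step in steps:
        provider, model = split_step(step)
        grouped[provider].append(model)
    return dict(grouped)


thinking_stack = [
    "cc/claude-opus-4-7-thinking",
    "cc/claude-sonnet-4-6",
    "glm/glm-5.1",
    "kr/claude-sonnet-4.5",
]
grouped = models_by_provider(thinking_stack)
```

Here `grouped["cc"]` holds both the thinking and the non-thinking model, reflecting that two steps of the combo are pinned to the same provider.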
Combos of combos
A combo can reference another combo as one of its steps:
Combo: "free-only"
1. kr/claude-sonnet-4.5
2. oc/<auto>
3. vertex/gemini-3.1-pro-preview
Combo: "premium-with-free-emergency"
1. cc/claude-opus-4-7
2. glm/glm-5.1
3. free-only ← falls into the free combo if both fail
kRouter resolves combos of combos to a maximum depth of 3. Beyond that it returns a 400 with a clear error, so you don't accidentally create a cycle.
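Nested combos can be pictured as a recursive flattening with a depth cap. The sketch below is an assumption about how such a resolver might look, not kRouter's code: the `resolve` function, the `ComboDepthError` exception (standing in for the HTTP 400), and the convention that the top-level combo counts as depth 1 are all hypothetical.

```python
from typing import Dict, List

MAX_DEPTH = 3  # limit stated in the text; how levels are counted is assumed


class ComboDepthError(ValueError):
    """Stands in for the HTTP 400 returned when nesting is too deep."""


def resolve(name: str, combos: Dict[str, List[str]], depth: int = 1) -> List[str]:
    if depth > MAX_DEPTH:
        raise ComboDepthError(f"combo nesting deeper than {MAX_DEPTH} at {name!r}")
    flat: List[str] = []
    for step in combos[name]:
        if step in combos:   # the step names another combo: expand it in place
            flat.extend(resolve(step, combos, depth + 1))
        else:                # the step is a plain provider/model ID
            flat.append(step)
    return flat


combos = {
    "free-only": ["kr/claude-sonnet-4.5", "oc/<auto>", "vertex/gemini-3.1-pro-preview"],
    "premium-with-free-emergency": ["cc/claude-opus-4-7", "glm/glm-5.1", "free-only"],
}
flat = resolve("premium-with-free-emergency", combos)
```

A cycle such as combo `a` referencing `b` and `b` referencing `a` never terminates on its own, so the depth cap doubles as cycle protection: the fourth level of expansion raises the error instead of recursing forever.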
When to use combos vs single models
- Single model: prototypes, exploratory chats, anything where you don't care about uptime
- Two-tier combo: subscription + one cheap backup is the most popular pattern (95% of users)
- Three-tier combo: anyone serious about not being interrupted mid-task
- Free-only combo: zero-cost setups, $0/month forever
Where to go next
- Core concepts — RTK, MITM, and quota tracking
- Providers — the four provider tiers
- API reference — the OpenAI-compatible endpoints