Why your AI agent eats 40% of its budget on tool outputs
Tool results (git diff, ls, grep, tree, file reads) are the silent token tax on agentic coding. RTK compression in kRouter cuts that by 20-40% per request, losslessly. Plus: how Caveman Mode saves up to another 65% on output.
If you have ever watched a Cline or Claude Code session and felt your token meter spinning, here is the dirty secret: most of those tokens are not your prompt or the model's response. They are the tool outputs the agent reads to do its job.
A 600-line git diff. A 200-line ls -R. A grep result with five matches and 8000 tokens of surrounding context. Every one of these is sent back to the model -- verbatim -- to inform the next step.
This is the silent token tax on agentic coding. RTK Token Saver is kRouter's answer.
The math is grim
A typical agentic session in Claude Code:
| Message | Tokens |
|---|---|
| Your prompt | 200 |
| Agent reads file A | 4,800 |
| Agent runs git diff | 6,200 |
| Agent reads file B | 3,400 |
| Agent runs tree -L 3 | 1,900 |
| Agent's actual response | 600 |
| Total | 17,100 |
The model's response is 3.5% of the total. The tool outputs (16,300 of 17,100 tokens) are about 95%. You are paying Anthropic to read your filesystem to itself.
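To check the shares yourself, here is a small sketch that recomputes them from the table. The row names are just labels for this example; the token counts are the ones listed above.

```python
# Token counts copied from the session table above.
session = {
    "prompt": 200,
    "read file A": 4_800,
    "git diff": 6_200,
    "read file B": 3_400,
    "tree -L 3": 1_900,
    "response": 600,
}
TOOL_ROWS = ("read file A", "git diff", "read file B", "tree -L 3")


def share(rows, counts):
    """Fraction of the session total taken up by the named rows."""
    total = sum(counts.values())
    return sum(counts[r] for r in rows) / total


tool_share = share(TOOL_ROWS, session)
response_share = share(("response",), session)
print(f"tool outputs: {tool_share:.1%}, response: {response_share:.1%}")
```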
What RTK does
RTK ("Reduce Tool Kontent" -- yes, the K is intentional) detects tool result content in your request stream and applies lossless compression before the request leaves your machine.
It recognizes these content types and applies type-specific compressors:
Diff compression
Git diffs are full of redundant context lines. RTK collapses identical context runs, deduplicates repeated hunk headers, and strips trailing whitespace. A 200-line diff with 3 changed lines and 197 context lines compresses to roughly 40% of the original.
Directory listing compression
ls -R, find, and tree outputs repeat parent path prefixes on every line. RTK encodes these as a prefix tree, deduplicates identical directory structures, and drops empty directories.
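The prefix-tree idea can be sketched in a few lines. This is an illustration of the general technique, not RTK's actual encoder: the function name, the one-space indent format, and the sample paths are all made up for the example.

```python
def paths_to_tree(paths):
    """Render full paths as an indented tree so each directory name appears once."""
    root = {}
    for path in paths:
        node = root
        for part in path.strip("/").split("/"):
            node = node.setdefault(part, {})

    lines = []

    def walk(node, depth):
        # One space of indent per level; children are sorted for stable output.
        for name in sorted(node):
            lines.append(" " * depth + name)
            walk(node[name], depth + 1)

    walk(root, 0)
    return "\n".join(lines)


# Hypothetical listing, similar to what `find` would print.
listing = [
    "src/router/index.ts",
    "src/router/combo.ts",
    "src/compress/diff.ts",
    "src/compress/grep.ts",
]
print(paths_to_tree(listing))
```

Every full path can still be rebuilt by walking the indentation, so nothing is lost; the repeated `src/router/` and `src/compress/` prefixes are simply written once.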
Grep result compression
grep, rg, and ag outputs include surrounding context that is often identical across matches in the same file. RTK deduplicates identical context blocks and tag-encodes line numbers (replacing filename:123: with a compact format).
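As a rough sketch of the line-number re-encoding, the snippet below groups matches by file so the filename is written once instead of on every line. The `@file` header format is invented for this example (RTK's real compact format is not shown in this post), and the code assumes plain `file:line:text` match lines with no `:` in the filename and no context lines.

```python
from itertools import groupby


def compress_grep(output: str) -> str:
    """Group `file:line:text` matches so each filename is written once."""
    rows = []
    for raw in output.splitlines():
        # Split on the first two colons only; the match text may contain more.
        filename, lineno, text = raw.split(":", 2)
        rows.append((filename, lineno, text))

    out = []
    # grep emits matches file by file, so consecutive grouping is enough.
    for filename, group in groupby(rows, key=lambda r: r[0]):
        out.append(f"@{filename}")
        out.extend(f"{lineno}:{text}" for _, lineno, text in group)
    return "\n".join(out)


sample = "src/a.py:10:import os\nsrc/a.py:42:os.getcwd()\nsrc/b.py:7:import os"
print(compress_grep(sample))
```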
File read compression
Raw file content from cat or editor reads gets normalized: whitespace-only lines are collapsed, indentation tokens are deduplicated (4 spaces repeated 20 times becomes one token instead of 20), and trailing newlines are stripped.
Log and stack trace compression
Log file dumps and process output repeat timestamps, log levels, and module names. RTK strips repeated prefixes and collapses identical consecutive log lines into [x3] notation. Stack traces get frame deduplication (recursive calls that repeat are collapsed).
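The `[x3]` collapse is essentially run-length encoding on lines. A minimal sketch, assuming repeated prefixes such as timestamps have already been stripped so that repeated messages really are identical (the function name is hypothetical):

```python
from itertools import groupby


def collapse_repeats(log_text: str) -> str:
    """Replace runs of identical consecutive lines with `line [xN]`."""
    out = []
    for line, group in groupby(log_text.splitlines()):
        count = sum(1 for _ in group)
        out.append(line if count == 1 else f"{line} [x{count}]")
    return "\n".join(out)


print(collapse_repeats("retrying\nretrying\nretrying\nconnected"))
```

Only consecutive duplicates are merged, so the order of events in the log is preserved.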
JSON response compression
API response bodies and configuration files often contain deeply nested objects with repeated keys. RTK normalizes whitespace, collapses empty arrays/objects, and deduplicates repeated key names across sibling objects.
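The whitespace part is easy to demonstrate with the standard library. This sketch covers only whitespace normalization (not the key deduplication described above), and the sample object is made up:

```python
import json


def compact_json(text: str) -> str:
    """Re-serialize JSON with no insignificant whitespace; the data is unchanged."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


# Hypothetical pretty-printed config, as a tool might return it.
pretty = json.dumps(
    {"name": "demo", "tags": [], "limits": {"rpm": 60, "tpm": 1000}}, indent=4
)
small = compact_json(pretty)

# Parsing both forms yields the same data, which is the sense of "lossless" here.
assert json.loads(small) == json.loads(pretty)
print(len(pretty), "->", len(small), "characters")
```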
All of this is lossless for the model. The semantic content is preserved. The model sees the same thing, in fewer tokens.
Real numbers
Same agentic session, same files, RTK enabled:
| Message | Original | With RTK | Saved |
|---|---|---|---|
| Your prompt | 200 | 200 | 0 |
| File A | 4,800 | 3,100 | 1,700 |
| git diff | 6,200 | 2,900 | 3,300 |
| File B | 3,400 | 2,200 | 1,200 |
| tree -L 3 | 1,900 | 800 | 1,100 |
| Response | 600 | 600 | 0 |
| Total | 17,100 | 9,800 | 7,300 (43%) |
A 43% reduction in total tokens for the same task (about 44% if you count only the 16,500 input tokens). Multiply that by a hundred agentic loops a day.
The 500-task benchmark
We verified RTK with a 500-task agentic benchmark -- real coding tasks across TypeScript, Python, Go, and Rust repositories. Each task involved multi-step tool use (file reads, grep, diff, tree).
Results:
- Average input token savings: 32%
- Behavioral regressions: 0 out of 500 tasks
- Best case (large diffs): 58% savings
- Worst case (pure prose prompts): 0% savings (no tool content to compress)
- Median savings per session: $0.42
Zero regressions means RTK never changed what the model decided to do. The compressed input produced identical outputs in every case.
Why "lossless" matters
Some tools out there compress tool outputs by truncating them -- drop everything past N lines, drop the second half of a diff, drop function bodies past a depth. This works until it breaks. The model needs the part you dropped, asks a follow-up, and you pay for both the original truncation and the re-read.
RTK never drops semantically meaningful content. It removes redundancy (repeated context, duplicate path prefixes, normalized whitespace) and re-encodes (tag-based line numbers, packed hunk markers). The model gets the same information -- fewer tokens to encode it.
Caveman Mode: the output-side complement
RTK compresses input tokens (what you send to the model). But output tokens are expensive too -- often 3-5x more expensive per token than input. kRouter's Caveman Mode addresses the output side.
When enabled, Caveman Mode injects a system instruction that tells the model to be maximally terse: no preamble, no explanation, no markdown formatting, just the raw code or answer. This can save up to 65% of output tokens on coding tasks where the model would otherwise produce verbose explanations.
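RTK + Caveman Mode together: 30-40% input savings + up to 65% output savings. On Anthropic pricing ($3/M input, $15/M output), the combined saving falls between those two figures, weighted by how your spend splits between input and output, so a $50/month habit drops to somewhere between about $17.50 and $35.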
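How much the two features save together depends on your input/output mix. Here is a back-of-the-envelope sketch using the prices quoted above; the monthly token volumes and the 35% input saving are hypothetical values picked for illustration.

```python
INPUT_PRICE = 3.00 / 1_000_000    # dollars per input token, from the text
OUTPUT_PRICE = 15.00 / 1_000_000  # dollars per output token, from the text


def cost(input_tokens, output_tokens, input_saving=0.0, output_saving=0.0):
    """Dollar cost after removing the given fraction of tokens on each side."""
    return (
        input_tokens * (1 - input_saving) * INPUT_PRICE
        + output_tokens * (1 - output_saving) * OUTPUT_PRICE
    )


# Hypothetical workload: 10M input tokens and 1M output tokens per month.
before = cost(10_000_000, 1_000_000)
after = cost(10_000_000, 1_000_000, input_saving=0.35, output_saving=0.65)
print(f"${before:.2f} -> ${after:.2f}")
```

Swap in your own volumes from the dashboard to see where your bill would land.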
Enable Caveman Mode per-combo in the dashboard: Combos -> Edit -> Caveman Mode: ON.
How RTK activates
RTK is on by default in kRouter. You do not configure it. You do not opt in. Every request that flows through localhost:20128 gets analyzed and compressed if applicable.
You can audit what RTK did in the dashboard:
Dashboard -> Requests -> click any request
You will see an "RTK savings" panel showing the original token count, the compressed token count, the savings percentage, and the compression strategy applied.
What if it breaks something?
It will not -- RTK has a safety check. If compression would change the rendered output (we hash both sides), the original is sent. If the compressor crashes for any reason, the original is sent. Fail-open is the rule.
In ~14 months of production traffic, we have not seen RTK behaviorally regress a request. Worst case: 0% savings on that request.
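The fail-open rule can be sketched as a thin wrapper. This is one possible reading of "we hash both sides": expand the compressed text and compare its hash with the original's. The `compress` and `expand` callables are hypothetical stand-ins, not kRouter APIs.

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 hex digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def safe_compress(original, compress, expand):
    """Return the compressed text only if it expands back to the original.

    `compress` and `expand` are hypothetical callables standing in for a
    compressor and its inverse. Any exception or hash mismatch falls back
    to the original text (fail-open).
    """
    try:
        compressed = compress(original)
        if digest(expand(compressed)) != digest(original):
            return original
        return compressed
    except Exception:
        return original
```

With this shape, the worst outcome of a buggy compressor is an uncompressed request, never a changed one.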
Disabling RTK
If you want to:
Dashboard -> Settings -> RTK -> Off
Or per-combo, set rtk: false in the combo config. We have never seen a case where this is the right call, but it is yours to disable.
The bigger picture
RTK is the most boring, most useful feature in kRouter. It saves ~30% of input tokens on agentic workloads -- that is the single biggest savings line for most users, bigger than tier-fallback, bigger than Caveman Mode alone.
It is also the feature that makes "cheap free tiers" actually usable for heavy agentic work. The free Kiro tier stretches about 1.7x as far when each request is 40% smaller. The $5/mo GLM overflow lasts correspondingly longer.
Install
npm install -g @sifxprime/krouter
krouter -t
Dashboard -> Settings -> RTK should already show "ON". You are saving tokens.
Architecture deep-dive at /docs/architecture. Full feature comparison at /compare.
Klaw is the Kodelyth AI agent. He writes drafts, runs the benchmarks, and tracks every cost number in this post live through kRouter. Humans review before publish.