Multimodal & Agent Skills
kRouter proxies much more than just text generation. Learn how to use Text-to-Speech (TTS), Speech-to-Text (STT), Embeddings, Image Generation, and Web Search endpoints.
kRouter is best known for routing text chat completions, but it also acts as a unified proxy for several other AI modalities.
If an upstream provider offers an embedding or TTS service, kRouter translates it into the standard OpenAI API shape, letting you use dozens of different services without changing your API calls.
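As a rough illustration of what "the same API shape" means in practice, the sketch below builds two speech requests that differ only in the `model` value. The base URL, API key, and `minimax/minimax-tts` identifier come from the curl example in this document. The helper names `build_json_request` and `send` are made up for this sketch, and `openai/tts-1` is an assumed identifier that simply follows the same `provider/model` pattern.

```python
import json
import urllib.request

# Values taken from the curl example in this document; adjust for your setup.
BASE_URL = "http://localhost:20128"
API_KEY = "sk-krouter-local"


def build_json_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authorized JSON POST to a kRouter endpoint."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()


# Same endpoint, same body shape; only the model string changes.
# "openai/tts-1" is an assumed identifier, not confirmed by the docs.
openai_req = build_json_request(
    "/v1/audio/speech",
    {"model": "openai/tts-1", "input": "Hello", "voice": "alloy"},
)
minimax_req = build_json_request(
    "/v1/audio/speech",
    {"model": "minimax/minimax-tts", "input": "Hello", "voice": "alloy"},
)
```

Passing either request to `send` should return audio bytes that can be written straight to a file, assuming the router is running locally.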
Supported Modalities
1. Text-to-Speech (TTS)
Endpoint: `/v1/audio/speech`
Generates audio from text using OpenAI's standard TTS schema. Supported providers include:
- OpenAI: `tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`
- Gemini: `gemini-2.5-flash-preview-tts`
- NVIDIA NIM: `fastpitch`, `tacotron2`
- MiniMax: High-quality Chinese/English voices
- Third-party integrations: ElevenLabs, Edge TTS, Deepgram, Google Cloud TTS
Example usage:
curl http://localhost:20128/v1/audio/speech \
  -H "Authorization: Bearer sk-krouter-local" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax/minimax-tts",
    "input": "Hello world, testing the router.",
    "voice": "alloy"
  }' --output speech.mp3
2. Speech-to-Text (STT)
Endpoint: `/v1/audio/transcriptions`
Transcribes audio files into text. Supported providers include:
- OpenAI: `whisper-1`
- Groq: Extremely fast Whisper inference
- Gemini: `gemini-2.5-pro`
- Third-party: Deepgram, AssemblyAI
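A transcription call might look like the sketch below. It assumes kRouter accepts the same multipart form that OpenAI's transcription endpoint uses (a `file` field plus a `model` field) and returns a JSON object with a `text` key. The function names and the `openai/whisper-1` identifier are illustrative assumptions, not confirmed kRouter names.

```python
import json
import uuid
import urllib.request

BASE_URL = "http://localhost:20128"  # from the TTS curl example
API_KEY = "sk-krouter-local"


def build_transcription_request(
    audio_bytes: bytes, filename: str, model: str
) -> urllib.request.Request:
    """Build a multipart/form-data POST with `model` and `file` fields."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/transcriptions",
        data=head + audio_bytes + tail,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcript_text(response_body: bytes) -> str:
    """Pull the transcript out of an OpenAI-style JSON response."""
    return json.loads(response_body)["text"]
```

To run it, read an audio file into bytes, pass the resulting request to `urllib.request.urlopen`, and hand the response body to `transcript_text`.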
3. Embeddings (RAG)
Endpoint: `/v1/embeddings`
Generates vector embeddings for Retrieval-Augmented Generation. Supported providers include:
- OpenAI: `text-embedding-3-small`, `text-embedding-ada-002`
- Gemini: `text-embedding-004`
- NVIDIA NIM: `nvidia/nv-embedqa-e5-v5`
- Mistral & OpenRouter embeddings
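For a RAG pipeline, the usual pattern is to embed a query and a set of documents, then rank the documents by cosine similarity. The sketch below assumes the OpenAI request and response shape (`input` list in, `data[i].embedding` out). The `openai/text-embedding-3-small` identifier and all function names are assumptions made for illustration.

```python
import json
import math
import urllib.request

BASE_URL = "http://localhost:20128"  # from the TTS curl example
API_KEY = "sk-krouter-local"


def build_embeddings_request(texts: list, model: str) -> urllib.request.Request:
    """Build a JSON POST that embeds several texts in one call."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/embeddings",
        data=json.dumps({"model": model, "input": texts}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_vectors(response_body: bytes) -> list:
    """Return embeddings ordered by their `index` field."""
    items = json.loads(response_body)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda i: i["index"])]


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Sorting by `index` guards against a provider returning items out of order, so each vector lines up with the input text it was computed from.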
4. Image Generation
Endpoint: `/v1/images/generations`
Generates images from text prompts. Supported providers include:
- OpenAI: DALL-E 3
- Google: Imagen
- Others: FLUX, MiniMax, Stable Diffusion WebUI
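An image request could be sketched as follows, assuming the OpenAI image schema (`prompt`, `n`, `size`, and an optional `response_format`). Whether every upstream provider honors `response_format: "b64_json"` through kRouter is an assumption, as is the `openai/dall-e-3` identifier. The function names are hypothetical.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:20128"  # from the TTS curl example
API_KEY = "sk-krouter-local"


def build_image_request(
    prompt: str, model: str, size: str = "1024x1024"
) -> urllib.request.Request:
    """Build a JSON POST asking for one base64-encoded image."""
    payload = {
        "model": model,
        "prompt": prompt,
        "n": 1,
        "size": size,
        "response_format": "b64_json",  # assumed to be passed through
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decode_first_image(response_body: bytes) -> bytes:
    """Decode the first image of an OpenAI-style response into raw bytes."""
    item = json.loads(response_body)["data"][0]
    return base64.b64decode(item["b64_json"])
```

The decoded bytes can be written directly to a file. If a provider returns a `url` field instead of `b64_json`, the second helper would need to download that URL rather than decode.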
5. Web Search & Fetch
Endpoints: `/v1/search` and `/v1/web/fetch`
Provides a unified interface for grounding LLM prompts with live web data.
- Search: Tavily, Exa, Brave, Serper, SearXNG, Google PSE, You.com
- Fetch: Converts URLs to clean markdown/text via Firecrawl, Jina, Tavily, or Exa.
Agent Skills
kRouter ships with Agent Skills: pre-written system prompt injections that auto-configure AI agents to use kRouter's endpoints.
If you are using a terminal agent (like Claude Code or Aider) and want it to generate a TTS file or run a web search using your routed quotas, you can feed it a Skill URL.
Available Skills
- `krouter` (Entry): Setup, auth, model discovery
- `krouter-chat`: Chat/completions and messages
- `krouter-tts`: Text-to-speech generation
- `krouter-stt`: Speech-to-text transcription
- `krouter-image`: Image generation
- `krouter-embeddings`: Vector embeddings for RAG
- `krouter-web-search`: Web search queries
- `krouter-web-fetch`: URL scraping to markdown
How to use a Skill
You can copy the raw markdown URL for any skill from the `/dashboard/skills` page in your local installation.
Give this URL to your AI agent as a system prompt instruction:
"Read this skill file and use it to configure your API calls: https://raw.githubusercontent.com/sifxprime/krouter/refs/heads/main/skills/krouter-tts/SKILL.md"
The agent reads the schema, picks up the `localhost:20128` endpoint, and can then generate TTS audio without further guidance.
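If you hand out skills often, a small helper can build the instruction string for you. The sketch below reuses the one URL shown above and assumes that every other skill lives at the same path pattern (`skills/<name>/SKILL.md`), which this document only confirms for `krouter-tts`. The function names are hypothetical.

```python
# Skill names as listed under "Available Skills".
SKILLS = (
    "krouter",
    "krouter-chat",
    "krouter-tts",
    "krouter-stt",
    "krouter-image",
    "krouter-embeddings",
    "krouter-web-search",
    "krouter-web-fetch",
)

# Derived from the krouter-tts URL above; the pattern is assumed for the rest.
RAW_BASE = "https://raw.githubusercontent.com/sifxprime/krouter/refs/heads/main/skills"


def skill_url(name: str) -> str:
    """Return the raw SKILL.md URL for a known skill name."""
    if name not in SKILLS:
        raise ValueError(f"unknown skill: {name}")
    return f"{RAW_BASE}/{name}/SKILL.md"


def skill_instruction(name: str) -> str:
    """Build the one-line instruction to paste into an agent's system prompt."""
    return (
        "Read this skill file and use it to configure your API calls: "
        + skill_url(name)
    )
```

The `/dashboard/skills` page remains the authoritative place to copy URLs from; treat the generated ones as a convenience and check them before use.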