
Multimodal & Agent Skills

kRouter proxies much more than text generation. Learn how to use the Text-to-Speech (TTS), Speech-to-Text (STT), Embeddings, Image Generation, and Web Search endpoints.


kRouter is best known for routing text chat completions, but it also acts as a unified proxy for other AI modalities.

If an upstream provider offers an embedding or TTS service, kRouter exposes it through the standard OpenAI API shape, so you can use dozens of different services without changing the structure of your API calls.

Supported Modalities

1. Text-to-Speech (TTS)

Endpoint: `/v1/audio/speech`

Generates audio from text using OpenAI's standard TTS schema. Supported providers include:

  • OpenAI: `tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`
  • Gemini: `gemini-2.5-flash-preview-tts`
  • NVIDIA NIM: `fastpitch`, `tacotron2`
  • MiniMax: High-quality Chinese/English voices
  • Third-party integrations: ElevenLabs, Edge TTS, Deepgram, Google Cloud TTS

Example usage:

curl http://localhost:20128/v1/audio/speech \
  -H "Authorization: Bearer sk-krouter-local" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax/minimax-tts",
    "input": "Hello world, testing the router.",
    "voice": "alloy"
  }' --output speech.mp3

2. Speech-to-Text (STT)

Endpoint: `/v1/audio/transcriptions`

Transcribes audio files into text. Supported providers include:

  • OpenAI: `whisper-1`
  • Groq: Extremely fast Whisper inference
  • Gemini STT: `gemini-2.5-pro`
  • Third-party: Deepgram, AssemblyAI
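
This page gives no request example for the transcription endpoint, so here is a minimal sketch. It assumes kRouter accepts the same multipart form fields as OpenAI's transcription API (`file` and `model`) and that models are addressed as `provider/model`, as in the TTS example. The helper names `krouter_url` and `transcribe`, the `KROUTER_BASE` and `KROUTER_KEY` variables, and the `openai/whisper-1` model ID are illustrative and not taken from the kRouter documentation.

```shell
# Assumed defaults, copied from the TTS example on this page.
KROUTER_BASE="${KROUTER_BASE:-http://localhost:20128}"
KROUTER_KEY="${KROUTER_KEY:-sk-krouter-local}"

# Hypothetical helper: print the full URL for an API path.
krouter_url() {
  printf '%s%s' "$KROUTER_BASE" "$1"
}

# Hypothetical helper: upload an audio file and print the JSON response.
# $1 = path to the audio file, $2 = model ID (assumed provider/model form)
transcribe() {
  curl -sS "$(krouter_url /v1/audio/transcriptions)" \
    -H "Authorization: Bearer $KROUTER_KEY" \
    -F "file=@$1" \
    -F "model=$2"
}
```

With a router running locally, `transcribe meeting.mp3 openai/whisper-1` would send the file. If the response follows the OpenAI shape, the transcript should appear in a `text` field.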

3. Embeddings (RAG)

Endpoint: `/v1/embeddings`

Generates vector embeddings for Retrieval-Augmented Generation. Supported providers include:

  • OpenAI: `text-embedding-3-small`, `text-embedding-ada-002`
  • Gemini: `text-embedding-004`
  • NVIDIA NIM: `nvidia/nv-embedqa-e5-v5`
  • Mistral & OpenRouter embeddings
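
As a rough sketch of an embeddings call, the helper below assumes the endpoint takes the OpenAI-style JSON body (`model` and `input`). The `provider/model` ID `openai/text-embedding-3-small` and the helper names `embeddings_payload` and `embed` are assumptions made for illustration.

```shell
# Assumed defaults, copied from the TTS example on this page.
KROUTER_BASE="${KROUTER_BASE:-http://localhost:20128}"
KROUTER_KEY="${KROUTER_KEY:-sk-krouter-local}"

# Hypothetical helper: build the JSON request body.
# This sketch does no JSON escaping, so the input text must not
# contain double quotes or backslashes.
# $1 = model ID, $2 = text to embed
embeddings_payload() {
  printf '{"model": "%s", "input": "%s"}' "$1" "$2"
}

# Hypothetical helper: request an embedding and print the JSON response.
embed() {
  curl -sS "$KROUTER_BASE/v1/embeddings" \
    -H "Authorization: Bearer $KROUTER_KEY" \
    -H "Content-Type: application/json" \
    -d "$(embeddings_payload "$1" "$2")"
}
```

For example, `embed openai/text-embedding-3-small "hello world"`. In the OpenAI response shape, the vector is found under `data[0].embedding`; whether every upstream provider is mapped to exactly that shape is not stated on this page.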

4. Image Generation

Endpoint: `/v1/images/generations`

Generates images from text prompts. Supported providers include:

  • OpenAI: DALL-E 3
  • Google: Imagen
  • Others: FLUX, MiniMax, Stable Diffusion WebUI
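
A minimal sketch of an image request, assuming the endpoint accepts the OpenAI-style fields `model`, `prompt`, `n`, and `size`. The model ID `openai/dall-e-3` and the helper names `image_payload` and `generate_image` are hypothetical; check your installation for the model IDs it actually exposes.

```shell
# Assumed defaults, copied from the TTS example on this page.
KROUTER_BASE="${KROUTER_BASE:-http://localhost:20128}"
KROUTER_KEY="${KROUTER_KEY:-sk-krouter-local}"

# Hypothetical helper: build the JSON request body for one image.
# No JSON escaping is done, so avoid double quotes and backslashes
# in the prompt.
# $1 = model ID, $2 = prompt, $3 = size such as 1024x1024
image_payload() {
  printf '{"model": "%s", "prompt": "%s", "n": 1, "size": "%s"}' "$1" "$2" "$3"
}

# Hypothetical helper: request an image and print the JSON response.
generate_image() {
  curl -sS "$KROUTER_BASE/v1/images/generations" \
    -H "Authorization: Bearer $KROUTER_KEY" \
    -H "Content-Type: application/json" \
    -d "$(image_payload "$1" "$2" "$3")"
}
```

For example, `generate_image openai/dall-e-3 "a red bicycle" 1024x1024`. An OpenAI-shaped response would carry either a `url` or a `b64_json` field per image.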

5. Web Search & Fetch

Endpoints: `/v1/search` and `/v1/web/fetch`

Provides a unified interface for grounding LLM prompts with live web data.

  • Search: Tavily, Exa, Brave, Serper, SearXNG, Google PSE, You.com
  • Fetch: Converts URLs to clean markdown/text via Firecrawl, Jina, Tavily, or Exa.
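
The request format for these two endpoints is not documented on this page, so the following is only a guess at what a call might look like. It assumes both endpoints accept a JSON POST, with a `query` field for search and a `url` field for fetch. Those field names, the HTTP method, and all helper names are hypothetical; consult the `krouter-web-search` and `krouter-web-fetch` skill files for the real schema.

```shell
# Assumed defaults, copied from the TTS example on this page.
KROUTER_BASE="${KROUTER_BASE:-http://localhost:20128}"
KROUTER_KEY="${KROUTER_KEY:-sk-krouter-local}"

# Hypothetical request bodies: the "query" and "url" field names are
# assumptions, not confirmed by the kRouter documentation.
# No JSON escaping is done in this sketch.
search_payload() {
  printf '{"query": "%s"}' "$1"
}

fetch_payload() {
  printf '{"url": "%s"}' "$1"
}

# Hypothetical helper: POST a JSON body to an API path.
# $1 = API path, $2 = JSON body
krouter_post() {
  curl -sS "$KROUTER_BASE$1" \
    -H "Authorization: Bearer $KROUTER_KEY" \
    -H "Content-Type: application/json" \
    -d "$2"
}
```

Under those assumptions, a search would be `krouter_post /v1/search "$(search_payload 'kRouter release notes')"` and a fetch would be `krouter_post /v1/web/fetch "$(fetch_payload 'https://example.com')"`.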

Agent Skills

kRouter ships with Agent Skills: pre-written system prompt injections that auto-configure AI agents to use kRouter's endpoints.

If you are using a terminal agent (like Claude Code or Aider) and want it to generate a TTS file or run a web search using your routed quotas, you can feed it a Skill URL.

Available Skills

  • `krouter` (Entry): Setup, auth, model discovery
  • `krouter-chat`: Chat/completions and messages
  • `krouter-tts`: Text-to-speech generation
  • `krouter-stt`: Speech-to-text transcription
  • `krouter-image`: Image generation
  • `krouter-embeddings`: Vector embeddings for RAG
  • `krouter-web-search`: Web search queries
  • `krouter-web-fetch`: URL scraping to markdown

How to use a Skill

You can copy the raw markdown URL for any skill from the `/dashboard/skills` page in your local installation.

Give this URL to your AI agent as a system prompt instruction:

"Read this skill file and use it to configure your API calls: https://raw.githubusercontent.com/sifxprime/krouter/refs/heads/main/skills/krouter-tts/SKILL.md"

The agent reads the skill file, learns the `localhost:20128` endpoint and its request schema, and can then generate TTS audio without further guidance.
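
If you switch between skills often, a small helper can build the instruction text for you. The sketch below assumes every skill lives at the same repository path as the TTS skill URL quoted above, with only the skill name changing. That layout is inferred from the single URL on this page, and the names `SKILLS_BASE`, `skill_url`, and `skill_instruction` are made up for this example.

```shell
# Assumed repository layout, inferred from the one skill URL shown above.
SKILLS_BASE="https://raw.githubusercontent.com/sifxprime/krouter/refs/heads/main/skills"

# Hypothetical helper: print the raw SKILL.md URL for a skill name.
# $1 = skill name, e.g. krouter-tts
skill_url() {
  printf '%s/%s/SKILL.md' "$SKILLS_BASE" "$1"
}

# Hypothetical helper: print the instruction sentence to give an agent.
skill_instruction() {
  printf 'Read this skill file and use it to configure your API calls: %s' "$(skill_url "$1")"
}
```

Running `skill_instruction krouter-web-search` prints a ready-to-paste instruction, and `curl -fsSL "$(skill_url krouter-tts)"` lets you preview a skill file before handing it to an agent. The `/dashboard/skills` page remains the authoritative source for the URLs.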