Skip to main content

OpenRouter (Aggregator)

POST https://openrouter.ai/api/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Routes requests to any model from any provider through a single API key. OpenRouter is a model aggregator — a single API endpoint that proxies requests to Anthropic, OpenAI, DeepSeek, Qwen, xAI, Meta, Mistral, Google, and dozens more. One key, one billing account, access to everything. Best for: Experimenting with models you haven’t set up directly, unified billing, accessing niche models not yet natively supported by Wolffish.
We recommend configuring providers directly whenever possible. Direct integration gives you lower latency (no proxy hop), accurate cost tracking, provider-specific features (Anthropic’s ephemeral caching, DeepSeek’s FIM), and no middleman markup. OpenRouter adds a routing layer that can introduce latency and occasionally inconsistent behavior across providers.Use OpenRouter when you want to experiment with models you haven’t set up directly, or as a convenient fallback for providers where you don’t want to manage a separate API key.

Getting an API Key

  1. Go to openrouter.ai
  2. Sign up or log in
  3. Navigate to Keys and create a new key
  4. Paste it into Wolffish → Settings → Models → OpenRouter

Models

OpenRouter routes to hundreds of models across dozens of providers. The table below is a curated top-20 of the most popular models — the picker exposes many more once your key is connected:
ModelContextModesInput / Output (per MTok)Cached
anthropic/claude-opus-4.1200KOff, High15.00/15.00 / 75.00
anthropic/claude-sonnet-4.51MOff, High3.00/3.00 / 15.00
openai/gpt-5400KOff, High1.25/1.25 / 10.00
openai/gpt-5-mini400KOff, High0.25/0.25 / 2.00
openai/o3200KOff, High2.00/2.00 / 8.00
openai/o4-mini200KOff, High1.10/1.10 / 4.40
openai/gpt-4o128K2.50/2.50 / 10.00
openai/gpt-4.11M2.00/2.00 / 8.00
google/gemini-2.5-pro1MOff, High1.25/1.25 / 10.00
google/gemini-2.5-flash1MOff, High0.30/0.30 / 2.50
deepseek/deepseek-r1164KOff, High0.70/0.70 / 2.50
deepseek/deepseek-chat-v3.1164KOff, High0.21/0.21 / 0.79
x-ai/grok-4.31MOff, High1.25/1.25 / 2.50
qwen/qwen3-235b-a22b131KOff, High0.45/0.45 / 1.82
z-ai/glm-4.6203KOff, High0.43/0.43 / 1.74
meta-llama/llama-4-maverick1M0.15/0.15 / 0.60
meta-llama/llama-3.3-70b-instruct131K0.10/0.10 / 0.32
mistralai/mistral-large128K2.00/2.00 / 6.00
mistralai/mistral-medium-3131K0.40/0.40 / 2.00
moonshotai/kimi-k2131K0.57/0.57 / 2.30
This is a curated shortlist, not the full catalogue. OpenRouter exposes hundreds of models through one key — the picker lists many more than the popular ones shown here. Wolffish doesn’t track per-token cache discounts for routed models, so the Cached column is unavailable across the board.

Direct Integration vs. OpenRouter

Direct ProviderOpenRouter
LatencyLowest — direct API callHigher — extra proxy hop
CostProvider pricing onlyProvider pricing + OpenRouter margin
FeaturesFull provider-specific featuresNormalized subset
BillingSeparate per providerSingle unified bill
Model accessOnly configured providersHundreds of models, one key
CachingProvider-native (Anthropic ephemeral, DeepSeek, etc.)Varies by underlying provider

When to Use OpenRouter

Good fit:
  • Trying models from providers you haven’t configured yet
  • Quick A/B testing across different model families
  • Unified billing when you only want one API bill
  • Accessing niche or newer models not yet natively supported
Use direct integration instead when:
  • The provider is already natively supported (DeepSeek, Anthropic, OpenAI, etc.)
  • You need the lowest possible latency
  • You want provider-specific features (caching, prompt prefixes, etc.)
  • You’re running high-volume production workloads where the proxy hop adds up
If you’re already using DeepSeek, Anthropic, or any other natively supported provider, keep that direct connection. Add OpenRouter only for models you can’t access directly — then select an OpenRouter model as your Brain when you want to use it. There’s no cascade; the model you select is the one that runs.

Reasoning modes

The brain icon next to the message box controls how this model reasons. Click it to cycle through the modes the selected model supports. Two separate ideas combine here:

Thinking — whether the model reasons

  • Off — the model answers immediately. Fastest and cheapest; ideal for simple, direct tasks.
  • On — the model first works through the problem in a dedicated reasoning pass before replying. Slower and uses more tokens, but markedly more accurate on multi-step, logical, or ambiguous tasks.

Effort — how hard it thinks

Only effort-capable models expose this; it applies once thinking is on.
  • High — standard reasoning depth. The right default for most agentic work.
  • Max — the model reasons longer and deeper for the hardest problems. More tokens and latency in exchange for higher quality on complex work.

Button states

StateColourMeaning
OffgrayThinking off — direct answer
OnblueThinking on — no effort control
HighpurpleThinking on, standard effort
MaxorangeThinking on, maximum effort
Each model shows only the states it genuinely supports. If a model always reasons (can’t be turned off) or has no effort control, the button reflects that and locks where there’s nothing to change. Wolffish remembers your choice per model. On OpenRouter: Reasoning depends on the routed model. Reasoning-capable models show Off / High; non-reasoning models have no control. OpenRouter caps effort at High, and some endpoints (e.g. GPT-5, DeepSeek-R) reason mandatorily — there ‘Off’ falls back to minimal reasoning.

Using OpenRouter as Your Model

There’s no provider cascade and no priority order. To use an OpenRouter model, select it as your Brain in Settings → Modes (or as your Worker model in orchestrator mode). The model you select is the one that runs — OpenRouter is not an automatic fallback behind your other providers.