OpenRouter (Aggregator)

POST https://openrouter.ai/api/v1/chat/completions

Uses SSE streaming with the OpenAI-compatible tool-calling format, and routes requests to any model from any provider through a single API key.

OpenRouter is a model aggregator: a single API endpoint that proxies requests to Anthropic, OpenAI, DeepSeek, Qwen, xAI, Meta, Mistral, Google, and dozens more. One key, one billing account, access to everything.

Best for: experimenting with models you haven’t set up directly, unified billing, and accessing niche models not yet natively supported by Wolffish.
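As a rough illustration of the endpoint and streaming format described above, here is a minimal Python sketch. It is not Wolffish's implementation: the helper names (`build_request`, `parse_sse_line`, `stream_text`) are hypothetical, and the chunk shape assumed here is the usual OpenAI-style `choices[0].delta.content` layout ending with a `data: [DONE]` line.

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_request(api_key, model, messages):
    """Build (but do not send) a streaming chat-completions request."""
    body = {"model": model, "messages": messages, "stream": True}
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_sse_line(line):
    """Return the text delta carried by one SSE line, or None if there is none."""
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank separator or ": comment" keep-alive line
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    delta = choices[0].get("delta") or {}
    return delta.get("content")


def stream_text(request):
    """Yield text deltas from a live response (needs network and a valid key)."""
    with urllib.request.urlopen(request) as response:
        for raw in response:
            text = parse_sse_line(raw.decode("utf-8"))
            if text:
                yield text
```

To try it against the live API, pass a real key to `build_request` and iterate over `stream_text(request)`; the parsing helper can be exercised offline with sample lines.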

We recommend configuring providers directly whenever possible. Direct integration gives you lower latency (no proxy hop), accurate cost tracking, provider-specific features (Anthropic’s ephemeral caching, DeepSeek’s FIM), and no middleman markup. OpenRouter adds a routing layer that can introduce latency and occasionally inconsistent behavior across providers.

Use OpenRouter when you want to experiment with models you haven’t set up directly, or as a convenient fallback for providers where you don’t want to manage a separate API key.

Getting an API Key

Go to openrouter.ai
Sign up or log in
Navigate to Keys and create a new key
Paste it into Wolffish → Settings → Models → OpenRouter

Models

OpenRouter routes to hundreds of models across dozens of providers. The table below is a curated top-20 of the most popular models — the picker exposes many more once your key is connected:

Model	Context	Modes	Input / Output (USD per MTok)	Cached
anthropic/claude-opus-4.1	200K	Off, High	$15.00 / $75.00	—
anthropic/claude-sonnet-4.5	1M	Off, High	$3.00 / $15.00	—
openai/gpt-5	400K	Off, High	$1.25 / $10.00	—
openai/gpt-5-mini	400K	Off, High	$0.25 / $2.00	—
openai/o3	200K	Off, High	$2.00 / $8.00	—
openai/o4-mini	200K	Off, High	$1.10 / $4.40	—
openai/gpt-4o	128K	—	$2.50 / $10.00	—
openai/gpt-4.1	1M	—	$2.00 / $8.00	—
google/gemini-2.5-pro	1M	Off, High	$1.25 / $10.00	—
google/gemini-2.5-flash	1M	Off, High	$0.30 / $2.50	—
deepseek/deepseek-r1	164K	Off, High	$0.70 / $2.50	—
deepseek/deepseek-chat-v3.1	164K	Off, High	$0.21 / $0.79	—
x-ai/grok-4.3	1M	Off, High	$1.25 / $2.50	—
qwen/qwen3-235b-a22b	131K	Off, High	$0.45 / $1.82	—
z-ai/glm-4.6	203K	Off, High	$0.43 / $1.74	—
meta-llama/llama-4-maverick	1M	—	$0.15 / $0.60	—
meta-llama/llama-3.3-70b-instruct	131K	—	$0.10 / $0.32	—
mistralai/mistral-large	128K	—	$2.00 / $6.00	—
mistralai/mistral-medium-3	131K	—	$0.40 / $2.00	—
moonshotai/kimi-k2	131K	—	$0.57 / $2.30	—

This is a curated shortlist, not the full catalogue. Wolffish doesn’t track per-token cache discounts for routed models, so the Cached column is empty for every model.
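To make the per-MTok pricing concrete, the following sketch estimates the cost of a single request from its token counts. The prices are copied from three rows of the table above; the `estimate_cost` helper and the `PRICES_PER_MTOK` name are hypothetical, and the figure ignores cache discounts and assumes the listed prices are what you are actually billed.

```python
# USD per million tokens (input, output), copied from the table above.
PRICES_PER_MTOK = {
    "openai/gpt-5-mini": (0.25, 2.00),
    "anthropic/claude-sonnet-4.5": (3.00, 15.00),
    "deepseek/deepseek-chat-v3.1": (0.21, 0.79),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request, with no cache discount applied."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 200K input tokens and 10K output tokens on Claude Sonnet 4.5:
# 0.2 MTok * $3.00 + 0.01 MTok * $15.00
print(f"${estimate_cost('anthropic/claude-sonnet-4.5', 200_000, 10_000):.2f}")
```

Output tokens are priced several times higher than input tokens for most models in the table, so long responses (including reasoning tokens, where the provider bills them as output) tend to dominate the total.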

Direct Integration vs. OpenRouter

	Direct Provider	OpenRouter
Latency	Lowest — direct API call	Higher — extra proxy hop
Cost	Provider pricing only	Provider pricing + OpenRouter margin
Features	Full provider-specific features	Normalized subset
Billing	Separate per provider	Single unified bill
Model access	Only configured providers	Hundreds of models, one key
Caching	Provider-native (Anthropic ephemeral, DeepSeek, etc.)	Varies by underlying provider

When to Use OpenRouter

Good fit:

Trying models from providers you haven’t configured yet
Quick A/B testing across different model families
Unified billing when you only want one API bill
Accessing niche or newer models not yet natively supported

Use direct integration instead when:

The provider is already natively supported (DeepSeek, Anthropic, OpenAI, etc.)
You need the lowest possible latency
You want provider-specific features (caching, prompt prefixes, etc.)
You’re running high-volume production workloads where the proxy hop adds up

If you’re already using DeepSeek, Anthropic, or any other natively supported provider, keep that direct connection. Add OpenRouter only for models you can’t access directly — then select an OpenRouter model as your Brain when you want to use it. There’s no cascade; the model you select is the one that runs.

Reasoning modes

The brain icon next to the message box controls how this model reasons. Click it to cycle through the modes the selected model supports. Two separate ideas combine here:

Thinking — whether the model reasons

Off — the model answers immediately. Fastest and cheapest; ideal for simple, direct tasks.
On — the model first works through the problem in a dedicated reasoning pass before replying. Slower and uses more tokens, but markedly more accurate on multi-step, logical, or ambiguous tasks.

Effort — how hard it thinks

Only effort-capable models expose this; it applies once thinking is on.

High — standard reasoning depth. The right default for most agentic work.
Max — the model reasons longer and deeper for the hardest problems. More tokens and latency in exchange for higher quality on complex work.

Button states

State	Colour	Meaning
Off	gray	Thinking off — direct answer
On	blue	Thinking on — no effort control
High	purple	Thinking on, standard effort
Max	orange	Thinking on, maximum effort

Each model shows only the states it genuinely supports. If a model always reasons (can’t be turned off) or has no effort control, the button reflects that and locks where there’s nothing to change. Wolffish remembers your choice per model.

On OpenRouter: reasoning depends on the routed model. Reasoning-capable models show Off / High; non-reasoning models have no control. OpenRouter caps effort at High, and some endpoints (e.g. GPT-5, DeepSeek-R) always reason; on those endpoints, ‘Off’ falls back to minimal reasoning.
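One way to picture the button logic described above is as a small state list derived from a model's capabilities. This is only a sketch under assumptions: the `ReasoningCaps` fields and both helper functions are hypothetical names for illustration, not Wolffish's actual data model.

```python
from dataclasses import dataclass

# Colours from the button-states table.
COLOURS = {"Off": "gray", "On": "blue", "High": "purple", "Max": "orange"}


@dataclass(frozen=True)
class ReasoningCaps:
    """Hypothetical capability flags for one model."""
    can_reason: bool
    can_disable: bool = True    # False for models that always reason
    has_effort: bool = False    # True if the model exposes effort levels
    supports_max: bool = False  # True if Max effort is available


def button_states(caps):
    """List the states the brain button would cycle through for this model."""
    if not caps.can_reason:
        return []  # non-reasoning model: no control at all
    states = ["Off"] if caps.can_disable else []
    if caps.has_effort:
        states.append("High")
        if caps.supports_max:
            states.append("Max")
    else:
        states.append("On")
    return states


def next_state(caps, current):
    """Return the state after one click; unchanged when the button is locked."""
    states = button_states(caps)
    if len(states) <= 1:
        return current
    return states[(states.index(current) + 1) % len(states)]


# A reasoning-capable model routed through OpenRouter: effort is capped at High.
openrouter_reasoner = ReasoningCaps(can_reason=True, has_effort=True)
print(button_states(openrouter_reasoner))       # ['Off', 'High']
print(next_state(openrouter_reasoner, "High"))  # Off
```

Under this model, an always-reasoning model without effort control yields the single state `On`, which is the "locked" case: clicking changes nothing.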

Using OpenRouter as Your Model

There’s no provider cascade and no priority order. To use an OpenRouter model, select it as your Brain in Settings → Modes (or as your Worker model in orchestrator mode). The model you select is the one that runs — OpenRouter is not an automatic fallback behind your other providers.
