DeepSeek (Recommended)

POST https://api.deepseek.com/chat/completions

Uses SSE streaming with the OpenAI-compatible tool-calling format. DeepSeek V4 Pro is Wolffish’s recommended default for agentic tasks. Following the permanent 75% price cut (May 2026), it delivers frontier-class reasoning and tool-use reliability at 29–34× lower cost than competing frontier models on output-heavy workloads, while matching or exceeding their agentic performance on multi-step tool chains. It’s also MIT-licensed, so if you have the infrastructure you can self-host it and pay nothing in API fees.

Best for: agentic multi-step workflows, tool calling, research chains, and cost-efficient daily automations.
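To make "SSE streaming with OpenAI-compatible tool calling" concrete, here is a minimal sketch of a streaming request to the endpoint above and a reducer that folds `data:` lines into text and tool calls. It is not Wolffish's implementation: `buildRequest`, `applySseLine`, `parseSse` and the `get_weather` tool are hypothetical names, and the model id is taken from the Models table. The key detail is that tool-call arguments arrive as string fragments keyed by `index` and must be concatenated before they parse as JSON.

```typescript
// Sketch only: a streaming chat request to DeepSeek's OpenAI-compatible
// endpoint, plus a reducer that folds SSE "data:" lines into text and tool
// calls. Function names and the "get_weather" tool are hypothetical.

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON text, concatenated from streamed fragments
}

interface StreamResult {
  text: string;
  toolCalls: ToolCall[];
  done: boolean;
}

function buildRequest(apiKey: string, userMessage: string) {
  return {
    url: "https://api.deepseek.com/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "deepseek-v4-pro",
        stream: true, // ask for Server-Sent Events instead of one JSON reply
        messages: [{ role: "user", content: userMessage }],
        tools: [
          {
            type: "function",
            function: {
              name: "get_weather", // hypothetical tool
              description: "Current weather for a city",
              parameters: {
                type: "object",
                properties: { city: { type: "string" } },
                required: ["city"],
              },
            },
          },
        ],
      }),
    },
  };
}

// Apply one SSE line to the accumulated state.
function applySseLine(state: StreamResult, line: string): StreamResult {
  if (!line.startsWith("data:")) return state; // skip blanks, comments, keep-alives
  const payload = line.slice("data:".length).trim();
  if (payload === "[DONE]") return { ...state, done: true };

  const delta = JSON.parse(payload).choices?.[0]?.delta ?? {};
  const toolCalls = state.toolCalls.map((c) => ({ ...c }));
  // Tool calls arrive in fragments keyed by index; arguments are concatenated.
  for (const tc of delta.tool_calls ?? []) {
    const i: number = tc.index;
    if (!toolCalls[i]) toolCalls[i] = { id: "", name: "", arguments: "" };
    if (tc.id) toolCalls[i].id = tc.id;
    if (tc.function?.name) toolCalls[i].name = tc.function.name;
    if (tc.function?.arguments) toolCalls[i].arguments += tc.function.arguments;
  }
  return { text: state.text + (delta.content ?? ""), toolCalls, done: state.done };
}

// Convenience for a fully buffered response body.
function parseSse(body: string): StreamResult {
  const initial: StreamResult = { text: "", toolCalls: [], done: false };
  return body.split(/\r?\n/).reduce<StreamResult>(applySseLine, initial);
}
```

In a live client you would pass `init` to `fetch(url, init)`, decode `response.body` incrementally, and hand only complete lines to `applySseLine`, buffering any partial line until the next chunk arrives. Reasoning-capable models may also stream extra delta fields (for example `reasoning_content`); this sketch ignores them.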

If you’re setting up Wolffish for the first time and want one provider that does it all — reliable tool use, strong reasoning, fast responses, minimal cost — start with DeepSeek V4 Pro. You can always add Anthropic or OpenAI later for specific use cases.

Getting an API Key

Go to platform.deepseek.com
Sign up or log in
Navigate to API Keys and create a new key
Paste it into Wolffish → Settings → Models → DeepSeek
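If you want to confirm a key works before pasting it, a quick authenticated call is enough. The sketch below assumes DeepSeek's OpenAI-compatible `GET /models` listing endpoint and a 401 status for a rejected key; `authHeaders`, `modelIds` and `checkKey` are hypothetical helper names, not part of Wolffish.

```typescript
// Sketch: confirm a DeepSeek key is accepted before pasting it into Wolffish.
// Assumes the OpenAI-compatible GET /models endpoint; helper names are hypothetical.

function authHeaders(apiKey: string): Record<string, string> {
  const key = apiKey.trim(); // drop whitespace picked up when copying the key
  if (key === "") throw new Error("API key is empty");
  return { Authorization: `Bearer ${key}` };
}

// Pull model ids out of an OpenAI-style { data: [{ id }] } list response.
function modelIds(json: unknown): string[] {
  const data = (json as { data?: Array<{ id?: unknown }> } | null)?.data ?? [];
  return data.map((m) => m.id).filter((id): id is string => typeof id === "string");
}

async function checkKey(apiKey: string): Promise<string[]> {
  const res = await fetch("https://api.deepseek.com/models", { headers: authHeaders(apiKey) });
  if (res.status === 401) throw new Error("Key rejected (401): copy it again from platform.deepseek.com");
  if (!res.ok) throw new Error(`Unexpected HTTP status ${res.status}`);
  return modelIds(await res.json());
}
```

Calling `checkKey(yourKey)` resolves to the model ids the key can see, or rejects with a message you can act on.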

Models

Model	Context	Modes	Input / Output (USD per MTok)	Notes
deepseek-v4-pro	1M	Off, High, Max	$0.44 / $0.87	Recommended default. Frontier agentic performance. Cached input: $0.01/MTok.
deepseek-v4-flash	1M	Off, High, Max	$0.14 / $0.28	Fast and cheap. Cached input: $0.003/MTok.
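Because input, cached input, and output are billed at different rates, a rough cost estimate has to split token counts the same way. This sketch hard-codes the prices from the table above; the `Usage` field names are our own, chosen to mirror the cache-hit/cache-miss split, and are not an official schema.

```typescript
// Sketch: estimate spend from token counts using the table's prices.
// Prices are USD per million tokens; "cachedInput" is the cache-hit input rate.

interface Pricing {
  input: number;
  cachedInput: number;
  output: number;
}

const PRICES: Record<string, Pricing> = {
  "deepseek-v4-pro": { input: 0.44, cachedInput: 0.01, output: 0.87 },
  "deepseek-v4-flash": { input: 0.14, cachedInput: 0.003, output: 0.28 },
};

// Hypothetical usage shape: input split into cache misses and cache hits.
interface Usage {
  cacheMissInputTokens: number;
  cacheHitInputTokens: number;
  outputTokens: number;
}

function estimateCostUsd(model: string, u: Usage): number {
  const p = PRICES[model];
  if (!p) throw new Error(`No pricing for ${model}`);
  const total =
    u.cacheMissInputTokens * p.input +
    u.cacheHitInputTokens * p.cachedInput +
    u.outputTokens * p.output;
  return total / 1_000_000; // convert from per-million pricing
}
```

For example, 2M uncached input tokens, 8M cached input tokens, and 500K output tokens come to 2 × 0.44 + 8 × 0.01 + 0.5 × 0.87 = $1.395 on V4 Pro, and 2 × 0.14 + 8 × 0.003 + 0.5 × 0.28 = $0.444 on V4 Flash.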

Reasoning modes

The brain icon next to the message box controls how the selected model reasons. Click it to cycle through the modes that model supports. Two separate ideas combine here:

Thinking — whether the model reasons

Off — the model answers immediately. Fastest and cheapest; ideal for simple, direct tasks.
On — the model first works through the problem in a dedicated reasoning pass before replying. Slower and uses more tokens, but markedly more accurate on multi-step, logical, or ambiguous tasks.

Effort — how hard it thinks

Only effort-capable models expose this; it applies once thinking is on.

High — standard reasoning depth. The right default for most agentic work.
Max — the model reasons longer and deeper for the hardest problems. More tokens and latency in exchange for higher quality on complex work.

Button states

State	Colour	Meaning
Off	gray	Thinking off — direct answer
On	blue	Thinking on — no effort control
High	purple	Thinking on, standard effort
Max	orange	Thinking on, maximum effort

Each model shows only the states it genuinely supports. If a model always reasons (thinking can’t be turned off) or has no effort control, the button reflects that and locks wherever there’s nothing to change. Wolffish remembers your choice per model.

On DeepSeek, both V4 models support Off, High, and Max. In current testing, High and Max produce similar reasoning depth; Max is still exposed so that you benefit automatically if DeepSeek differentiates the tiers later.
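The button behaviour described above (cycle through supported states only, lock when there is a single state, remember the choice per model) can be sketched as a small state machine. This is an illustration rather than Wolffish's code: the `ReasoningButton` class, the capability table, the `always-thinks-example` model, and the rule that a model starts in its first listed state are all assumptions.

```typescript
// Sketch of the brain-icon behaviour. The capability table, the
// "start in the first listed state" rule and all names are assumptions.

type ReasoningState = "off" | "on" | "high" | "max";

// Colours from the Button states table.
const COLOUR: Record<ReasoningState, string> = {
  off: "gray",
  on: "blue",
  high: "purple",
  max: "orange",
};

// States each model genuinely supports, in click order (hypothetical table).
const SUPPORTED: Record<string, ReasoningState[]> = {
  "deepseek-v4-pro": ["off", "high", "max"],
  "deepseek-v4-flash": ["off", "high", "max"],
  "always-thinks-example": ["on"], // hypothetical: always reasons, no effort control
};

class ReasoningButton {
  private remembered = new Map<string, ReasoningState>(); // per-model memory

  private states(model: string): ReasoningState[] {
    const s = SUPPORTED[model];
    if (!s || s.length === 0) throw new Error(`Unknown model ${model}`);
    return s;
  }

  current(model: string): ReasoningState {
    const saved = this.remembered.get(model);
    const s = this.states(model);
    return saved !== undefined && s.includes(saved) ? saved : s[0];
  }

  isLocked(model: string): boolean {
    return this.states(model).length === 1; // nothing to cycle through
  }

  click(model: string): ReasoningState {
    const s = this.states(model);
    if (s.length === 1) return s[0]; // locked: the click changes nothing
    const next = s[(s.indexOf(this.current(model)) + 1) % s.length];
    this.remembered.set(model, next);
    return next;
  }
}
```

On either V4 model, three clicks go Off → High → Max → Off, and switching models does not disturb the state saved for the other one.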

Model Selection & Retries

Wolffish communicates with LLMs through nine cloud providers plus a local option, all using plain fetch() calls with no SDKs. Each provider has its own streaming format and tool-calling convention, which wernicke.ts normalizes into a single interface.

Select your Brain model explicitly in Settings → Modes; the model you choose is the one that runs. There’s no cascade or fallback order. If you want a second model for parallel work, turn on orchestrator mode and assign a Worker model.

When a cloud Brain hits a transient error, thalamus retries the same model on a backoff schedule, and it uses net.isOnline() for instant offline detection. It does not route you to a different provider on failure.
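The retry policy described above (same model, backoff between attempts, fail fast when offline, no provider switch) can be sketched as a generic wrapper. It is a sketch under stated assumptions, not thalamus itself: `withRetries`, the delay schedule, and the rule that 429, 5xx, and network errors count as transient are our choices. `isOnline` is injected so the code runs outside Electron, where you would pass Electron's `net.isOnline`.

```typescript
// Sketch of "retry the same model with backoff, never switch provider".
// The delay schedule and transient-error rule are assumptions; isOnline
// stands in for Electron's net.isOnline() so this runs outside Electron.

interface RetryOptions {
  delaysMs: number[]; // wait before retry 1, 2, 3, ... (hypothetical schedule)
  isOnline: () => boolean;
  sleep?: (ms: number) => Promise<void>;
}

// Treat rate limits, 5xx responses and network failures as transient.
function isTransient(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  if (typeof status === "number") return status === 429 || status >= 500;
  return err instanceof TypeError; // fetch() rejects with a TypeError on network failure
}

async function withRetries<T>(call: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    // Fail fast instead of burning the retry budget while offline.
    if (!opts.isOnline()) throw new Error("Offline: request not sent");
    try {
      return await call(); // same model every time: no fallback provider
    } catch (err) {
      if (!isTransient(err) || attempt >= opts.delaysMs.length) throw err;
      await sleep(opts.delaysMs[attempt]);
    }
  }
}
```

A permanent error such as a 400 is rethrown immediately, while a 503 is retried up to `delaysMs.length` extra times. In every case the error eventually surfaces to you rather than being silently redirected to another provider.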
