# DeepSeek

> Set up DeepSeek — Wolffish's recommended default provider for agentic tasks

```
POST https://api.deepseek.com/chat/completions
```

Uses SSE streaming with OpenAI-compatible tool-calling format.
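
As a rough illustration of what an SSE request against this endpoint can look like with plain `fetch()`, here is a minimal sketch. It assumes the usual OpenAI-compatible request shape (`model`, `messages`, `stream: true`) and Bearer authentication. The names `parseSseLine` and `streamChat` are made up for this example and are not Wolffish internals.

```typescript
// Sketch only: assumes the OpenAI-compatible chat format and Bearer auth.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Turn one SSE line into its JSON payload.
// Returns null for blank lines, non-data lines, and the [DONE] sentinel.
function parseSseLine(line: string): unknown {
  const trimmed = line.trim();
  if (!trimmed.startsWith("data:")) return null;
  const payload = trimmed.slice("data:".length).trim();
  if (payload === "" || payload === "[DONE]") return null;
  return JSON.parse(payload);
}

// Yield text fragments as they arrive from the streaming endpoint.
async function* streamChat(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): AsyncGenerator<string> {
  const res = await fetch("https://api.deepseek.com/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages, stream: true }),
  });
  if (!res.ok || !res.body) {
    throw new Error(`DeepSeek request failed with status ${res.status}`);
  }

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line for the next read
    for (const line of lines) {
      const chunk = parseSseLine(line) as
        | { choices?: { delta?: { content?: string } }[] }
        | null;
      const text = chunk?.choices?.[0]?.delta?.content;
      if (text) yield text;
    }
  }
}
```

A caller would iterate with `for await (const piece of streamChat(key, "deepseek-v4-pro", messages))` and append each piece to the visible reply.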

**DeepSeek V4 Pro is Wolffish's recommended default for agentic tasks.** Following the [permanent 75% price cut](https://deepseek.ai/blog/deepseek-v4-pro-api-price-cut-permanent) (May 2026), it delivers frontier-class reasoning and tool-use reliability at 29–34× lower cost than competing frontier models on output-heavy workloads, while matching or exceeding their agentic performance on multi-step tool chains. It is also MIT-licensed, so you can self-host it and pay \$0 in API fees if you have the infrastructure.

Best for: Agentic multi-step workflows, tool calling, research chains, cost-efficient daily automations.

<Tip>
  If you're setting up Wolffish for the first time and want one provider that does it all — reliable tool use, strong reasoning, fast responses, minimal cost — start with DeepSeek V4 Pro. You can always add Anthropic or OpenAI later for specific use cases.
</Tip>

## Getting an API Key

1. Go to [platform.deepseek.com](https://platform.deepseek.com)
2. Sign up or log in
3. Navigate to **API Keys** and create a new key
4. Paste it into Wolffish → Settings → Models → DeepSeek

## Models

| Model               | Context | Modes          | Input / Output (USD per MTok) | Notes                                                                   |
| ------------------- | ------- | -------------- | ----------------------------- | ----------------------------------------------------------------------- |
| **deepseek-v4-pro** | 1M      | Off, High, Max | \$0.44 / \$0.87               | Recommended default. Frontier agentic performance. Cached: \$0.01/MTok. |
| deepseek-v4-flash   | 1M      | Off, High, Max | \$0.14 / \$0.28               | Fast and cheap. Cached: \$0.003/MTok.                                   |
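
To get a feel for what these rates mean for a given workload, the arithmetic can be sketched as below. The rates are copied from the table. The helper name `estimateCostUsd` and the usage field names are hypothetical, and the sketch assumes the "Cached" rate applies to cached input tokens.

```typescript
type DeepSeekModel = "deepseek-v4-pro" | "deepseek-v4-flash";

interface Pricing {
  input: number; // USD per million uncached input tokens
  output: number; // USD per million output tokens
  cachedInput: number; // USD per million cached input tokens (assumed meaning of "Cached")
}

// Rates taken from the Models table above.
const PRICING: Record<DeepSeekModel, Pricing> = {
  "deepseek-v4-pro": { input: 0.44, output: 0.87, cachedInput: 0.01 },
  "deepseek-v4-flash": { input: 0.14, output: 0.28, cachedInput: 0.003 },
};

interface Usage {
  inputTokens: number; // uncached input tokens
  cachedInputTokens: number;
  outputTokens: number;
}

// Hypothetical helper: estimated spend in USD for one request or one batch.
function estimateCostUsd(model: DeepSeekModel, usage: Usage): number {
  const p = PRICING[model];
  const perMillion =
    usage.inputTokens * p.input +
    usage.cachedInputTokens * p.cachedInput +
    usage.outputTokens * p.output;
  return perMillion / 1_000_000;
}
```

For example, one million uncached input tokens plus one million output tokens on `deepseek-v4-pro` comes to about \$1.31 (0.44 + 0.87).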

## Reasoning modes

The **brain icon** next to the message box controls how this model reasons. Click it to cycle through the modes the selected model supports. Two separate ideas combine here:

### Thinking — *whether* the model reasons

* **Off** — the model answers immediately. Fastest and cheapest; ideal for simple, direct tasks.
* **On** — the model first works through the problem in a dedicated reasoning pass before replying. Slower and uses more tokens, but markedly more accurate on multi-step, logical, or ambiguous tasks.

### Effort — *how hard* it thinks

Only effort-capable models expose this; it applies once thinking is on.

* **High** — standard reasoning depth. The right default for most agentic work.
* **Max** — the model reasons longer and deeper for the hardest problems. More tokens and latency in exchange for higher quality on complex work.

### Button states

| State | Colour | Meaning                         |
| ----- | ------ | ------------------------------- |
| Off   | gray   | Thinking off — direct answer    |
| On    | blue   | Thinking on — no effort control |
| High  | purple | Thinking on, standard effort    |
| Max   | orange | Thinking on, maximum effort     |

Each model shows only the states it genuinely supports. If a model always reasons (can't be turned off) or has no effort control, the button reflects that and locks where there's nothing to change. Wolffish remembers your choice per model.

**On DeepSeek:** Both V4 models support Off / High / Max. In current testing, High and Max produce similar reasoning depth. Max is still exposed so that your setting benefits automatically if DeepSeek differentiates the tiers later.
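
The click-to-cycle behaviour can be pictured with a small sketch. The cycle order, the `nextMode` function, and the `DEEPSEEK_MODES` constant are illustrative assumptions, not Wolffish's actual code.

```typescript
type ReasoningMode = "Off" | "On" | "High" | "Max";

// Modes the DeepSeek V4 models expose, per the table and note above.
const DEEPSEEK_MODES: readonly ReasoningMode[] = ["Off", "High", "Max"];

// Hypothetical helper: advance to the next supported mode, wrapping around.
// A model with a single supported mode stays "locked" on that mode.
function nextMode(
  supported: readonly ReasoningMode[],
  current: ReasoningMode,
): ReasoningMode {
  if (supported.length === 0) {
    throw new Error("a model must support at least one mode");
  }
  // indexOf returns -1 for an unsupported mode, which falls back to the first entry.
  const i = supported.indexOf(current);
  return supported[(i + 1) % supported.length];
}
```

Under these assumptions, repeated clicks on a DeepSeek model would go Off, High, Max, and back to Off.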

***

## Model Selection & Retries

Wolffish communicates with LLMs via nine cloud providers plus a local option, all using pure `fetch()` — no SDKs. Each provider has its own streaming format and tool-calling convention, which `wernicke.ts` normalizes into a single interface.
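
To make the normalization idea concrete, here is a sketch of one piece of it: merging OpenAI-style streamed tool-call fragments into complete calls. This is not the contents of `wernicke.ts`. The types and the `mergeToolCallDeltas` function are hypothetical, and the delta shape assumes the OpenAI-compatible format, in which each fragment carries an `index` and a partial `arguments` string.

```typescript
// One streamed fragment of a tool call (assumed OpenAI-compatible shape).
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

// Hypothetical provider-neutral shape the rest of an app could consume.
interface NormalizedToolCall {
  id: string;
  name: string;
  args: unknown;
}

function mergeToolCallDeltas(deltas: ToolCallDelta[]): NormalizedToolCall[] {
  const partial = new Map<number, { id: string; name: string; argsJson: string }>();
  for (const d of deltas) {
    const entry = partial.get(d.index) ?? { id: "", name: "", argsJson: "" };
    if (d.id) entry.id = d.id;
    // Assumption: the function name arrives whole, usually in the first fragment.
    if (d.function?.name) entry.name = d.function.name;
    // The JSON arguments arrive as string pieces that must be concatenated.
    if (d.function?.arguments) entry.argsJson += d.function.arguments;
    partial.set(d.index, entry);
  }
  return Array.from(partial.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([, e]) => ({
      id: e.id,
      name: e.name,
      args: JSON.parse(e.argsJson || "{}"),
    }));
}
```

Other providers would need their own adapter that produces the same `NormalizedToolCall` shape, which is the point of keeping a single interface.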

Select your Brain model explicitly in **Settings → Modes** — the model you choose is the one that runs. There's no cascade or fallback order; if you want a second model for parallel work, turn on [orchestrator mode](/configuration/orchestrator-mode) and assign a Worker model.

When a cloud Brain hits a transient error, `thalamus` retries the **same** model on a backoff schedule (it also uses `net.isOnline()` for instant offline detection). It does not route you to a different provider on failure.
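
The same-model retry behaviour can be sketched as follows. The delays, attempt count, and function names are placeholder assumptions for illustration; the actual schedule used by `thalamus` is not documented here, and the offline check is left out.

```typescript
// Hypothetical schedule: exponential backoff capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Retry the SAME call on transient errors; never switch to another provider.
async function retrySameModel<T>(
  call: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on non-transient errors or once the attempt budget is spent.
      if (!isTransient(err) || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Because `call` is a closure over one fixed model, a failure surfaces to the user after the last attempt instead of being silently rerouted to a different provider.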