Providers
Wolffish communicates with LLMs via ten native cloud providers, an aggregator (OpenRouter), and a local option (Ollama), all using pure fetch() with no SDKs. Each provider has its own streaming format and tool-calling convention, which wernicke.ts normalizes into a single interface. All cloud providers support tool calling with no hard tool-count limit.
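As a rough illustration of what normalizing into a single interface can look like, the sketch below maps an Anthropic-style tool_use block and an OpenAI-compatible tool call onto one shape. The NormalizedToolCall type and both function names are hypothetical and chosen for this example; the actual wernicke.ts interface may differ.

```typescript
// Hypothetical unified shape; the real wernicke.ts interface may differ.
interface NormalizedToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

// Anthropic Messages API: tool calls are content blocks of type "tool_use",
// with the arguments already decoded as an object.
interface AnthropicToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}

// OpenAI-compatible chat completions: arguments arrive as a JSON string.
interface OpenAIToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

function fromAnthropic(block: AnthropicToolUseBlock): NormalizedToolCall {
  return { id: block.id, name: block.name, args: block.input };
}

function fromOpenAI(call: OpenAIToolCall): NormalizedToolCall {
  // Treat an empty arguments string as "no arguments".
  const raw = call.function.arguments.trim();
  const args = raw === "" ? {} : (JSON.parse(raw) as Record<string, unknown>);
  return { id: call.id, name: call.function.name, args };
}
```

Once both formats are reduced to the same shape, the rest of the agent loop can dispatch tools without knowing which provider produced the call.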
Choosing a Provider
All supported cloud providers can handle agentic tasks — including complex multi-step tool chains. The difference is cost vs. ceiling.
Cost-efficient tier: DeepSeek, MiMo, Qwen, Kimi, MiniMax, Stepfun, and Z.ai handle complex agentic tasks well, including long multi-step tool chains, research workflows, code generation, and autonomous automations. They should be your default. At 5–25× cheaper than the premium tier, the savings compound fast. Start here and only upgrade if you find execution isn’t reliable enough for a specific workflow.
Mid-range tier: xAI sits between budget and premium, offering Grok models with strong reasoning, vision, and code generation at moderate pricing.
Premium tier: Anthropic and OpenAI deliver the strongest raw model capability. Claude Opus 4.8 and GPT-5.5 cover the cases where the cost-efficient tier falls short: edge cases where execution on the cheaper models isn’t reliable enough, and computer-use (screen interaction), which only Anthropic supports.
| Tier | Provider | Flagship Model | Input / Output (USD per MTok) | Best For |
|---|---|---|---|---|
| Cost-efficient | DeepSeek | deepseek-v4-pro | 0.44/0.87 | Default for most agentic tasks |
| Cost-efficient | MiMo | mimo-v2.5-pro | 0.20/2.00 | Cheapest option, multilingual |
| Cost-efficient | Qwen | qwen3.7-max | 2.50/7.50 | Wide model range, ultra-cheap flash |
| Cost-efficient | Kimi | kimi-k2.6 | 0.95/4.00 | Strong reasoning, long context |
| Cost-efficient | MiniMax | MiniMax-M3 | 0.30/1.20 | Reasoning and code |
| Cost-efficient | Stepfun | step-3.7-flash | 0.83/6.94 | Always-on reasoning |
| Cost-efficient | Z.ai | glm-4.6 | 0.60/2.20 | GLM models; 1M context via glm-5.2 |
| Mid-range | xAI | grok-4.3 | 1.25/2.50 | Reasoning, vision, code |
| Premium | Anthropic | claude-opus-4-8 | 5.00/25.00 | Hardest tasks, computer-use |
| Premium | OpenAI | gpt-5.5 | 5.00/30.00 | Hardest tasks, broad knowledge |
| Local | Ollama | varies | Free | Privacy, offline fallback |
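To make the cost gap concrete, here is a small sketch that turns the per-MTok prices from the table above into an estimated bill. The workload (2M input tokens, 1M output tokens) is invented for illustration, and estimateCostUsd is a hypothetical helper, not part of Wolffish.

```typescript
// Prices in USD per million tokens, copied from the tier table above.
interface Price {
  input: number;
  output: number;
}

const PRICES: Record<string, Price> = {
  "deepseek-v4-pro": { input: 0.44, output: 0.87 },
  "claude-opus-4-8": { input: 5.0, output: 25.0 },
  "gpt-5.5": { input: 5.0, output: 30.0 },
};

function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  if (!p) throw new Error(`No price listed for ${model}`);
  return (inputTokens / 1e6) * p.input + (outputTokens / 1e6) * p.output;
}

// Hypothetical workload: 2M input tokens and 1M output tokens.
const cheap = estimateCostUsd("deepseek-v4-pro", 2_000_000, 1_000_000); // 0.88 + 0.87
const premium = estimateCostUsd("claude-opus-4-8", 2_000_000, 1_000_000); // 10 + 25

// prints: 1.75 35.00 20.0
console.log(cheap.toFixed(2), premium.toFixed(2), (premium / cheap).toFixed(1));
```

The ratio depends on the input/output mix, which is why a single multiplier never describes every workload.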
When to reach for the premium tier
- Computer-use / screen interaction — only Anthropic supports this; no alternative
- Execution not reliable enough — if you’ve tried a task on DeepSeek or MiMo and the agent keeps failing or producing poor results, upgrade to Anthropic or OpenAI for that specific workflow
Our recommendation
Start with DeepSeek or MiMo. They handle complex agentic tasks — long tool chains, research pipelines, code generation, autonomous automations — at a fraction of the cost. Experiment with your actual workflows. If a specific task isn’t executing reliably, switch to Anthropic or OpenAI for that task. Most users find they rarely need to.
Select DeepSeek or MiMo as your Brain model in Settings → Modes. There’s no automatic fallback: the model you select is the one that runs, so moving a task to Anthropic or OpenAI is always an explicit change to your Brain.
DeepSeek (Recommended)
POST https://api.deepseek.com/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format.
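A minimal sketch of consuming this kind of stream follows. It assumes the endpoint follows the usual OpenAI chat-completions streaming convention (`data:` lines carrying JSON chunks, a `data: [DONE]` sentinel, and tool-call fragments keyed by `index`). The accumulateSse name and the result shape are hypothetical, and a production parser would also buffer partial lines across network chunks.

```typescript
// Minimal shapes for OpenAI-compatible stream chunks (only the fields used here).
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}
interface StreamChunk {
  choices: { delta: { content?: string | null; tool_calls?: ToolCallDelta[] } }[];
}
interface Accumulated {
  text: string;
  toolCalls: { id: string; name: string; arguments: string }[];
}

// Fold a complete SSE body into the final text and tool calls.
function accumulateSse(body: string): Accumulated {
  const out: Accumulated = { text: "", toolCalls: [] };
  for (const line of body.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const payload = line.slice(5).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload) as StreamChunk;
    const delta = chunk.choices[0]?.delta;
    if (!delta) continue;
    if (delta.content) out.text += delta.content;
    for (const tc of delta.tool_calls ?? []) {
      // Fragments for the same call share an index; arguments arrive as string pieces.
      let slot = out.toolCalls[tc.index];
      if (!slot) {
        slot = { id: "", name: "", arguments: "" };
        out.toolCalls[tc.index] = slot;
      }
      if (tc.id) slot.id = tc.id;
      if (tc.function?.name) slot.name = tc.function.name;
      if (tc.function?.arguments) slot.arguments += tc.function.arguments;
    }
  }
  return out;
}
```

The accumulated `arguments` string is only valid JSON once the stream has finished, so it should be parsed after the final chunk rather than per fragment.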
DeepSeek V4 Pro is Wolffish’s recommended default for agentic tasks. Following the permanent 75% price cut (May 2026), it delivers frontier-class reasoning and tool-use reliability at 29–34× lower cost than competing frontier models on output-heavy workloads, while matching or exceeding their agentic performance on multi-step tool chains. It’s also MIT-licensed, so you can self-host and pay $0 in API fees if you have the infrastructure.
Best for: Agentic multi-step workflows, tool calling, research chains, cost-efficient daily automations.
If you’re setting up Wolffish for the first time and want one provider that does it all — reliable tool use, strong reasoning, fast responses, minimal cost — start with DeepSeek V4 Pro. You can always add Anthropic or OpenAI later for specific use cases.
Getting an API Key
- Go to platform.deepseek.com
- Sign up or log in
- Navigate to API Keys and create a new key
- Paste it into Wolffish → Settings → Models → DeepSeek
Models
| Model | Context | Max Output | Input / Output (USD per MTok) | Notes |
|---|---|---|---|---|
| deepseek-v4-pro | 1M | 32K | 0.435/0.87 | Recommended default. Frontier agentic performance. Cached: $0.003625/MTok. |
| deepseek-v4-flash | 1M | 32K | 0.14/0.28 | Fast and cheap. Cached: $0.0028/MTok. |
Xiaomi MiMo
POST https://api.xiaomimimo.com/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Supports vision (base64 images) and PDF document input.
Following Xiaomi’s permanent price cut of up to 99%, which brought it in line with DeepSeek V4 Pro rates, MiMo is one of the cheapest cloud providers available.
Best for: Cost-efficient agentic workflows, multilingual tasks, multi-step tool chains, high-volume automations.
Getting an API Key
- Go to platform.xiaomimimo.com
- Sign up or log in
- Navigate to API Keys and create a new key
- Paste it into Wolffish → Settings → Models → Xiaomi Mimo
Models
| Model | Context | Max Output | Input / Output (USD per MTok) | Notes |
|---|---|---|---|---|
| mimo-v2.5-pro | 1M | 64K | 0.20/2.00 | Best MiMo model. Strong reasoning and multilingual. |
| mimo-v2.5 | 1M | 32K | 0.08/0.80 | Good balance of quality and cost. |
| mimo-v2-flash | 256K | 16K | 0.01/0.30 | Ultra-cheap for high-volume tasks. |
Kimi (Moonshot AI)
POST https://api.moonshot.ai/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Supports vision and reasoning content.
Best for: Agentic workflows, long-context tasks, reasoning-heavy workloads.
Getting an API Key
- Go to platform.moonshot.ai
- Sign up or log in
- Navigate to API Keys and create a new key
- Paste it into Wolffish → Settings → Models → Kimi
Models
| Model | Context | Max Output | Input / Output (USD per MTok) | Notes |
|---|---|---|---|---|
| kimi-k2.6 | 256K | 64K | 0.95/4.00 | Latest flagship. Strong reasoning. |
| kimi-k2.5 | 256K | 64K | 0.60/3.00 | Good balance of cost and quality. |
| moonshot-v1-128k | 128K | 16K | 2.00/5.00 | Long-context. |
| moonshot-v1-32k | 32K | 8K | 1.00/3.00 | Mid-context. |
MiniMax
POST https://api.minimaxi.chat/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Supports reasoning content.
Best for: Reasoning-heavy workloads, code generation, agentic workflows. DeepSeek and MiMo remain cheaper and more capable for most workloads.
Getting an API Key
- Go to platform.minimaxi.chat
- Sign up or log in
- Navigate to API Keys and create a new key
- Paste it into Wolffish → Settings → Models → MiniMax
Models
| Model | Context | Max Output | Input / Output (USD per MTok) | Notes |
|---|---|---|---|---|
| MiniMax-M3 | 1M | 64K | 0.30/1.20 | Latest flagship. Strong reasoning. |
| MiniMax-M2.7 | 200K | 32K | 0.30/1.20 | Previous gen. |
| MiniMax-M2.5 | 200K | 32K | 0.30/1.20 | Balanced quality and cost. |
Qwen (Alibaba Cloud)
POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Supports vision and reasoning content.
Best for: Cost-efficient agentic workflows, code generation, multilingual tasks.
Getting an API Key
- Go to qwencloud.com
- Sign up or log in
- Navigate to API Keys and create a new key
- Paste it into Wolffish → Settings → Models → Qwen
Models
| Model | Context | Max Output | Modes | Input / Output (USD per MTok) | Notes |
|---|---|---|---|---|---|
| qwen3.7-max | 1M | 64K | None, High, Max | 2.50/7.50 | Flagship. Frontier reasoning. |
| qwen3.7-plus | 1M | 64K | None, High, Max | 0.40/1.60 | Strong reasoning, mid-range price. |
| qwen3.5-flash | 1M | 64K | None, High, Max | 0.06/0.24 | Ultra-cheap reasoning. |
| qwen3-coder-plus | 131K | 32K | None, High, Max | 0.40/1.60 | Code-optimized. |
Stepfun
POST https://api.stepfun.ai/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Supports vision and reasoning content.
Best for: Reasoning-heavy tasks where you always want the model to think.
Getting an API Key
- Go to platform.stepfun.ai
- Sign up or log in
- Navigate to API Keys and create a new key
- Paste it into Wolffish → Settings → Models → Stepfun
Models
| Model | Context | Max Output | Modes | Input / Output (USD per MTok) | Notes |
|---|---|---|---|---|---|
| step-3.7-flash | 128K | 32K | Always-on | 0.83/6.94 | Latest. Frontier reasoning. |
| step-3.5-flash | 128K | 32K | Always-on | 0.83/6.94 | Fast reasoning. |
Z.ai (Zhipu GLM)
POST https://api.z.ai/api/paas/v4/chat/completions
OpenAI-compatible SSE streaming with tool-calling — the same wire format as Kimi. GLM thinking is binary (on/off, no effort levels), and GLM-5.2 offers a 1M-token context window.
Best for: Cost-efficient agentic work and long-context workflows.
Getting an API Key
- Go to z.ai
- Sign up or log in
- Open API Keys and create a new key
- Paste it into Wolffish → Settings → Models → Z.ai
Models
| Model | Context | Max Output | Modes | Input / Output (USD per MTok) | Notes |
|---|---|---|---|---|---|
| glm-4.6 | 200K | 64K | Thinking on/off | 0.60/2.20 | Recommended. Cost-efficient workhorse. |
| glm-4.5-air | 128K | 64K | Thinking on/off | 0.20/1.10 | Cheapest. |
| glm-5.2 | 1M | 64K | Thinking on/off | 1.40/4.40 | Flagship. Largest context. |
For the full GLM lineup and per-model details, see the Z.ai page.
Anthropic (Claude)
POST https://api.anthropic.com/v1/messages
Uses SSE streaming. Tool calls arrive as tool_use content blocks.
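The sketch below shows one way to reassemble tool_use blocks from an already-decoded Anthropic event stream. In that stream, a `content_block_start` event opens the block, `input_json_delta` fragments carry the arguments as partial JSON, and `content_block_stop` closes it. The collectToolUses helper and its types are hypothetical and cover only the fields needed here.

```typescript
// Subset of Anthropic streaming events relevant to tool calls.
type AnthropicEvent =
  | {
      type: "content_block_start";
      index: number;
      content_block: { type: string; id?: string; name?: string; input?: unknown };
    }
  | {
      type: "content_block_delta";
      index: number;
      delta: { type: string; partial_json?: string; text?: string };
    }
  | { type: "content_block_stop"; index: number };

interface ToolUse {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

function collectToolUses(events: AnthropicEvent[]): ToolUse[] {
  // Blocks still being streamed, keyed by content-block index.
  const pending = new Map<number, { id: string; name: string; json: string }>();
  const done: ToolUse[] = [];
  for (const ev of events) {
    if (ev.type === "content_block_start") {
      if (ev.content_block.type === "tool_use") {
        pending.set(ev.index, {
          id: ev.content_block.id ?? "",
          name: ev.content_block.name ?? "",
          json: "",
        });
      }
    } else if (ev.type === "content_block_delta") {
      const p = pending.get(ev.index);
      if (p && ev.delta.type === "input_json_delta") p.json += ev.delta.partial_json ?? "";
    } else {
      // content_block_stop: the accumulated JSON is now complete.
      const p = pending.get(ev.index);
      if (p) {
        const input = p.json === "" ? {} : (JSON.parse(p.json) as Record<string, unknown>);
        done.push({ id: p.id, name: p.name, input });
        pending.delete(ev.index);
      }
    }
  }
  return done;
}
```

Text blocks pass through the same event types with `text_delta` fragments, which this sketch deliberately ignores.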
Best for: Complex reasoning, detailed instruction following, nuanced tool use, computer-use (screen interaction).
Getting an API Key
- Go to console.anthropic.com
- Sign up or log in
- Navigate to API Keys and create a new key
- Paste it into Wolffish → Settings → Models → Anthropic
Models
| Model | Context | Max Output | Modes | Input / Output (USD per MTok) | Notes |
|---|---|---|---|---|---|
| claude-opus-4-8 | 1M | 32K | None, High, Max | 5.00/25.00 | Latest. Frontier reasoning. |
| claude-sonnet-4-6 | 1M | 64K | None, High, Max | 3.00/15.00 | Best balance of quality and cost. |
| claude-haiku-4-5 | 200K | 8K | None, High | 1.00/5.00 | Fast and cheap. Not recommended for agentic tasks. |
Anthropic is the only provider that supports computer-use (screen interaction). If you need Wolffish to drive a browser or desktop UI, you need an Anthropic key.
xAI (Grok)
POST https://api.x.ai/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Supports vision and reasoning content.
Best for: Reasoning-heavy workflows, code generation, vision tasks.
Getting an API Key
- Go to console.x.ai
- Sign up or log in
- Navigate to API Keys and create a new key
- Paste it into Wolffish → Settings → Models → xAI
Models
| Model | Context | Max Output | Modes | Input / Output (USD per MTok) | Notes |
|---|---|---|---|---|---|
| grok-4.3 | 1M | 64K | None, High, Max | 1.25/2.50 | Flagship. Vision + reasoning. |
| grok-build-0.1 | 256K | 32K | None, High, Max | 1.00/2.00 | Code-optimized. |
| grok-3-mini | 131K | 32K | None, High | 0.30/0.50 | Fast and cheap. |
OpenAI (GPT)
POST https://api.openai.com/v1/chat/completions
Uses SSE streaming. Tool calls arrive as function_call objects.
Best for: General-purpose tasks, broad knowledge, fast responses.
Getting an API Key
- Go to platform.openai.com
- Sign up or log in
- Navigate to API Keys and create a new key
- Paste it into Wolffish → Settings → Models → OpenAI
Models
| Model | Context | Max Output | Modes | Input / Output (USD per MTok) | Notes |
|---|---|---|---|---|---|
| gpt-5.5 | 1M | 64K | None, High, Max | 5.00/30.00 | Flagship. Strong reasoning. |
| gpt-5.4-mini | 1M | 64K | None, High, Max | 0.75/4.50 | Fast reasoning. |
| gpt-5.4-nano | 1M | 64K | None, High, Max | 0.20/1.25 | Ultra-cheap reasoning. |
OpenRouter (Aggregator)
POST https://openrouter.ai/api/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Routes requests to any model from any provider through a single API key.
OpenRouter is a model aggregator — a single API endpoint that proxies requests to Anthropic, OpenAI, DeepSeek, Qwen, xAI, Meta, Mistral, Google, and dozens more. One key, one billing account, access to everything.
We recommend configuring providers directly whenever possible. Direct integration gives you lower latency (no proxy hop), accurate cost tracking, provider-specific features (Anthropic’s ephemeral caching, DeepSeek’s FIM), and no middleman markup. OpenRouter adds a routing layer that can introduce latency and occasionally inconsistent behavior across providers.
Use OpenRouter when you want to experiment with models you haven’t set up directly, or as a convenient fallback for providers where you don’t want to manage a separate API key.
Getting an API Key
- Go to openrouter.ai
- Sign up or log in
- Navigate to Keys and create a new key
- Paste it into Wolffish → Settings → Models → OpenRouter
Supported Models
OpenRouter supports hundreds of models. Wolffish normalizes output caps to match each provider’s native limits:
| Model (via OpenRouter) | Max Output | Notes |
|---|---|---|
| anthropic/claude-* | 32K | Matches native Anthropic caps. |
| openai/gpt-5* | 64K | Matches native OpenAI caps. |
| openai/o3, openai/o4 | 64K | Reasoning models. |
| deepseek/* | 32K | Matches native DeepSeek caps. |
| x-ai/grok-* | 32K | Matches native xAI caps. |
| google/gemini-* | 64K | Google Gemini models. |
| meta-llama/* | 16K | Meta Llama models. |
| qwen/* | 32K | Matches native Qwen caps. |
| mistralai/* | 32K | Mistral models. |
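One plausible way to apply a table like this is a prefix lookup on the OpenRouter model ID, sketched below. The outputCapFor helper is hypothetical, the caps are kept as the labels from the table because exact token counts are not stated, and the `openai/o3` and `openai/o4` rows are treated as prefixes, which is an assumption.

```typescript
// Prefixes and caps copied from the table above. Caps stay as labels ("32K")
// because the exact token counts are not given.
const OUTPUT_CAPS: ReadonlyArray<readonly [string, string]> = [
  ["anthropic/claude-", "32K"],
  ["openai/gpt-5", "64K"],
  ["openai/o3", "64K"],
  ["openai/o4", "64K"],
  ["deepseek/", "32K"],
  ["x-ai/grok-", "32K"],
  ["google/gemini-", "64K"],
  ["meta-llama/", "16K"],
  ["qwen/", "32K"],
  ["mistralai/", "32K"],
];

// Returns the cap label for a model ID, or undefined when no row matches.
function outputCapFor(modelId: string): string | undefined {
  const hit = OUTPUT_CAPS.find(([prefix]) => modelId.startsWith(prefix));
  return hit?.[1];
}
```

For example, `outputCapFor("meta-llama/example-model")` (a made-up ID) returns `"16K"`, while an ID with no matching row returns `undefined` and would need a default chosen by the caller.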
When to Use OpenRouter
Good fit:
- Trying models from providers you haven’t configured yet
- Quick A/B testing across different model families
- Unified billing when you only want one API bill
- Accessing niche or newer models not yet natively supported
Use direct integration instead when:
- The provider is already natively supported (DeepSeek, Anthropic, OpenAI, etc.)
- You need the lowest possible latency
- You want provider-specific features (caching, prompt prefixes, etc.)
- You’re running high-volume production workloads where the proxy hop adds up
If you’re already using DeepSeek, Anthropic, or any other natively supported provider, keep that direct connection. Add OpenRouter only for models you can’t access directly — then pick an OpenRouter model as your Brain (or Worker) when you want it. There’s no cascade; the model you select is the one that runs.
Ollama (Local)
POST http://localhost:11434/api/chat
Uses NDJSON streaming. Tool calls arrive as structured JSON in the response. No API key needed — runs entirely on your machine. See the Ollama integration guide for model requirements and hardware recommendations.
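NDJSON means one JSON object per line, so a reader has to hold back any trailing partial line until the next network chunk arrives. The sketch below shows that buffering. The createNdjsonParser helper is hypothetical, and the OllamaLine type lists only a subset of the fields that /api/chat emits.

```typescript
// Subset of one streamed line from Ollama's /api/chat.
interface OllamaLine {
  message?: {
    role: string;
    content: string;
    tool_calls?: { function: { name: string; arguments: Record<string, unknown> } }[];
  };
  done: boolean;
}

// Returns a push function that accepts raw text chunks and yields every
// complete JSON line seen so far.
function createNdjsonParser<T>(): (chunk: string) => T[] {
  let buffer = "";
  return (chunk: string): T[] => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
    return lines.filter((l) => l.trim() !== "").map((l) => JSON.parse(l) as T);
  };
}
```

In a real client the chunks would come from decoding `response.body` with a `TextDecoder`; the parser itself stays independent of the transport, which keeps it easy to test.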
Best for: Privacy, offline use, zero-cost experimentation, always-available fallback.
Retries & Health
The selected Brain model runs every turn — there’s no fallback to other providers. When a cloud Brain hits a transient error, thalamus retries the same model on a backoff schedule (it also checks net.isOnline() for instant offline detection). Workers in orchestrator mode are single-shot and don’t retry on their own; any failure surfaces to the orchestrator, which owns the retry decision. Health tracking informs this retry logic and diagnostics — it does not route you to a different provider.
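The retry behavior described above can be pictured with the sketch below, which retries the same call on a fixed backoff schedule and never switches providers. This is not thalamus itself: the function name, the delay schedule, and the isTransient predicate are all assumptions made for illustration.

```typescript
// Retry one call against the SAME model on a backoff schedule.
// The default delays are illustrative, not Wolffish's actual schedule.
async function retrySameModel<T>(
  call: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  delaysMs: number[] = [500, 1000, 2000],
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const delay = delaysMs[attempt];
      // Give up when the schedule is exhausted or the error is not transient.
      if (delay === undefined || !isTransient(err)) throw err;
      await sleep(delay);
    }
  }
}
```

Injecting `sleep` keeps the schedule testable without real waiting. A single-shot Worker, by contrast, would call the model once and let any failure propagate to the orchestrator.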
Choosing Your Brain Model
Select your Brain model explicitly in Settings → Modes — one provider, one model. The model you choose is the one that runs; there’s no fallback order and no “primary” with a chain behind it. Connect API keys in Settings → Models, then pick the model that powers Wolffish.
Want a second model for parallel work? Turn on orchestrator mode and assign a Worker model alongside your Brain. There’s still no automatic cascade — both are explicit choices.
All providers are optional — you only need the one (or two) you select. To run on Ollama, select it as your Brain model; for offline work, switch your Brain to Ollama before you go offline, since there’s no automatic fall-through to local.