
Providers

Wolffish communicates with LLMs via ten native cloud providers, an aggregator (OpenRouter), and a local option (Ollama), all using pure fetch() with no SDKs. Each provider has its own streaming format and tool-calling convention, which wernicke.ts normalizes into a single interface. All cloud providers support tool calling with no hard tool-count limit.
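As a rough illustration of what normalizing two streaming formats into one interface could look like, the sketch below maps a line of an OpenAI-compatible SSE stream and a line of an Ollama NDJSON stream onto a shared event type. The StreamEvent type and both function names are hypothetical and are not taken from wernicke.ts; the payload fields follow the publicly documented OpenAI-compatible and Ollama chat formats.

```typescript
// Hypothetical normalized event; not the actual wernicke.ts interface.
type StreamEvent =
  | { kind: "text"; text: string }
  | { kind: "done" };

// Parse one line of an OpenAI-compatible SSE stream ("data: {...}").
function parseSseLine(line: string): StreamEvent | null {
  if (!line.startsWith("data:")) return null; // skip comments and blank lines
  const payload = line.slice(5).trim();
  if (payload === "[DONE]") return { kind: "done" };
  const chunk = JSON.parse(payload);
  const text = chunk.choices?.[0]?.delta?.content;
  return typeof text === "string" && text.length > 0 ? { kind: "text", text } : null;
}

// Parse one line of an Ollama NDJSON stream (one JSON object per line).
function parseNdjsonLine(line: string): StreamEvent | null {
  if (line.trim() === "") return null;
  const chunk = JSON.parse(line);
  if (chunk.done === true) return { kind: "done" };
  const text = chunk.message?.content;
  return typeof text === "string" && text.length > 0 ? { kind: "text", text } : null;
}
```

Both parsers return the same event shape, so code downstream of them does not need to know which provider produced the stream.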

Choosing a Provider

All supported cloud providers can handle agentic tasks, including complex multi-step tool chains. The difference is cost vs. ceiling.

  • Cost-efficient tier: DeepSeek, MiMo, Qwen, Kimi, MiniMax, Stepfun, and Z.ai handle complex agentic tasks well, including long multi-step tool chains, research workflows, code generation, and autonomous automations. They should be your default. At 5–25× cheaper than the premium tier, the savings compound fast. Start here and only upgrade if you find execution isn’t reliable enough for a specific workflow.
  • Mid-range tier: xAI sits between budget and premium, offering Grok models with strong reasoning, vision, and code generation at moderate pricing.
  • Premium tier: Anthropic and OpenAI deliver the strongest raw model capability. Claude Opus 4.8 and GPT-5.5 excel where the cost-efficient tier falls short, particularly computer-use (screen interaction), which only Anthropic supports, and edge cases where execution reliability on the cheaper models isn’t sufficient.

| Tier | Provider | Flagship Model | Input / Output (per MTok) | Best For |
| --- | --- | --- | --- | --- |
| Cost-efficient | DeepSeek | deepseek-v4-pro | $0.44 / $0.87 | Default for most agentic tasks |
| Cost-efficient | MiMo | mimo-v2.5-pro | $0.20 / $2.00 | Cheapest option, multilingual |
| Cost-efficient | Qwen | qwen3.7-max | $2.50 / $7.50 | Wide model range, ultra-cheap flash |
| Cost-efficient | Kimi | kimi-k2.6 | $0.95 / $4.00 | Strong reasoning, long context |
| Cost-efficient | MiniMax | MiniMax-M3 | $0.30 / $1.20 | Reasoning and code |
| Cost-efficient | Stepfun | step-3.7-flash | $0.83 / $6.94 | Always-on reasoning |
| Cost-efficient | Z.ai | glm-4.6 | $0.60 / $2.20 | GLM models, 1M-context flagship |
| Mid-range | xAI | grok-4.3 | $1.25 / $2.50 | Reasoning, vision, code |
| Premium | Anthropic | claude-opus-4-8 | $5.00 / $25.00 | Hardest tasks, computer-use |
| Premium | OpenAI | gpt-5.5 | $5.00 / $30.00 | Hardest tasks, broad knowledge |
| Local | Ollama | varies | Free | Privacy, offline fallback |
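
To make the cost gap concrete, the sketch below computes the cost of one workload at two of the listed per-MTok rates. The prices are the DeepSeek and Anthropic rows of the table; the workload size (2M input tokens, 1M output tokens) is a made-up example, and the function name is hypothetical.

```typescript
// Prices in USD per million tokens (MTok), copied from the pricing table.
interface Price { input: number; output: number }

const DEEPSEEK_V4_PRO: Price = { input: 0.44, output: 0.87 };
const CLAUDE_OPUS_4_8: Price = { input: 5.0, output: 25.0 };

// Cost in USD for a given number of input and output tokens.
function costUsd(price: Price, inputTokens: number, outputTokens: number): number {
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// Hypothetical workload: 2M input tokens and 1M output tokens.
const cheapCost = costUsd(DEEPSEEK_V4_PRO, 2_000_000, 1_000_000);
const premiumCost = costUsd(CLAUDE_OPUS_4_8, 2_000_000, 1_000_000);
console.log(cheapCost.toFixed(2), premiumCost.toFixed(2), (premiumCost / cheapCost).toFixed(1));
```

For this workload the two totals come to about $1.75 and $35.00, a ratio of roughly 20×. The ratio shifts with the input/output mix, since output tokens carry most of the price difference.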

When to reach for the premium tier

  • Computer-use / screen interaction — only Anthropic supports this; no alternative
  • Execution not reliable enough — if you’ve tried a task on DeepSeek or MiMo and the agent keeps failing or producing poor results, upgrade to Anthropic or OpenAI for that specific workflow

Our recommendation

Start with DeepSeek or MiMo. They handle complex agentic tasks — long tool chains, research pipelines, code generation, autonomous automations — at a fraction of the cost. Experiment with your actual workflows. If a specific task isn’t executing reliably, switch to Anthropic or OpenAI for that task. Most users find they rarely need to.
Select DeepSeek or MiMo as your Brain model in Settings → Modes. If a task isn’t executing reliably on the cost-efficient tier, switch your Brain to Anthropic or OpenAI for that work. There’s no automatic fallback — you control which model runs by your explicit choice.
DeepSeek

POST https://api.deepseek.com/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. DeepSeek V4 Pro is Wolffish’s recommended default for agentic tasks. Following the permanent 75% price cut (May 2026), it delivers frontier-class reasoning and tool-use reliability at roughly 29–34× lower cost than competing frontier models on output-heavy workloads, while matching or exceeding their agentic performance on multi-step tool chains. It’s also MIT-licensed, so you can self-host for $0 in API fees if you have the infra. Best for: Agentic multi-step workflows, tool calling, research chains, cost-efficient daily automations.
If you’re setting up Wolffish for the first time and want one provider that does it all — reliable tool use, strong reasoning, fast responses, minimal cost — start with DeepSeek V4 Pro. You can always add Anthropic or OpenAI later for specific use cases.
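
For orientation, here is a minimal sketch of a streaming request to the DeepSeek endpoint in the OpenAI-compatible tool-calling format. It only builds the arguments for fetch(); the function name, the placeholder key, and the read_file tool are hypothetical, and this is not Wolffish's own request code.

```typescript
// Minimal shapes for an OpenAI-compatible chat request.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }
interface ToolDef {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

// Build the fetch() arguments for a streaming DeepSeek chat completion.
function buildDeepSeekRequest(apiKey: string, messages: ChatMessage[], tools: ToolDef[]) {
  return {
    url: "https://api.deepseek.com/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: "deepseek-v4-pro", messages, tools, stream: true }),
    },
  };
}

// Hypothetical tool definition, for illustration only.
const readFileTool: ToolDef = {
  type: "function",
  function: {
    name: "read_file",
    description: "Read a text file and return its contents",
    parameters: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
    },
  },
};

const deepSeekReq = buildDeepSeekRequest(
  "sk-placeholder",
  [{ role: "user", content: "Summarize notes.txt" }],
  [readFileTool],
);
// To send it: const res = await fetch(deepSeekReq.url, deepSeekReq.init);
```

Keeping request construction separate from the network call makes the payload easy to inspect before anything is sent.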

Getting an API Key

  1. Go to platform.deepseek.com
  2. Sign up or log in
  3. Navigate to API Keys and create a new key
  4. Paste it into Wolffish → Settings → Models → DeepSeek

Models

| Model | Context | Max Output | Input / Output (per MTok) | Notes |
| --- | --- | --- | --- | --- |
| deepseek-v4-pro | 1M | 32K | $0.435 / $0.87 | Recommended default. Frontier agentic performance. Cached: $0.003625/MTok. |
| deepseek-v4-flash | 1M | 32K | $0.14 / $0.28 | Fast and cheap. Cached: $0.0028/MTok. |

Xiaomi MiMo

POST https://api.xiaomimimo.com/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Supports vision (base64 images) and PDF document input. Following Xiaomi’s permanent price cut of up to 99% — matching DeepSeek V4 Pro rates — MiMo V2.5 is one of the cheapest cloud providers available. Best for: Cost-efficient agentic workflows, multilingual tasks, multi-step tool chains, high-volume automations.
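
As a sketch of how a base64 image might be attached, the helper below builds a user message using the data-URL content-part shape from the OpenAI-compatible chat format. That MiMo accepts exactly this shape is an assumption here, not something this page confirms, and the helper name is hypothetical.

```typescript
// Content parts in the OpenAI-compatible chat format.
// Assumption: the MiMo endpoint accepts image_url parts carrying a data URL.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Build a user message that pairs a text prompt with one base64-encoded image.
function userMessageWithImage(prompt: string, base64: string, mimeType = "image/png") {
  const parts: ContentPart[] = [
    { type: "text", text: prompt },
    { type: "image_url", image_url: { url: `data:${mimeType};base64,${base64}` } },
  ];
  return { role: "user" as const, content: parts };
}

// "iVBORw0KGgo=" is a truncated placeholder, not a real image.
const imageMessage = userMessageWithImage("What is in this screenshot?", "iVBORw0KGgo=");
```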

Getting an API Key

  1. Go to platform.xiaomimimo.com
  2. Sign up or log in
  3. Navigate to API Keys and create a new key
  4. Paste it into Wolffish → Settings → Models → Xiaomi Mimo

Models

| Model | Context | Max Output | Input / Output (per MTok) | Notes |
| --- | --- | --- | --- | --- |
| mimo-v2.5-pro | 1M | 64K | $0.20 / $2.00 | Best MiMo model. Strong reasoning and multilingual. |
| mimo-v2.5 | 1M | 32K | $0.08 / $0.80 | Good balance of quality and cost. |
| mimo-v2-flash | 256K | 16K | $0.01 / $0.30 | Ultra-cheap for high-volume tasks. |

Kimi (Moonshot AI)

POST https://api.moonshot.ai/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Supports vision and reasoning content. Best for: Agentic workflows, long-context tasks, reasoning-heavy workloads.

Getting an API Key

  1. Go to platform.moonshot.ai
  2. Sign up or log in
  3. Navigate to API Keys and create a new key
  4. Paste it into Wolffish → Settings → Models → Kimi

Models

| Model | Context | Max Output | Input / Output (per MTok) | Notes |
| --- | --- | --- | --- | --- |
| kimi-k2.6 | 256K | 64K | $0.95 / $4.00 | Latest flagship. Strong reasoning. |
| kimi-k2.5 | 256K | 64K | $0.60 / $3.00 | Good balance of cost and quality. |
| moonshot-v1-128k | 128K | 16K | $2.00 / $5.00 | Long-context. |
| moonshot-v1-32k | 32K | 8K | $1.00 / $3.00 | Mid-context. |

MiniMax

POST https://api.minimaxi.chat/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Supports reasoning content. Best for: Reasoning-heavy workloads, code generation, agentic workflows. DeepSeek and MiMo remain cheaper and more capable for most workloads.

Getting an API Key

  1. Go to platform.minimaxi.chat
  2. Sign up or log in
  3. Navigate to API Keys and create a new key
  4. Paste it into Wolffish → Settings → Models → MiniMax

Models

| Model | Context | Max Output | Input / Output (per MTok) | Notes |
| --- | --- | --- | --- | --- |
| MiniMax-M3 | 1M | 64K | $0.30 / $1.20 | Latest flagship. Strong reasoning. |
| MiniMax-M2.7 | 200K | 32K | $0.30 / $1.20 | Previous gen. |
| MiniMax-M2.5 | 200K | 32K | $0.30 / $1.20 | Balanced quality and cost. |

Qwen (Alibaba Cloud)

POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Supports vision and reasoning content. Best for: Cost-efficient agentic workflows, code generation, multilingual tasks.

Getting an API Key

  1. Go to qwencloud.com
  2. Sign up or log in
  3. Navigate to API Keys and create a new key
  4. Paste it into Wolffish → Settings → Models → Qwen

Models

| Model | Context | Max Output | Modes | Input / Output (per MTok) | Notes |
| --- | --- | --- | --- | --- | --- |
| qwen3.7-max | 1M | 64K | None, High, Max | $2.50 / $7.50 | Flagship. Frontier reasoning. |
| qwen3.7-plus | 1M | 64K | None, High, Max | $0.40 / $1.60 | Strong reasoning, mid-range price. |
| qwen3.5-flash | 1M | 64K | None, High, Max | $0.06 / $0.24 | Ultra-cheap reasoning. |
| qwen3-coder-plus | 131K | 32K | None, High, Max | $0.40 / $1.60 | Code-optimized. |

Stepfun

POST https://api.stepfun.ai/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Supports vision and reasoning content. Best for: Reasoning-heavy tasks where you always want the model to think.

Getting an API Key

  1. Go to platform.stepfun.ai
  2. Sign up or log in
  3. Navigate to API Keys and create a new key
  4. Paste it into Wolffish → Settings → Models → Stepfun

Models

| Model | Context | Max Output | Modes | Input / Output (per MTok) | Notes |
| --- | --- | --- | --- | --- | --- |
| step-3.7-flash | 128K | 32K | Always-on | $0.83 / $6.94 | Latest. Frontier reasoning. |
| step-3.5-flash | 128K | 32K | Always-on | $0.83 / $6.94 | Fast reasoning. |

Z.ai (Zhipu GLM)

POST https://api.z.ai/api/paas/v4/chat/completions
OpenAI-compatible SSE streaming with tool-calling — the same wire format as Kimi. GLM thinking is binary (on/off, no effort levels), and GLM-5.2 offers a 1M-token context window. Best for: Cost-efficient agentic work and long-context workflows.

Getting an API Key

  1. Go to z.ai
  2. Sign up or log in
  3. Open API Keys and create a new key
  4. Paste it into Wolffish → Settings → Models → Z.ai

Models

| Model | Context | Max Output | Modes | Input / Output (per MTok) | Notes |
| --- | --- | --- | --- | --- | --- |
| glm-4.6 | 200K | 64K | Thinking on/off | $0.60 / $2.20 | Recommended. Cost-efficient workhorse. |
| glm-4.5-air | 128K | 64K | Thinking on/off | $0.20 / $1.10 | Cheapest. |
| glm-5.2 | 1M | 64K | Thinking on/off | $1.40 / $4.40 | Flagship. Largest context. |
For the full GLM lineup and per-model details, see the Z.ai page.

Anthropic (Claude)

POST https://api.anthropic.com/v1/messages
Uses SSE streaming. Tool calls arrive as tool_use content blocks. Best for: Complex reasoning, detailed instruction following, nuanced tool use, computer-use (screen interaction).
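
In Anthropic's streaming format, a tool_use block opens with a content_block_start event, its JSON input arrives in input_json_delta fragments, and a content_block_stop event closes it. The sketch below reassembles tool calls from already-parsed events. The class and type names are hypothetical, and this is not Wolffish's own parser.

```typescript
interface ToolCall { id: string; name: string; input: unknown }

// Collects tool_use blocks from already-parsed Anthropic stream events.
class ToolUseCollector {
  private pending = new Map<number, { id: string; name: string; json: string }>();
  readonly finished: ToolCall[] = [];

  handle(event: any): void {
    if (event.type === "content_block_start" && event.content_block?.type === "tool_use") {
      // A new tool call begins; its input arrives later as JSON fragments.
      this.pending.set(event.index, {
        id: event.content_block.id,
        name: event.content_block.name,
        json: "",
      });
    } else if (event.type === "content_block_delta" && event.delta?.type === "input_json_delta") {
      const block = this.pending.get(event.index);
      if (block) block.json += event.delta.partial_json;
    } else if (event.type === "content_block_stop") {
      const block = this.pending.get(event.index);
      if (!block) return; // a text block, not a tool call
      this.pending.delete(event.index);
      // An empty fragment string means the tool was called with no input.
      this.finished.push({ id: block.id, name: block.name, input: JSON.parse(block.json || "{}") });
    }
  }
}
```

The input JSON is parsed only once the block closes, because individual fragments are not valid JSON on their own.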

Getting an API Key

  1. Go to console.anthropic.com
  2. Sign up or log in
  3. Navigate to API Keys and create a new key
  4. Paste it into Wolffish → Settings → Models → Anthropic

Models

| Model | Context | Max Output | Modes | Input / Output (per MTok) | Notes |
| --- | --- | --- | --- | --- | --- |
| claude-opus-4-8 | 1M | 32K | None, High, Max | $5.00 / $25.00 | Latest. Frontier reasoning. |
| claude-sonnet-4-6 | 1M | 64K | None, High, Max | $3.00 / $15.00 | Best balance of quality and cost. |
| claude-haiku-4-5 | 200K | 8K | None, High | $1.00 / $5.00 | Fast and cheap. Not recommended for agentic tasks. |
Anthropic is the only provider that supports computer-use (screen interaction). If you need Wolffish to drive a browser or desktop UI, you need an Anthropic key.

xAI (Grok)

POST https://api.x.ai/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Supports vision and reasoning content. Best for: Reasoning-heavy workflows, code generation, vision tasks.

Getting an API Key

  1. Go to console.x.ai
  2. Sign up or log in
  3. Navigate to API Keys and create a new key
  4. Paste it into Wolffish → Settings → Models → xAI

Models

| Model | Context | Max Output | Modes | Input / Output (per MTok) | Notes |
| --- | --- | --- | --- | --- | --- |
| grok-4.3 | 1M | 64K | None, High, Max | $1.25 / $2.50 | Flagship. Vision + reasoning. |
| grok-build-0.1 | 256K | 32K | None, High, Max | $1.00 / $2.00 | Code-optimized. |
| grok-3-mini | 131K | 32K | None, High | $0.30 / $0.50 | Fast and cheap. |

OpenAI (GPT)

POST https://api.openai.com/v1/chat/completions
Uses SSE streaming. Tool calls arrive as function_call objects. Best for: General-purpose tasks, broad knowledge, fast responses.
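
In the Chat Completions streaming format, a tool call's arguments arrive as string fragments spread across chunks and must be concatenated before parsing. The sketch below assumes the tool_calls delta shape from OpenAI's public API documentation; which field Wolffish actually reads is not stated on this page, and the function and type names are hypothetical.

```typescript
interface PartialCall { id: string; name: string; args: string }

// Merge tool-call fragments from parsed Chat Completions stream chunks.
// Assumption: each chunk carries choices[0].delta.tool_calls entries with an index.
function mergeToolCallDeltas(chunks: any[]): PartialCall[] {
  const calls: PartialCall[] = [];
  for (const chunk of chunks) {
    for (const delta of chunk.choices?.[0]?.delta?.tool_calls ?? []) {
      let call = calls[delta.index];
      if (!call) {
        call = { id: "", name: "", args: "" };
        calls[delta.index] = call;
      }
      if (delta.id) call.id = delta.id;
      if (delta.function?.name) call.name = delta.function.name;
      if (delta.function?.arguments) call.args += delta.function.arguments;
    }
  }
  return calls;
}
```

Once the stream ends, each merged args string can be passed to JSON.parse to recover the tool's arguments.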

Getting an API Key

  1. Go to platform.openai.com
  2. Sign up or log in
  3. Navigate to API Keys and create a new key
  4. Paste it into Wolffish → Settings → Models → OpenAI

Models

| Model | Context | Max Output | Modes | Input / Output (per MTok) | Notes |
| --- | --- | --- | --- | --- | --- |
| gpt-5.5 | 1M | 64K | None, High, Max | $5.00 / $30.00 | Flagship. Strong reasoning. |
| gpt-5.4-mini | 1M | 64K | None, High, Max | $0.75 / $4.50 | Fast reasoning. |
| gpt-5.4-nano | 1M | 64K | None, High, Max | $0.20 / $1.25 | Ultra-cheap reasoning. |

OpenRouter (Aggregator)

POST https://openrouter.ai/api/v1/chat/completions
Uses SSE streaming with OpenAI-compatible tool-calling format. Routes requests to any model from any provider through a single API key. OpenRouter is a model aggregator — a single API endpoint that proxies requests to Anthropic, OpenAI, DeepSeek, Qwen, xAI, Meta, Mistral, Google, and dozens more. One key, one billing account, access to everything.
We recommend configuring providers directly whenever possible. Direct integration gives you lower latency (no proxy hop), accurate cost tracking, provider-specific features (Anthropic’s ephemeral caching, DeepSeek’s FIM), and no middleman markup. OpenRouter adds a routing layer that can introduce latency and occasionally inconsistent behavior across providers.
Use OpenRouter when you want to experiment with models you haven’t set up directly, or as a convenient fallback for providers where you don’t want to manage a separate API key.

Getting an API Key

  1. Go to openrouter.ai
  2. Sign up or log in
  3. Navigate to Keys and create a new key
  4. Paste it into Wolffish → Settings → Models → OpenRouter

Supported Models

OpenRouter supports hundreds of models. Wolffish normalizes output caps to match each provider’s native limits:
| Model (via OpenRouter) | Max Output | Notes |
| --- | --- | --- |
| anthropic/claude-* | 32K | Matches native Anthropic caps. |
| openai/gpt-5* | 64K | Matches native OpenAI caps. |
| openai/o3, openai/o4 | 64K | Reasoning models. |
| deepseek/* | 32K | Matches native DeepSeek caps. |
| x-ai/grok-* | 32K | Matches native xAI caps. |
| google/gemini-* | 64K | Google Gemini models. |
| meta-llama/* | 16K | Meta Llama models. |
| qwen/* | 32K | Matches native Qwen caps. |
| mistralai/* | 32K | Mistral models. |
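
One way to express the cap table in code is a prefix lookup over OpenRouter model IDs, sketched below. The values come from the table; the names are hypothetical, and returning undefined for an unmatched model is an assumption, since this page does not say what Wolffish does in that case.

```typescript
// Output caps in thousands of tokens, keyed by model-ID prefix, from the table.
const OUTPUT_CAPS_K: [string, number][] = [
  ["anthropic/claude-", 32],
  ["openai/gpt-5", 64],
  ["openai/o3", 64],
  ["openai/o4", 64],
  ["deepseek/", 32],
  ["x-ai/grok-", 32],
  ["google/gemini-", 64],
  ["meta-llama/", 16],
  ["qwen/", 32],
  ["mistralai/", 32],
];

// Return the cap (in K tokens) for a model ID, or undefined when no row matches.
function outputCapK(modelId: string): number | undefined {
  const match = OUTPUT_CAPS_K.find(([prefix]) => modelId.startsWith(prefix));
  return match?.[1];
}
```

Because the prefixes do not overlap, the order of the rows does not affect the result.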

When to Use OpenRouter

Good fit:
  • Trying models from providers you haven’t configured yet
  • Quick A/B testing across different model families
  • Unified billing when you only want one API bill
  • Accessing niche or newer models not yet natively supported
Use direct integration instead when:
  • The provider is already natively supported (DeepSeek, Anthropic, OpenAI, etc.)
  • You need the lowest possible latency
  • You want provider-specific features (caching, prompt prefixes, etc.)
  • You’re running high-volume production workloads where the proxy hop adds up
If you’re already using DeepSeek, Anthropic, or any other natively supported provider, keep that direct connection. Add OpenRouter only for models you can’t access directly — then pick an OpenRouter model as your Brain (or Worker) when you want it. There’s no cascade; the model you select is the one that runs.

Ollama (Local)

POST http://localhost:11434/api/chat
Uses NDJSON streaming. Tool calls arrive as structured JSON in the response. No API key needed — runs entirely on your machine. See the Ollama integration guide for model requirements and hardware recommendations. Best for: Privacy, offline use, zero-cost experimentation, always-available fallback.
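
NDJSON delivers one JSON object per line, but network chunks can split a line in two, so a reader has to hold back the incomplete tail until the rest arrives. The buffer below is a minimal sketch of that step; the class name is hypothetical, and it is not Wolffish's own reader.

```typescript
// Splits a stream of text chunks into complete NDJSON objects.
class NdjsonBuffer {
  private rest = "";

  // Feed one decoded chunk; returns every object completed by this chunk.
  push(chunk: string): unknown[] {
    const lines = (this.rest + chunk).split("\n");
    this.rest = lines.pop() ?? ""; // the last piece may be an incomplete line
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line));
  }
}

const ndjson = new NdjsonBuffer();
const firstBatch = ndjson.push('{"message":{"content":"Hel"},"done":false}\n{"message":{"con');
const secondBatch = ndjson.push('tent":"lo"},"done":false}\n');
```

Here firstBatch holds one object and secondBatch holds the second, which was split across the two chunks. With a fetch() response, the chunks would come from reading res.body and decoding it with a TextDecoder in streaming mode.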

Retries & Health

The selected Brain model runs every turn — there’s no fallback to other providers. When a cloud Brain hits a transient error, thalamus retries the same model on a backoff schedule (it also checks net.isOnline() for instant offline detection). Workers in orchestrator mode are single-shot and don’t retry on their own; any failure surfaces to the orchestrator, which owns the retry decision. Health tracking informs this retry logic and diagnostics — it does not route you to a different provider.
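
The retry behavior described above can be sketched as a loop that re-invokes the same call on a fixed backoff schedule and never switches models. The schedule values, the function name, and the isTransient predicate are hypothetical; thalamus's actual schedule and error classification are not documented on this page.

```typescript
// Hypothetical backoff schedule in milliseconds; not thalamus's real values.
const BACKOFF_MS = [500, 1000, 2000];

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry the same call on transient errors; rethrow once the schedule is exhausted.
async function retrySameModel<T>(
  call: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  delaysMs: number[] = BACKOFF_MS,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Non-transient errors and exhausted schedules surface to the caller.
      if (attempt >= delaysMs.length || !isTransient(err)) throw err;
      await sleep(delaysMs[attempt]);
    }
  }
}
```

With three delays the call is attempted at most four times. The loop has no branch that selects another provider, which mirrors the no-fallback rule.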

Choosing Your Brain Model

Select your Brain model explicitly in Settings → Modes — one provider, one model. The model you choose is the one that runs; there’s no fallback order and no “primary” with a chain behind it. Connect API keys in Settings → Models, then pick the model that powers Wolffish. Want a second model for parallel work? Turn on orchestrator mode and assign a Worker model alongside your Brain. There’s still no automatic cascade — both are explicit choices. All providers are optional — you only need the one (or two) you select. To run on Ollama, select it as your Brain model; for offline work, switch your Brain to Ollama before you go offline, since there’s no automatic fall-through to local.