
Z.ai (Zhipu GLM)

POST https://api.z.ai/api/paas/v4/chat/completions
An OpenAI-compatible endpoint with SSE streaming and tool calling. Reasoning streams as reasoning_content, thinking is controlled with thinking: { type } (plus reasoning_effort on GLM-5), and prompt-cache hits are reported under usage.prompt_tokens_details.cached_tokens. This is the same wire format as Kimi.

GLM reasoning is configured per model. GLM-5 models expose real effort tiers (Off / High / Max via reasoning_effort), while GLM-4.x models are a simple on/off toggle: any mode other than none enables reasoning, and none disables it. GLM-5.2 also offers a 1M-token context window, the largest in Z.ai's lineup.

Best for: cost-efficient agentic work, long-context workflows (GLM-5.2), and vision-capable tasks. GLM models are strong all-rounders for tool chains and code at budget-tier pricing.
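As a rough illustration of the wire format described above, the sketch below builds a request body for a given reasoning mode and pulls the reasoning text and cached-token count out of one streamed line. The field names thinking, reasoning_effort, reasoning_content, and usage.prompt_tokens_details.cached_tokens come from this page. The values "enabled" / "disabled" and the lowercase tier names "high" / "max" are assumptions, as are the helper names build_payload and read_chunk, so check them against the Z.ai API reference before relying on them.

```python
import json
from typing import Any, Dict, List, Optional

CHAT_URL = "https://api.z.ai/api/paas/v4/chat/completions"


def build_payload(model: str, mode: str, messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a streaming chat request body for a reasoning mode.

    mode is "none" (thinking off) or any other label such as "on", "high", "max".
    """
    payload: Dict[str, Any] = {"model": model, "messages": messages, "stream": True}
    thinking_on = mode != "none"
    # Assumed values: the page names the `type` field but not what it accepts.
    payload["thinking"] = {"type": "enabled" if thinking_on else "disabled"}
    # Only GLM-5 models take an effort tier; GLM-4.x is a plain on/off toggle.
    if thinking_on and model.startswith("glm-5") and mode in ("high", "max"):
        payload["reasoning_effort"] = mode  # assumed lowercase tier names
    return payload


def read_chunk(line: str) -> Optional[Dict[str, Any]]:
    """Parse one SSE line; return None for non-data lines and the [DONE] marker."""
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None
    chunk = json.loads(data)
    choices = chunk.get("choices") or [{}]
    delta = choices[0].get("delta") or {}
    details = (chunk.get("usage") or {}).get("prompt_tokens_details") or {}
    return {
        "reasoning": delta.get("reasoning_content") or "",
        "content": delta.get("content") or "",
        "cached_tokens": details.get("cached_tokens"),  # None when usage is absent
    }
```

A client would POST build_payload(...) as JSON to CHAT_URL with its API key and feed each received line to read_chunk, keeping the reasoning text separate from the answer text.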

Getting an API Key

  1. Go to z.ai
  2. Sign up or log in
  3. Open API Keys and create a new key
  4. Paste it into Wolffish → Settings → Models → Z.ai

Models

| Model | Context | Modes | Input / Output (per MTok) | Cached |
|---|---|---|---|---|
| glm-5.2 | 1M | Off / High / Max | $1.40 / $4.40 | $0.26 |
| glm-5.1 | 200K | Off / High / Max | $1.40 / $4.40 | $0.26 |
| glm-5-turbo | 200K | Off / High / Max | $1.20 / $4.00 | $0.24 |
| glm-5 | 200K | Off / High / Max | $1.00 / $3.20 | $0.20 |
| glm-4.7 | 200K | Off / On | $0.60 / $2.20 | $0.11 |
| glm-4.6 | 200K | Off / On | $0.60 / $2.20 | $0.11 |
| glm-4.5 | 128K | Off / On | $0.60 / $2.20 | $0.11 |
| glm-4.5-air | 128K | Off / On | $0.20 / $1.10 | $0.03 |
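To make the pricing concrete, here is a minimal cost estimate based on the table above. It assumes prices are in USD per million tokens, that cached_tokens is a subset of prompt_tokens (the usual OpenAI-style convention), and that cached tokens are billed at the cached rate instead of the input rate. The function name estimate_cost is hypothetical and only three models are listed for brevity.

```python
# USD per million tokens: (input, output, cached input), as read from the table.
PRICES = {
    "glm-5.2": (1.40, 4.40, 0.26),
    "glm-5": (1.00, 3.20, 0.20),
    "glm-4.5-air": (0.20, 1.10, 0.03),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumes cached_tokens counts prompt tokens served from the cache, so the
    remaining prompt tokens are billed at the normal input rate.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    input_rate, output_rate, cached_rate = PRICES[model]
    fresh_tokens = prompt_tokens - cached_tokens
    total = (fresh_tokens * input_rate
             + cached_tokens * cached_rate
             + completion_tokens * output_rate)
    return total / 1_000_000
```

For example, a glm-4.5-air request with 100,000 prompt tokens (80,000 of them cached) and 2,000 completion tokens comes to 20,000 × $0.20 + 80,000 × $0.03 + 2,000 × $1.10 per million, or roughly $0.0086 under these assumptions.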

Reasoning modes

The brain icon next to the message box controls how this model reasons. Click it to cycle through the modes the selected model supports. Two separate ideas combine here:

Thinking — whether the model reasons

  • Off — the model answers immediately. Fastest and cheapest; ideal for simple, direct tasks.
  • On — the model first works through the problem in a dedicated reasoning pass before replying. Slower and uses more tokens, but markedly more accurate on multi-step, logical, or ambiguous tasks.

Effort — how hard it thinks

Only effort-capable models expose this; it applies once thinking is on.
  • High — standard reasoning depth. The right default for most agentic work.
  • Max — the model reasons longer and deeper for the hardest problems. More tokens and latency in exchange for higher quality on complex work.

Button states

| State | Colour | Meaning |
|---|---|---|
| Off | gray | Thinking off, direct answer |
| On | blue | Thinking on, no effort control |
| High | purple | Thinking on, standard effort |
| Max | orange | Thinking on, maximum effort |
Each model shows only the states it genuinely supports. If a model always reasons (it can't be turned off) or has no effort control, the button reflects that and locks where there's nothing to change. Wolffish remembers your choice per model.

On Z.ai: GLM-5 models support Off / High / Max (genuine effort tiers). GLM-4.x models are a simple On / Off toggle with no effort control.
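The button behaviour for Z.ai models can be sketched as a small state cycle. This is not Wolffish's actual implementation: the cycle order, the prefix check on the model name, and the names supported_modes, next_mode, and remembered are all assumptions made for illustration. Only the per-family mode lists and the colours come from this page.

```python
from typing import Dict, List

EFFORT_MODES = ["off", "high", "max"]  # GLM-5 family: genuine effort tiers
TOGGLE_MODES = ["off", "on"]           # GLM-4.x: plain on/off toggle
COLOURS = {"off": "gray", "on": "blue", "high": "purple", "max": "orange"}


def supported_modes(model: str) -> List[str]:
    """Return the button states a Z.ai model supports (assumed prefix check)."""
    return EFFORT_MODES if model.startswith("glm-5") else TOGGLE_MODES


def next_mode(model: str, current: str) -> str:
    """Advance to the next supported state, wrapping back to the first."""
    modes = supported_modes(model)
    if current not in modes:
        return modes[0]  # unknown or unsupported state: fall back to "off"
    return modes[(modes.index(current) + 1) % len(modes)]


# Per-model memory of the last chosen mode, kept in a plain dict here.
remembered: Dict[str, str] = {}


def click(model: str) -> str:
    """Simulate one click on the brain icon and remember the result."""
    current = remembered.get(model, "off")
    remembered[model] = next_mode(model, current)
    return remembered[model]
```

Clicking three times on a GLM-5 model walks through high, max, and back to off, while a GLM-4.x model simply alternates between on and off. COLOURS maps each resulting state to the button colour in the table.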