# Qwen

> Set up Qwen (Alibaba Cloud) — wide model range from budget flash to frontier reasoning

# Qwen (Alibaba Cloud)

```
POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
```

The endpoint streams responses over SSE and uses the OpenAI-compatible tool-calling format. It supports vision input (base64-encoded images) and returns reasoning content.
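
As a rough illustration of what that wire format looks like, the sketch below builds a base64 image part and parses streamed chunks. It assumes the usual OpenAI-style conventions (`data:` lines, a `[DONE]` sentinel, `choices[].delta`); the `reasoning_content` field name and the helper names `image_part` and `iter_deltas` are assumptions for illustration, not Wolffish internals.

```python
import json
from typing import Iterable, Iterator, Tuple


def image_part(b64_png: str) -> dict:
    """Wrap base64 PNG data as an OpenAI-style image content part."""
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{b64_png}"},
    }


def iter_deltas(sse_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (kind, text) pairs from the lines of an SSE response body.

    kind is "reasoning" or "content". The `reasoning_content` field name
    is an assumption about how reasoning text arrives in each delta.
    """
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("reasoning_content"):
                yield "reasoning", delta["reasoning_content"]
            if delta.get("content"):
                yield "content", delta["content"]


# Hand-written sample stream, not captured from a real response.
sample = [
    'data: {"choices":[{"delta":{"reasoning_content":"2+2 is 4. "}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"4"}}]}',
    "data: [DONE]",
]
events = list(iter_deltas(sample))
```

Keeping reasoning and answer text as separate event kinds lets a client render the thinking pass differently from the final reply.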

**Qwen offers one of the widest model ranges of any provider**, from the ultra-cheap Qwen 3.5 Flash at $0.06/$0.24 per MTok (input/output) to the frontier Qwen 3.7 Max. All Qwen3+ models support three reasoning modes (Off, High, Max) and context windows of up to 1M tokens. The dedicated Qwen3 Coder Plus model is tuned for code generation tasks.

Best for: Cost-efficient agentic workflows, code generation, multilingual tasks, and workloads that benefit from a wide selection of price/performance tiers.

<Note>
  Qwen 3.5 Flash is one of the cheapest reasoning-capable models available. At $0.06/$0.24 per MTok (input/output) with 1M context, it is significantly cheaper than DeepSeek V4 Flash while still supporting all three reasoning modes (Off, High, Max). It is a good fit for high-volume tasks where cost matters.
</Note>

## Getting an API Key

1. Go to [qwencloud.com](https://www.qwencloud.com)
2. Sign up or log in
3. Navigate to **API Keys** and create a new key
4. Paste it into Wolffish → Settings → Models → Qwen

## Models

| Model             | Context | Modes          | Input / Output (per MTok) | Cached input (per MTok) | Notes                                |
| ----------------- | ------- | -------------- | ------------------------- | ----------------------- | ------------------------------------ |
| **qwen3.7-max**   | 1M      | Off, High, Max | $2.50 / $7.50             | \$0.25                  | Flagship. Frontier reasoning.        |
| qwen3.7-plus      | 1M      | Off, High, Max | $0.40 / $1.60             | \$0.064                 | Strong reasoning at mid-range price. |
| qwen3.6-plus      | 1M      | Off, High, Max | $0.40 / $1.60             | \$0.04                  | Previous-gen plus.                   |
| qwen3.6-flash     | 1M      | Off, High, Max | $0.25 / $1.50             | \$0.025                 | Fast reasoning.                      |
| qwen3.5-plus      | 1M      | Off, High, Max | $0.40 / $1.60             | \$0.04                  | Balanced quality and cost.           |
| qwen3.5-flash     | 1M      | Off, High, Max | $0.06 / $0.24             | \$0.006                 | Ultra-cheap reasoning.               |
| qwen3-max         | 131K    | Off, High, Max | $1.60 / $6.40             | \$0.40                  | Strong reasoning, smaller context.   |
| qwen3-coder-plus  | 131K    | Off, High, Max | $0.40 / $1.60             | \$0.04                  | Code-optimized.                      |
| qwen3-coder-flash | 131K    | Off, High, Max | $0.40 / $1.60             | \$0.04                  | Fast code-optimized.                 |
| qwq-plus          | 131K    | On             | $0.40 / $1.60             | \$0.04                  | Reasoning-only (always thinks).      |
| qvq-max           | 131K    | On             | $1.60 / $6.40             | \$0.16                  | Vision reasoning (always thinks).    |
| qwen-max          | 131K    | —              | $1.60 / $6.40             | \$0.16                  | Legacy. No reasoning.                |
| qwen-plus         | 131K    | —              | $0.40 / $1.60             | \$0.04                  | Legacy. Fast, no reasoning.          |
| qwen-turbo        | 1M      | —              | $0.30 / $0.60             | \$0.03                  | Legacy. Fast, no reasoning.          |
| qwen-flash        | 1M      | —              | $0.06 / $0.24             | \$0.006                 | Legacy. Ultra-cheap, no reasoning.   |
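
To turn the per-MTok prices into a rough per-job estimate, a small calculator like the one below can help. It is a sketch that assumes cached input tokens are billed at the cached rate in place of the normal input rate; `estimate_cost` is a hypothetical helper, and only two rows of the table are copied in.

```python
# Prices in USD per million tokens, copied from the table above.
PRICES = {
    "qwen3.7-max": {"input": 2.50, "output": 7.50, "cached": 0.25},
    "qwen3.5-flash": {"input": 0.06, "output": 0.24, "cached": 0.006},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one job.

    Assumption: `cached_tokens` is the part of `input_tokens` served from
    cache, billed at the cached rate instead of the normal input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * price["input"]
        + cached_tokens * price["cached"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000  # prices are quoted per million tokens


flash_cost = estimate_cost("qwen3.5-flash", 2_000_000, 500_000)
max_cost = estimate_cost("qwen3.7-max", 2_000_000, 500_000)
```

At these prices, 2M input tokens plus 0.5M output tokens come to about $0.24 on qwen3.5-flash and about $8.75 on qwen3.7-max.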

## Reasoning modes

The **brain icon** next to the message box controls how the selected model reasons. Click it to cycle through the modes that model supports. Two separate ideas combine here:

### Thinking — *whether* the model reasons

* **Off** — the model answers immediately. Fastest and cheapest; ideal for simple, direct tasks.
* **On** — the model first works through the problem in a dedicated reasoning pass before replying. Slower and uses more tokens, but markedly more accurate on multi-step, logical, or ambiguous tasks.

### Effort — *how hard* it thinks

Only effort-capable models expose this; it applies once thinking is on.

* **High** — standard reasoning depth. The right default for most agentic work.
* **Max** — the model reasons longer and deeper for the hardest problems. More tokens and latency in exchange for higher quality on complex work.

### Button states

| State | Colour | Meaning                         |
| ----- | ------ | ------------------------------- |
| Off   | gray   | Thinking off — direct answer    |
| On    | blue   | Thinking on — no effort control |
| High  | purple | Thinking on, standard effort    |
| Max   | orange | Thinking on, maximum effort     |

Each model shows only the states it genuinely supports. If a model always reasons (can't be turned off) or has no effort control, the button reflects that and locks where there's nothing to change. Wolffish remembers your choice per model.
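
The cycling and locking behaviour can be pictured with a small sketch. It derives each model's states from the Modes column of the models table by name prefix; the function names are hypothetical, and showing legacy models as a locked Off state is an assumption, since the table lists no mode for them.

```python
from typing import List


def supported_states(model: str) -> List[str]:
    """Button states for a model, inferred from its name prefix."""
    if model.startswith(("qwq", "qvq")):
        return ["On"]                  # always reasons: locked on
    if model.startswith("qwen3"):
        return ["Off", "High", "Max"]  # thinking toggle plus effort levels
    return ["Off"]                     # assumption: legacy shows a locked Off


def next_state(model: str, current: str) -> str:
    """Return the state a click moves to, wrapping around at the end."""
    states = supported_states(model)
    if current not in states:
        return states[0]  # unknown state: fall back to the first valid one
    return states[(states.index(current) + 1) % len(states)]


def is_locked(model: str) -> bool:
    """A button with a single state has nothing to cycle through."""
    return len(supported_states(model)) == 1
```

For example, `next_state("qwen3.7-max", "Max")` wraps back to `"Off"`, while `next_state("qwq-plus", "On")` stays at `"On"` because that button is locked.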

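**On Qwen:** Qwen3-generation models (`qwen3-*` and `qwen3.x-*`) support Off / High / Max, with effort implemented as a thinking-token budget. qwq-plus and qvq-max always reason, so the button is locked on. The legacy qwen-max, qwen-plus, qwen-turbo, and qwen-flash models don't reason.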
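
One way to picture the budget-based effort mapping is the sketch below. The `enable_thinking` and `thinking_budget` field names are assumptions modelled on DashScope's thinking options, and the budget numbers are purely hypothetical placeholders; the values Wolffish actually sends are not documented here.

```python
from typing import Any, Dict

# Hypothetical token budgets, chosen only for illustration.
HYPOTHETICAL_BUDGETS = {"High": 8_000, "Max": 32_000}


def thinking_params(model: str, mode: str) -> Dict[str, Any]:
    """Map a button state to extra request fields (assumed field names)."""
    if model.startswith(("qwq", "qvq")):
        return {}  # always-on reasoning: nothing to toggle
    if not model.startswith("qwen3"):
        return {}  # legacy models don't reason
    if mode == "Off":
        return {"enable_thinking": False}
    if mode in HYPOTHETICAL_BUDGETS:
        return {
            "enable_thinking": True,
            "thinking_budget": HYPOTHETICAL_BUDGETS[mode],
        }
    raise ValueError(f"{model} does not support mode {mode!r}")
```

The returned dictionary would be merged into the chat completions request body alongside `model` and `messages`.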

## Cost Comparison

Qwen spans a wide price range, competing at every tier:

| Tier        | Model         | Input / Output (per MTok) | Comparable To                   |
| ----------- | ------------- | ------------------------- | ------------------------------- |
| Ultra-cheap | qwen3.5-flash | $0.06 / $0.24             | Cheaper than DeepSeek V4 Flash  |
| Budget      | qwen3.7-plus  | $0.40 / $1.60             | MiMo V2.5 Pro range             |
| Mid-range   | qwen3.7-max   | $2.50 / $7.50             | Between Kimi K2.6 and Anthropic |
| Legacy      | qwen-max      | $1.60 / $6.40             | MiniMax M3 range                |

<Tip>
  Start with Qwen 3.5 Flash for high-volume tasks, Qwen 3.7 Plus for general agentic work, or Qwen 3.7 Max when you need frontier reasoning. The dedicated Qwen3 Coder Plus model is a good pick for code-heavy workflows at a budget price.
</Tip>
