Commit 5e6d1ae

Adding: Qwen3 model card (#19)
1 parent 2610171 commit 5e6d1ae


ai/qwen3.md

Lines changed: 110 additions & 0 deletions
# Qwen3

![logo](https://github.com/docker/model-cards/raw/refs/heads/main/logos/[email protected])

Qwen3 is the latest generation in the Qwen LLM family, designed for top-tier performance in coding, math, reasoning, and language tasks. It includes both dense and Mixture-of-Experts (MoE) models, offering flexible deployment from lightweight apps to large-scale research.

Qwen3 introduces dual reasoning modes: "thinking" for complex tasks and "non-thinking" for fast responses, giving users dynamic control over performance. It outperforms prior models in reasoning, instruction following, and code generation, while excelling in creative writing and dialogue.

With strong agentic and tool-use capabilities and support for over 100 languages, Qwen3 is optimized for multilingual, multi-domain applications.
---
## 📌 Characteristics

| Attribute             | Value             |
|-----------------------|-------------------|
| **Provider**          | Alibaba Cloud     |
| **Architecture**      | qwen3             |
| **Cutoff date**       | April 2025 (est.) |
| **Languages**         | 119 languages from multiple families (Indo-European, Sino-Tibetan, Afro-Asiatic, Austronesian, Dravidian, Turkic, Tai-Kadai, Uralic, Austroasiatic), including Japanese, Basque, Haitian, and others |
| **Tool calling**      | ✅                |
| **Input modalities**  | Text              |
| **Output modalities** | Text              |
| **License**           | Apache 2.0        |
---
## 📦 Available Model Variants

| Model Variant                               | Parameters | Quantization     | Context Length | VRAM    | Size    |
|---------------------------------------------|------------|------------------|----------------|---------|---------|
| `ai/qwen3:8B-F16`                           | 8.19B      | F16              | 40,960 tokens  | ~16GB¹  | 15.26GB |
| `ai/qwen3:8B-Q4_0`                          | 8.19B      | Q4_0             | 40,960 tokens  | ~4.5GB¹ | 4.44GB  |
| `ai/qwen3:8B-Q4_K_M` <br> `ai/qwen3:latest` | 8.19B      | IQ2_XXS / Q4_K_M | 40,960 tokens  | ~4.7GB¹ | 4.68GB  |

¹: Estimated VRAM requirements. Actual usage may vary depending on system configuration and inference backend.

> `:latest` points to `8B-Q4_K_M`.
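As a rough sanity check on the VRAM estimates above, weight memory is approximately parameter count × bytes per weight (2 bytes for F16, roughly 0.5–0.6 effective bytes for the 4-bit quantizations), before KV-cache and runtime overhead. A back-of-envelope check for the F16 variant, illustrative only:

```bash
# 8.19B parameters at 2 bytes each (F16), converted to GiB
echo "$(( 8190000000 * 2 / 1024 / 1024 / 1024 )) GiB"   # prints "15 GiB", in line with the 15.26GB file size
```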
---
## 🧠 Intended uses

Qwen3-8B is designed for a wide range of advanced natural language processing tasks:

- Supports both **dense and Mixture-of-Experts (MoE)** model architectures, available in sizes including 0.6B, 1.7B, 4B, 8B, 14B, 32B, and large MoE variants like 30B-A3B and 235B-A22B.
- Enables **seamless switching between thinking and non-thinking modes**:
  - *Thinking mode*: optimized for complex logical reasoning, math, and code generation.
  - *Non-thinking mode*: tuned for efficient, general-purpose dialogue and chat.
- Offers **significant improvements in reasoning performance**, outperforming the previous QwQ (in thinking mode) and Qwen2.5-Instruct (in non-thinking mode) models on mathematics, code generation, and commonsense reasoning benchmarks.
- Delivers **superior human alignment**, excelling at creative writing, role-playing, multi-turn dialogue, and instruction following for immersive conversations.
- Provides strong **agent capabilities**, including integration with external tools and best-in-class performance in complex agent-based workflows across both thinking and non-thinking modes.
- Offers support for **100+ languages and dialects**, with robust multilingual instruction following and translation abilities.
---
## Considerations

- **Thinking Mode Switching**
  Qwen3 supports a soft-switch mechanism via `/think` and `/no_think` prompts (when `enable_thinking=True`). This allows dynamic control over the model's reasoning depth during multi-turn conversations (see the sketch after this list).
- **Tool Calling with Qwen-Agent**
  For agentic tasks, use **Qwen-Agent**, which simplifies integration of external tools through built-in templates and parsers, minimizing the need for manual tool-call handling.

> **Note:** Qwen3 models use a new naming convention: post-trained models no longer include the `-Instruct` suffix (e.g., `Qwen3-32B` replaces `Qwen2.5-32B-Instruct`), and base models now end with `-Base`.
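A minimal sketch of the soft switch in practice, assuming Docker Model Runner's one-shot prompt form (`docker model run MODEL PROMPT`); the prompt text is just an illustration:

```bash
# /no_think asks Qwen3 to skip the thinking phase and reply directly
docker model run ai/qwen3 "/no_think Summarize what MoE means in one sentence."

# /think requests the deeper reasoning mode for a harder problem
docker model run ai/qwen3 "/think If a train travels 60km in 45 minutes, what is its average speed in km/h?"
```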
---
## 🐳 Using this model with Docker Model Runner

First, pull the model:

```bash
docker model pull ai/qwen3
```

Then run the model:

```bash
docker model run ai/qwen3
```

For more information, check out the [Docker Model Runner docs](https://docs.docker.com/desktop/features/model-runner/).
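Docker Model Runner also exposes an OpenAI-compatible HTTP API. A hedged sketch follows; the host endpoint and port (`localhost:12434` with TCP host access enabled) are assumptions based on the default setup described in the docs linked above, so adjust them to your configuration:

```bash
# Assumes TCP host access is enabled on the default port 12434
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/qwen3",
        "messages": [
          {"role": "user", "content": "/think Explain the difference between dense and MoE models in two sentences."}
        ]
      }'
```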
---
## Benchmarks

| Category                    | Benchmark  | Qwen3 |
|-----------------------------|------------|-------|
| General Tasks               | MMLU       | 87.81 |
|                             | MMLU-Redux | 87.40 |
|                             | MMLU-Pro   | 68.18 |
|                             | SuperGPQA  | 44.06 |
|                             | BBH        | 88.87 |
| Mathematics & Science Tasks | GPQA       | 47.47 |
|                             | GSM8K      | 94.39 |
|                             | MATH       | 71.84 |
| Multilingual Tasks          | MGSM       | 83.53 |
|                             | MMMLU      | 86.70 |
|                             | INCLUDE    | 73.46 |
| Code Tasks                  | EvalPlus   | 77.60 |
|                             | MultiPL-E  | 65.94 |
|                             | MBPP       | 81.40 |
|                             | CRUX-O     | 79.00 |

---

## 🔗 Links
- [Qwen3: Think Deeper, Act Faster](https://qwenlm.github.io/blog/qwen3/)
