51 changes: 42 additions & 9 deletions README.md
@@ -62,16 +62,22 @@ Coming soon:
| [QwQ](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |

### Speculative Decoding
The Eagle3 weights for the Qwen3 series models are now open-sourced.

| Model | Eagle3 |
| ----------| ----------------- |
| [Qwen3-1.7B](https://huggingface.co/AngelSlim/Qwen3-1.7B_eagle3) | ✅ |
| [Qwen3-4B](https://huggingface.co/AngelSlim/Qwen3-4B_eagle3) | ✅ |
| [Qwen3-8B](https://huggingface.co/AngelSlim/Qwen3-8B_eagle3) | ✅ |
| [Qwen3-14B](https://huggingface.co/AngelSlim/Qwen3-14B_eagle3) | ✅ |
| [Qwen3-32B](https://huggingface.co/AngelSlim/Qwen3-32B_eagle3) | ✅ |
| [Qwen3-30B-A3B](https://huggingface.co/AngelSlim/Qwen3-a3B_eagle3) | ✅ |
#### Eagle3
The Eagle3 weights for the Qwen3 and Hunyuan series models are now open-sourced.

| Qwen3 Models | Hunyuan Models |
| ----------|----------|
| ✅ [Qwen3-1.7B](https://huggingface.co/AngelSlim/Qwen3-1.7B_eagle3) |✅ [Hunyuan-1.8B-Instruct](https://huggingface.co/AngelSlim/Hunyuan-1.8B-Instruct_eagle3) |
| ✅ [Qwen3-4B](https://huggingface.co/AngelSlim/Qwen3-4B_eagle3) |✅ [Hunyuan-4B-Instruct](https://huggingface.co/AngelSlim/Hunyuan-4B-Instruct_eagle3) |
| ✅ [Qwen3-8B](https://huggingface.co/AngelSlim/Qwen3-8B_eagle3) |✅ [Hunyuan-7B-Instruct](https://huggingface.co/AngelSlim/Hunyuan-7B-Instruct_eagle3) |
| ✅ [Qwen3-14B](https://huggingface.co/AngelSlim/Qwen3-14B_eagle3) | |
| ✅ [Qwen3-32B](https://huggingface.co/AngelSlim/Qwen3-32B_eagle3) | |
| ✅ [Qwen3-30B-A3B](https://huggingface.co/AngelSlim/Qwen3-a3B_eagle3) | |
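Each entry above links to a Hugging Face repository. As a minimal sketch of pulling one of the listed draft checkpoints locally (the repo id is taken from the table; the target directory is an arbitrary choice), assuming `huggingface-cli` is installed:

```shell
# Download an Eagle3 draft checkpoint from Hugging Face (any repo id from the table works).
huggingface-cli download AngelSlim/Qwen3-8B_eagle3 --local-dir ./Qwen3-8B_eagle3
```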

## 🛎️How to Use

@@ -279,6 +285,7 @@ Qwen3 series models' `BF16`, `FP8-Static`, `FP8-Dynamic`, `INT8-Dynamic`, `IN
</table>

### (2) Speculative Decoding
#### Qwen3 Series Models
Speedup results of the Qwen3 series Eagle3 models on MT-bench/HumanEval/GSM8K/Alpaca are as follows:

<table>
@@ -312,6 +319,32 @@ Speedup results of the Qwen3 series Eagle3 models on MT-bench/HumanEval/GSM8K/Alpaca
</tbody>
</table>

Speedup results of the Hunyuan series Eagle3 models on MT-bench/HumanEval/GSM8K/Alpaca are as follows:

<table>
<thead>
<tr>
<th>&nbsp;</th><th>&nbsp;</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">MT-bench</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">HumanEval</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">GSM8K</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">Alpaca</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">Mean</th></tr>
<tr><th>Temperature</th><th>Model</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th></tr>
</thead>
<tbody>
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
<tr><td rowspan="3"><strong>T=0</strong></td>
<td>Hunyuan-1.8B-Instruct</td><td>1.97x</td><td>2.90</td><td>2.58x</td><td>3.73</td><td>2.61x</td><td>3.71</td><td>1.71x</td><td>2.43</td><td>2.22x</td><td>3.19</td></tr>
<tr> <td>Hunyuan-4B-Instruct</td><td>1.77x</td><td>2.60</td><td>2.64x</td><td>3.35</td><td>2.14x</td><td>3.17</td><td>1.72x</td><td>2.57</td><td>2.07x</td><td>2.92</td></tr>
<tr><td>Hunyuan-7B-Instruct</td><td>2.22x</td><td>3.58</td><td>3.59x</td><td>5.47</td><td>2.96x</td><td>4.68</td><td>1.64x</td><td>2.56</td><td>2.60x</td><td>4.07</td></tr>
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
<tr><td rowspan="3"><strong>T=1</strong></td>
<td>Hunyuan-1.8B-Instruct</td><td>1.58x</td><td>2.36</td><td>2.35x</td><td>3.56</td><td>2.23x</td><td>3.38</td><td>1.26x</td><td>1.87</td><td>1.86x</td><td>2.79</td></tr>
<tr><td>Hunyuan-4B-Instruct</td><td>1.36x</td><td>2.05</td><td>1.97x</td><td>2.86</td><td>1.72x</td><td>2.68</td><td>1.14x</td><td>1.76</td><td>1.55x</td><td>2.34</td></tr>
<tr><td>Hunyuan-7B-Instruct</td><td>1.90x</td><td>3.11</td><td>3.12x</td><td>5.09</td><td>2.74x</td><td>4.34</td><td>1.47x</td><td>2.39</td><td>2.31x</td><td>3.73</td></tr>
</tbody>
</table>

## 📝 License
The code of this project is open-sourced under the [License for AngelSlim](LICENSE).
47 changes: 39 additions & 8 deletions README_en.md
@@ -62,16 +62,18 @@ Currently supports the following LLMs, including Hunyuan-Dense, Hunyuan-MoE, Qwe
| [QwQ](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |

### Speculative Decoding

#### Eagle3
The Eagle3 weights for the Qwen3 and Hunyuan series models are now available.

| Model | Eagle3 |
| ----------| ----------------- |
| [Qwen3-1.7B](https://huggingface.co/AngelSlim/Qwen3-1.7B_eagle3) | ✅ |
| [Qwen3-4B](https://huggingface.co/AngelSlim/Qwen3-4B_eagle3) | ✅ |
| [Qwen3-8B](https://huggingface.co/AngelSlim/Qwen3-8B_eagle3) | ✅ |
| [Qwen3-14B](https://huggingface.co/AngelSlim/Qwen3-14B_eagle3) | ✅ |
| [Qwen3-32B](https://huggingface.co/AngelSlim/Qwen3-32B_eagle3) | ✅ |
| [Qwen3-30B-A3B](https://huggingface.co/AngelSlim/Qwen3-a3B_eagle3) | ✅ |
| Qwen3 Models | Hunyuan Models |
| ----------|----------|
| ✅ [Qwen3-1.7B](https://huggingface.co/AngelSlim/Qwen3-1.7B_eagle3) |✅ [Hunyuan-1.8B-Instruct](https://huggingface.co/AngelSlim/Hunyuan-1.8B-Instruct_eagle3) |
| ✅ [Qwen3-4B](https://huggingface.co/AngelSlim/Qwen3-4B_eagle3) |✅ [Hunyuan-4B-Instruct](https://huggingface.co/AngelSlim/Hunyuan-4B-Instruct_eagle3) |
| ✅ [Qwen3-8B](https://huggingface.co/AngelSlim/Qwen3-8B_eagle3) |✅ [Hunyuan-7B-Instruct](https://huggingface.co/AngelSlim/Hunyuan-7B-Instruct_eagle3) |
| ✅ [Qwen3-14B](https://huggingface.co/AngelSlim/Qwen3-14B_eagle3) | |
| ✅ [Qwen3-32B](https://huggingface.co/AngelSlim/Qwen3-32B_eagle3) | |
| ✅ [Qwen3-30B-A3B](https://huggingface.co/AngelSlim/Qwen3-a3B_eagle3) | |

## 🛎️How to Use

@@ -282,6 +284,8 @@ Benchmark results for other models with `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`
</table>

### (2) Speculative Decoding

#### Qwen3 Series Models
Benchmark results for Qwen3 series models with the `Eagle3` speculative decoding algorithm on datasets including `MT-bench`, `HumanEval`, `GSM8K`, and `Alpaca`:

<table>
@@ -315,6 +319,33 @@ Benchmark results for Qwen3 series models with `Eagle3` speculative decoding alg
</tbody>
</table>

#### Hunyuan Series Models
Benchmark results for Hunyuan series models with the `Eagle3` speculative decoding algorithm on datasets including `MT-bench`, `HumanEval`, `GSM8K`, and `Alpaca`:

<table>
<thead>
<tr>
<th>&nbsp;</th><th>&nbsp;</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">MT-bench</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">HumanEval</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">GSM8K</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">Alpaca</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">Mean</th></tr>
<tr><th>Temperature</th><th>Model</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th></tr>
</thead>
<tbody>
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
<tr><td rowspan="3"><strong>T=0</strong></td>
<td>Hunyuan-1.8B-Instruct</td><td>1.97x</td><td>2.90</td><td>2.58x</td><td>3.73</td><td>2.61x</td><td>3.71</td><td>1.71x</td><td>2.43</td><td>2.22x</td><td>3.19</td></tr>
<tr> <td>Hunyuan-4B-Instruct</td><td>1.77x</td><td>2.60</td><td>2.64x</td><td>3.35</td><td>2.14x</td><td>3.17</td><td>1.72x</td><td>2.57</td><td>2.07x</td><td>2.92</td></tr>
<tr><td>Hunyuan-7B-Instruct</td><td>2.22x</td><td>3.58</td><td>3.59x</td><td>5.47</td><td>2.96x</td><td>4.68</td><td>1.64x</td><td>2.56</td><td>2.60x</td><td>4.07</td></tr>
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
<tr><td rowspan="3"><strong>T=1</strong></td>
<td>Hunyuan-1.8B-Instruct</td><td>1.58x</td><td>2.36</td><td>2.35x</td><td>3.56</td><td>2.23x</td><td>3.38</td><td>1.26x</td><td>1.87</td><td>1.86x</td><td>2.79</td></tr>
<tr><td>Hunyuan-4B-Instruct</td><td>1.36x</td><td>2.05</td><td>1.97x</td><td>2.86</td><td>1.72x</td><td>2.68</td><td>1.14x</td><td>1.76</td><td>1.55x</td><td>2.34</td></tr>
<tr><td>Hunyuan-7B-Instruct</td><td>1.90x</td><td>3.11</td><td>3.12x</td><td>5.09</td><td>2.74x</td><td>4.34</td><td>1.47x</td><td>2.39</td><td>2.31x</td><td>3.73</td></tr>
</tbody>
</table>

## 📝 License

16 changes: 16 additions & 0 deletions docs/source/features/speculative_decoding/eagle.md
@@ -7,6 +7,8 @@
All of the data above were obtained with PyTorch inference on a single H20 GPU.

## Quick Test

### SGLang
sglang currently supports Eagle3 deployment for the Qwen3-8B/14B/30B-A3B models, so you can use sglang as the inference backend to quickly verify the speedup of the Eagle3 models.
In an environment where sglang is already installed, the following command quickly launches an OpenAI-compatible service, which you can then query through the local port.
- Launch an OpenAI-compatible API service
@@ -28,5 +30,19 @@
- `TARGET_MODEL_PATH_OR_NAME` is a local path or the model's name on Hugging Face;
- `EAGLE3_MODEL_PATH` is the Eagle3 model path or its name on Hugging Face;
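The launch command itself sits in the collapsed part of this diff. As a rough sketch of what an Eagle3 launch with sglang typically looks like (the speculative step, top-k, and draft-token values here are illustrative assumptions, not the file's exact settings):

```shell
# Sketch only: start an OpenAI-compatible sglang server with an Eagle3 draft model.
python3 -m sglang.launch_server \
    --model-path ${TARGET_MODEL_PATH_OR_NAME} \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path ${EAGLE3_MODEL_PATH} \
    --speculative-num-steps 6 \
    --speculative-eagle-topk 10 \
    --speculative-num-draft-tokens 32 \
    --port 30000
```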


### vLLM
vllm currently supports Eagle3 deployment for the Hunyuan-1.8B-Instruct/4B-Instruct/7B-Instruct models, so you can use vllm as the inference backend to quickly verify the speedup of the Eagle3 models.
In an environment with the correct [vllm commit](https://github.com/vllm-project/vllm/pull/22080) installed, the following command quickly launches an OpenAI-compatible service, which you can then query through the local port.
- Launch an OpenAI-compatible API service

```shell
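# Serve Hunyuan-1.8B-Instruct with its Eagle3 draft model via vLLM's OpenAI-compatible API server.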
python3 -m vllm.entrypoints.openai.api_server --tensor-parallel-size 1 \
--port 8000 \
--speculative_config '{"model": "AngelSlim/Hunyuan-1.8B-Instruct_eagle3", "method" : "eagle3", "draft_tensor_parallel_size" : 1, "num_speculative_tokens": 2}' --trust-remote-code \
--model tencent/Hunyuan-1.8B-Instruct
```
However, since the latest vllm release does not support tree attention for Eagle3, inference verification runs in chain-based decoding mode.
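Once the server is up, requests go through the local port like any OpenAI-compatible endpoint. A minimal request sketch, assuming the default `/v1/chat/completions` route and the port 8000 used above:

```shell
# Send a chat completion request to the local vLLM server.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "tencent/Hunyuan-1.8B-Instruct",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 64
    }'
```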

## Training and Innovation
Coming soon.
29 changes: 29 additions & 0 deletions docs/source/performance/speculative_decoding/benchmarks.md
@@ -2,6 +2,8 @@

## Eagle3

### Qwen3 Series Models

| | | MT-bench | | HumanEval | | GSM8K | | Alpaca | | Mean | |
|------------------|--------------|------------------|------------|-------------------|-------------|----------------|---------|----------------|----------|---------------|--------|
| | Model | Speedup | τ | Speedup | τ | Speedup | τ | Speedup | τ | Speedup | τ |
@@ -19,3 +21,30 @@
| | Qwen3-32B | 1.62x | 1.91 | 1.71x | 2.05 | 1.78x | 2.10 | 1.80x | 1.95 | 1.62x | 2.00 |
| | Qwen3-30B-A3B| 1.91x | 2.46 | 2.00x | 2.64 | 1.90x | 2.53 | 1.80x | 2.32 | 1.90x | 2.48 |

### Hunyuan Series Models

<table>
<thead>
<tr>
<th>&nbsp;</th><th>&nbsp;</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">MT-bench</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">HumanEval</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">GSM8K</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">Alpaca</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">Mean</th></tr>
<tr><th>Temperature</th><th>Model</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th></tr>
</thead>
<tbody>
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
<tr><td rowspan="3"><strong>Temperature=0</strong></td>
<td>Hunyuan-1.8B-Instruct</td><td>1.97x</td><td>2.90</td><td>2.58x</td><td>3.73</td><td>2.61x</td><td>3.71</td><td>1.71x</td><td>2.43</td><td>2.22x</td><td>3.19</td></tr>
<tr> <td>Hunyuan-4B-Instruct</td><td>1.77x</td><td>2.60</td><td>2.64x</td><td>3.35</td><td>2.14x</td><td>3.17</td><td>1.72x</td><td>2.57</td><td>2.07x</td><td>2.92</td></tr>
<tr><td>Hunyuan-7B-Instruct</td><td>2.22x</td><td>3.58</td><td>3.59x</td><td>5.47</td><td>2.96x</td><td>4.68</td><td>1.64x</td><td>2.56</td><td>2.60x</td><td>4.07</td></tr>
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
<tr><td rowspan="3"><strong>Temperature=1</strong></td>
<td>Hunyuan-1.8B-Instruct</td><td>1.58x</td><td>2.36</td><td>2.35x</td><td>3.56</td><td>2.23x</td><td>3.38</td><td>1.26x</td><td>1.87</td><td>1.86x</td><td>2.79</td></tr>
<tr><td>Hunyuan-4B-Instruct</td><td>1.36x</td><td>2.05</td><td>1.97x</td><td>2.86</td><td>1.72x</td><td>2.68</td><td>1.14x</td><td>1.76</td><td>1.55x</td><td>2.34</td></tr>
<tr><td>Hunyuan-7B-Instruct</td><td>1.90x</td><td>3.11</td><td>3.12x</td><td>5.09</td><td>2.74x</td><td>4.34</td><td>1.47x</td><td>2.39</td><td>2.31x</td><td>3.73</td></tr>
</tbody>
</table>