51 changes: 42 additions & 9 deletions README.md
@@ -62,16 +62,22 @@ Coming soon:
| [QwQ](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |

### Speculative Decoding
The Eagle3 weights for the Qwen3 series models are now open-sourced.

| Model | Eagle3 |
| ----------| ----------------- |
| [Qwen3-1.7B](https://huggingface.co/AngelSlim/Qwen3-1.7B_eagle3) | ✅ |
| [Qwen3-4B](https://huggingface.co/AngelSlim/Qwen3-4B_eagle3) | ✅ |
| [Qwen3-8B](https://huggingface.co/AngelSlim/Qwen3-8B_eagle3) | ✅ |
| [Qwen3-14B](https://huggingface.co/AngelSlim/Qwen3-14B_eagle3) | ✅ |
| [Qwen3-32B](https://huggingface.co/AngelSlim/Qwen3-32B_eagle3) | ✅ |
| [Qwen3-30B-A3B](https://huggingface.co/AngelSlim/Qwen3-a3B_eagle3) | ✅ |
#### Eagle3
The Eagle3 weights for the Qwen3 and Hunyuan series models are now open-sourced.

| Qwen3 Models | Hunyuan Models |
| ----------|----------|
| ✅ [Qwen3-1.7B](https://huggingface.co/AngelSlim/Qwen3-1.7B_eagle3) |✅ [Hunyuan-1.8B-Instruct](https://huggingface.co/AngelSlim/Hunyuan-1.8B-Instruct_eagle3) |
| ✅ [Qwen3-4B](https://huggingface.co/AngelSlim/Qwen3-4B_eagle3) |✅ [Hunyuan-4B-Instruct](https://huggingface.co/AngelSlim/Hunyuan-4B-Instruct_eagle3) |
| ✅ [Qwen3-8B](https://huggingface.co/AngelSlim/Qwen3-8B_eagle3) |✅ [Hunyuan-7B-Instruct](https://huggingface.co/AngelSlim/Hunyuan-7B-Instruct_eagle3) |
| ✅ [Qwen3-14B](https://huggingface.co/AngelSlim/Qwen3-14B_eagle3) | |
| ✅ [Qwen3-32B](https://huggingface.co/AngelSlim/Qwen3-32B_eagle3) | |
| ✅ [Qwen3-30B-A3B](https://huggingface.co/AngelSlim/Qwen3-a3B_eagle3) | |
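Each entry above links to a Hugging Face repository. As a minimal sketch of pulling one of the listed draft checkpoints locally (the repo id is taken from the table; the target directory is an arbitrary choice), assuming `huggingface-cli` is installed:

```shell
# Download an Eagle3 draft checkpoint from Hugging Face (any repo id from the table works).
huggingface-cli download AngelSlim/Qwen3-8B_eagle3 --local-dir ./Qwen3-8B_eagle3
```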

## 🛎️How to Use

@@ -279,6 +285,7 @@ Qwen3 series models' `BF16`, `FP8-Static`, `FP8-Dynamic`, `INT8-Dynamic`, `IN
</table>

### (2) Speculative Decoding
#### Qwen3 Series Models
Speedup results of the Qwen3 series Eagle3 models on MT-bench/HumanEval/GSM8K/Alpaca are as follows:

<table>
@@ -312,6 +319,32 @@ Speedup results of the Qwen3 series Eagle3 models on MT-bench/HumanEval/GSM8K/Alpaca
</tbody>
</table>

Speedup results of the Hunyuan series Eagle3 models on MT-bench/HumanEval/GSM8K/Alpaca are as follows:

<table>
<thead>
<tr>
<th>&nbsp;</th><th>&nbsp;</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">MT-bench</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">HumanEval</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">GSM8K</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">Alpaca</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">Mean</th></tr>
<tr><th>Temperature</th><th>Model</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th></tr>
</thead>
<tbody>
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
<tr><td rowspan="3"><strong>T=0</strong></td>
<td>Hunyuan-1.8B-Instruct</td><td>1.97x</td><td>2.90</td><td>2.58x</td><td>3.73</td><td>2.61x</td><td>3.71</td><td>1.71x</td><td>2.43</td><td>2.22x</td><td>3.19</td></tr>
<tr> <td>Hunyuan-4B-Instruct</td><td>1.77x</td><td>2.60</td><td>2.64x</td><td>3.35</td><td>2.14x</td><td>3.17</td><td>1.72x</td><td>2.57</td><td>2.07x</td><td>2.92</td></tr>
<tr><td>Hunyuan-7B-Instruct</td><td>2.22x</td><td>3.58</td><td>3.59x</td><td>5.47</td><td>2.96x</td><td>4.68</td><td>1.64x</td><td>2.56</td><td>2.60x</td><td>4.07</td></tr>
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
<tr><td rowspan="3"><strong>T=1</strong></td>
<td>Hunyuan-1.8B-Instruct</td><td>1.58x</td><td>2.36</td><td>2.35x</td><td>3.56</td><td>2.23x</td><td>3.38</td><td>1.26x</td><td>1.87</td><td>1.86x</td><td>2.79</td></tr>
<tr><td>Hunyuan-4B-Instruct</td><td>1.36x</td><td>2.05</td><td>1.97x</td><td>2.86</td><td>1.72x</td><td>2.68</td><td>1.14x</td><td>1.76</td><td>1.55x</td><td>2.34</td></tr>
<tr><td>Hunyuan-7B-Instruct</td><td>1.90x</td><td>3.11</td><td>3.12x</td><td>5.09</td><td>2.74x</td><td>4.34</td><td>1.47x</td><td>2.39</td><td>2.31x</td><td>3.73</td></tr>
</tbody>
</table>

## 📝 License
The code of this project is open-sourced under the [License for AngelSlim](LICENSE).
47 changes: 39 additions & 8 deletions README_en.md
@@ -62,16 +62,18 @@ Currently supports the following LLMs, including Hunyuan-Dense, Hunyuan-MoE, Qwe
| [QwQ](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |

### Speculative Decoding

#### Eagle3
The Eagle3 weights for the Qwen3 and Hunyuan series models are now available.

| Model | Eagle3 |
| ----------| ----------------- |
| [Qwen3-1.7B](https://huggingface.co/AngelSlim/Qwen3-1.7B_eagle3) | ✅ |
| [Qwen3-4B](https://huggingface.co/AngelSlim/Qwen3-4B_eagle3) | ✅ |
| [Qwen3-8B](https://huggingface.co/AngelSlim/Qwen3-8B_eagle3) | ✅ |
| [Qwen3-14B](https://huggingface.co/AngelSlim/Qwen3-14B_eagle3) | ✅ |
| [Qwen3-32B](https://huggingface.co/AngelSlim/Qwen3-32B_eagle3) | ✅ |
| [Qwen3-30B-A3B](https://huggingface.co/AngelSlim/Qwen3-a3B_eagle3) | ✅ |
| Qwen3 Models | Hunyuan Models |
| ----------|----------|
| ✅ [Qwen3-1.7B](https://huggingface.co/AngelSlim/Qwen3-1.7B_eagle3) |✅ [Hunyuan-1.8B-Instruct](https://huggingface.co/AngelSlim/Hunyuan-1.8B-Instruct_eagle3) |
| ✅ [Qwen3-4B](https://huggingface.co/AngelSlim/Qwen3-4B_eagle3) |✅ [Hunyuan-4B-Instruct](https://huggingface.co/AngelSlim/Hunyuan-4B-Instruct_eagle3) |
| ✅ [Qwen3-8B](https://huggingface.co/AngelSlim/Qwen3-8B_eagle3) |✅ [Hunyuan-7B-Instruct](https://huggingface.co/AngelSlim/Hunyuan-7B-Instruct_eagle3) |
| ✅ [Qwen3-14B](https://huggingface.co/AngelSlim/Qwen3-14B_eagle3) | |
| ✅ [Qwen3-32B](https://huggingface.co/AngelSlim/Qwen3-32B_eagle3) | |
| ✅ [Qwen3-30B-A3B](https://huggingface.co/AngelSlim/Qwen3-a3B_eagle3) | |

## 🛎️How to Use

@@ -282,6 +284,8 @@ Benchmark results for other models with `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`
</table>

### (2) Speculative Decoding

#### Qwen3 Series Models
Benchmark results for Qwen3 series models with the `Eagle3` speculative decoding algorithm on datasets including `MT-bench`, `HumanEval`, `GSM8K`, and `Alpaca`:

<table>
@@ -315,6 +319,33 @@ Benchmark results for Qwen3 series models with `Eagle3` speculative decoding alg
</tbody>
</table>

#### Hunyuan Series Models
Benchmark results for Hunyuan series models with the `Eagle3` speculative decoding algorithm on datasets including `MT-bench`, `HumanEval`, `GSM8K`, and `Alpaca`:

<table>
<thead>
<tr>
<th>&nbsp;</th><th>&nbsp;</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">MT-bench</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">HumanEval</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">GSM8K</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">Alpaca</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">Mean</th></tr>
<tr><th>Temperature</th><th>Model</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th></tr>
</thead>
<tbody>
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
<tr><td rowspan="3"><strong>T=0</strong></td>
<td>Hunyuan-1.8B-Instruct</td><td>1.97x</td><td>2.90</td><td>2.58x</td><td>3.73</td><td>2.61x</td><td>3.71</td><td>1.71x</td><td>2.43</td><td>2.22x</td><td>3.19</td></tr>
<tr> <td>Hunyuan-4B-Instruct</td><td>1.77x</td><td>2.60</td><td>2.64x</td><td>3.35</td><td>2.14x</td><td>3.17</td><td>1.72x</td><td>2.57</td><td>2.07x</td><td>2.92</td></tr>
<tr><td>Hunyuan-7B-Instruct</td><td>2.22x</td><td>3.58</td><td>3.59x</td><td>5.47</td><td>2.96x</td><td>4.68</td><td>1.64x</td><td>2.56</td><td>2.60x</td><td>4.07</td></tr>
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
<tr><td rowspan="3"><strong>T=1</strong></td>
<td>Hunyuan-1.8B-Instruct</td><td>1.58x</td><td>2.36</td><td>2.35x</td><td>3.56</td><td>2.23x</td><td>3.38</td><td>1.26x</td><td>1.87</td><td>1.86x</td><td>2.79</td></tr>
<tr><td>Hunyuan-4B-Instruct</td><td>1.36x</td><td>2.05</td><td>1.97x</td><td>2.86</td><td>1.72x</td><td>2.68</td><td>1.14x</td><td>1.76</td><td>1.55x</td><td>2.34</td></tr>
<tr><td>Hunyuan-7B-Instruct</td><td>1.90x</td><td>3.11</td><td>3.12x</td><td>5.09</td><td>2.74x</td><td>4.34</td><td>1.47x</td><td>2.39</td><td>2.31x</td><td>3.73</td></tr>
</tbody>
</table>

## 📝 License

16 changes: 16 additions & 0 deletions docs/source/features/speculative_decoding/eagle.md
@@ -7,6 +7,8 @@
All of the data above were obtained with PyTorch inference on a single H20 GPU.

## Quick Test

### SGLang
sglang currently supports Eagle3 deployment for the Qwen3-8B/14B/30B-A3B models, so you can use sglang as the inference backend to quickly verify the speedup of the Eagle3 models.
In an environment where sglang is already installed, the following command quickly launches an OpenAI-compatible service, which you can then query through the local port.
- Launch an OpenAI-compatible API service
@@ -28,5 +30,19 @@
- `TARGET_MODEL_PATH_OR_NAME` is a local path or the model's name on Hugging Face;
- `EAGLE3_MODEL_PATH` is the Eagle3 model path or its name on Hugging Face;
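The launch command itself sits in the collapsed part of this diff. As a rough sketch of what an Eagle3 launch with sglang typically looks like (the speculative step, top-k, and draft-token values here are illustrative assumptions, not the file's exact settings):

```shell
# Sketch only: start an OpenAI-compatible sglang server with an Eagle3 draft model.
python3 -m sglang.launch_server \
    --model-path ${TARGET_MODEL_PATH_OR_NAME} \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path ${EAGLE3_MODEL_PATH} \
    --speculative-num-steps 6 \
    --speculative-eagle-topk 10 \
    --speculative-num-draft-tokens 32 \
    --port 30000
```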


### vLLM
vllm currently supports Eagle3 deployment for the Hunyuan-1.8B-Instruct/4B-Instruct/7B-Instruct models, so you can use vllm as the inference backend to quickly verify the speedup of the Eagle3 models.
In an environment with the correct [vllm commit](https://github.com/vllm-project/vllm/pull/22080) installed, the following command quickly launches an OpenAI-compatible service, which you can then query through the local port.
- Launch an OpenAI-compatible API service

```shell
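# Serve Hunyuan-1.8B-Instruct with its Eagle3 draft model via vLLM's OpenAI-compatible API server.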
python3 -m vllm.entrypoints.openai.api_server --tensor-parallel-size 1 \
--port 8000 \
--speculative_config '{"model": "AngelSlim/Hunyuan-1.8B-Instruct_eagle3", "method" : "eagle3", "draft_tensor_parallel_size" : 1, "num_speculative_tokens": 2}' --trust-remote-code \
--model tencent/Hunyuan-1.8B-Instruct
```
However, since the latest vllm release does not support tree attention for Eagle3, inference verification runs in chain-based decoding mode.
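Once the server is up, requests go through the local port like any OpenAI-compatible endpoint. A minimal request sketch, assuming the default `/v1/chat/completions` route and the port 8000 used above:

```shell
# Send a chat completion request to the local vLLM server.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "tencent/Hunyuan-1.8B-Instruct",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 64
    }'
```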

## Training and Innovation
Coming soon.
29 changes: 29 additions & 0 deletions docs/source/performance/speculative_decoding/benchmarks.md
@@ -2,6 +2,8 @@

## Eagle3

### Qwen3 Series Models

| | | MT-bench | | HumanEval | | GSM8K | | Alpaca | | Mean | |
|------------------|--------------|------------------|------------|-------------------|-------------|----------------|---------|----------------|----------|---------------|--------|
| | Model | Speedup | τ | Speedup | τ | Speedup | τ | Speedup | τ | Speedup | τ |
@@ -19,3 +21,30 @@
| | Qwen3-32B | 1.62x | 1.91 | 1.71x | 2.05 | 1.78x | 2.10 | 1.80x | 1.95 | 1.62x | 2.00 |
| | Qwen3-30B-A3B| 1.91x | 2.46 | 2.00x | 2.64 | 1.90x | 2.53 | 1.80x | 2.32 | 1.90x | 2.48 |

### Hunyuan Series Models

<table>
<thead>
<tr>
<th>&nbsp;</th><th>&nbsp;</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">MT-bench</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">HumanEval</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">GSM8K</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">Alpaca</th>
<th colspan="2" style="text-align: center; vertical-align: middle;">Mean</th></tr>
<tr><th>Temperature</th><th>Model</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th></tr>
</thead>
<tbody>
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
<tr><td rowspan="3"><strong>Temperature=0</strong></td>
<td>Hunyuan-1.8B-Instruct</td><td>1.97x</td><td>2.90</td><td>2.58x</td><td>3.73</td><td>2.61x</td><td>3.71</td><td>1.71x</td><td>2.43</td><td>2.22x</td><td>3.19</td></tr>
<tr> <td>Hunyuan-4B-Instruct</td><td>1.77x</td><td>2.60</td><td>2.64x</td><td>3.35</td><td>2.14x</td><td>3.17</td><td>1.72x</td><td>2.57</td><td>2.07x</td><td>2.92</td></tr>
<tr><td>Hunyuan-7B-Instruct</td><td>2.22x</td><td>3.58</td><td>3.59x</td><td>5.47</td><td>2.96x</td><td>4.68</td><td>1.64x</td><td>2.56</td><td>2.60x</td><td>4.07</td></tr>
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
<tr><td rowspan="3"><strong>Temperature=1</strong></td>
<td>Hunyuan-1.8B-Instruct</td><td>1.58x</td><td>2.36</td><td>2.35x</td><td>3.56</td><td>2.23x</td><td>3.38</td><td>1.26x</td><td>1.87</td><td>1.86x</td><td>2.79</td></tr>
<tr><td>Hunyuan-4B-Instruct</td><td>1.36x</td><td>2.05</td><td>1.97x</td><td>2.86</td><td>1.72x</td><td>2.68</td><td>1.14x</td><td>1.76</td><td>1.55x</td><td>2.34</td></tr>
<tr><td>Hunyuan-7B-Instruct</td><td>1.90x</td><td>3.11</td><td>3.12x</td><td>5.09</td><td>2.74x</td><td>4.34</td><td>1.47x</td><td>2.39</td><td>2.31x</td><td>3.73</td></tr>
</tbody>
</table>