[XPU]Fix w4a8 precision bug && rollback moe algo by iosmers · Pull Request #4463 · PaddlePaddle/FastDeploy

iosmers · 2025-10-17T01:09:33Z

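1. Roll back the MoE operator: the new EP MoE operator increased device memory usage, which caused an OOM while launching the 300B 128K model on 8 cards.
2. Fix the precision issue in the W4A8C8 model.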

paddle-bot · 2025-10-17T01:09:38Z

Thanks for your contribution!

hong19860320 · 2025-10-17T06:30:34Z

fastdeploy/model_executor/layers/backends/xpu/quantization/kv_cache.py

            cache_v_scale = self.cache_quant_config.max_bound / cache_v_scale_tensor
-            cache_k_out_scale = cache_k_scale_tensor / self.cache_quant_config.max_bound
-            cache_v_out_scale = cache_v_scale_tensor / self.cache_quant_config.max_bound
+            cache_k_out_scale = cache_k_scale_tensor


Shouldn't cache_k_out_scale and cache_k_scale be reciprocals of each other?

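In symmetric quantization, the block_attn operator actually expects the max value as cache_k_out_scale. We had misunderstood this before, which caused garbled output.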
/**
qkv shape: [token_num, (num_heads + 2 * kv_num_heads) * head_dim]
k_scales/v_scales value: 127 / max (type = TS)
k_scales_inv/v_scales_inv value:
    perchannel with zp: max / 127 (type = TS)
    perchannel without zp: max (type = float)
**/
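To illustrate the convention in the comment above, here is a minimal pure-Python sketch. It assumes symmetric per-channel int8 quantization with max_bound = 127, and it assumes that the kernel's "perchannel without zp" path receives the raw max and divides by 127 internally (inferred from the comment, not checked against the XPU kernel). All function names are hypothetical, and the "perchannel with zp" convention (max / 127) is not modeled. The sketch shows why passing max / 127 (the old cache_k_out_scale) divides out 127 a second time.

```python
# Minimal sketch (assumption): symmetric per-channel int8 KV-cache quantization.
# All function names are hypothetical; this is NOT FastDeploy or XPU kernel code.

MAX_BOUND = 127.0  # stands in for self.cache_quant_config.max_bound


def channel_absmax(rows):
    """Per-channel absolute max of a [token_num, channels] matrix (list of lists)."""
    return [max(abs(row[c]) for row in rows) for c in range(len(rows[0]))]


def quantize(rows, absmax):
    """Quantize with per-channel scale = 127 / max, like cache_k_scale in the diff."""
    scales = [MAX_BOUND / m for m in absmax]
    return [
        [int(max(-MAX_BOUND, min(MAX_BOUND, round(x * s)))) for x, s in zip(row, scales)]
        for row in rows
    ]


def kernel_dequantize(q_rows, out_scale):
    """Hypothetical model of the 'perchannel without zp' path: the kernel is
    assumed to receive the raw max and divide by 127 itself."""
    return [[q * s / MAX_BOUND for q, s in zip(row, out_scale)] for row in q_rows]


k = [[0.5, -2.0], [1.0, 4.0]]  # toy K cache: 2 tokens x 2 channels
absmax = channel_absmax(k)       # plays the role of cache_k_scale_tensor
q = quantize(k, absmax)

# Fixed behavior: cache_k_out_scale = cache_k_scale_tensor (the raw max).
fixed = kernel_dequantize(q, absmax)
# Old behavior: cache_k_out_scale = max / max_bound, so 127 is divided out twice.
buggy = kernel_dequantize(q, [m / MAX_BOUND for m in absmax])

print("fixed:", fixed)
print("buggy:", buggy)
```

Under these assumptions, `fixed` stays within half a quantization step of `k`, while every entry of `buggy` is a further 127x smaller, so attention would effectively see a near-zero K cache.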

EmmonsCurse

LGTM

hong19860320

LGTM

fix w4a8 precision bug · e60a2e4

iosmers added 3 commits October 17, 2025 03:28

add env · 217ff90

Merge branch 'develop' into fix_w4a8_bug · 2d8b319

code stype check · 5dc1ef9

iosmers changed the title ~~[XPU]fix w4a8 precision bug~~ [XPU]Fix w4a8 precision bug && Roback MOE algo Oct 17, 2025

iosmers changed the title ~~[XPU]Fix w4a8 precision bug && Roback MOE algo~~ [XPU]Fix w4a8 precision bug && rollback MOE algo Oct 17, 2025

iosmers changed the title ~~[XPU]Fix w4a8 precision bug && rollback MOE algo~~ [XPU]Fix w4a8 precision bug && rollback moe algo Oct 17, 2025

hong19860320 reviewed Oct 17, 2025

EmmonsCurse approved these changes Oct 17, 2025

hong19860320 approved these changes Oct 17, 2025

EmmonsCurse merged commit a64c040 into PaddlePaddle:develop Oct 17, 2025
13 of 15 checks passed

EmmonsCurse merged 4 commits into PaddlePaddle:develop from iosmers:fix_w4a8_bug

3 participants

Conversation

iosmers commented Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

paddle-bot bot commented Oct 17, 2025

Uh oh!

hong19860320 Oct 17, 2025

Choose a reason for hiding this comment

Uh oh!

iosmers Oct 17, 2025

Choose a reason for hiding this comment

Uh oh!

EmmonsCurse left a comment

Choose a reason for hiding this comment

Uh oh!

hong19860320 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

iosmers commented Oct 17, 2025 •

edited

Loading