
[XPU]Fix w4a8 precision bug && rollback moe algo #4463

Merged
EmmonsCurse merged 4 commits into PaddlePaddle:develop from iosmers:fix_w4a8_bug
Oct 17, 2025

Conversation

@iosmers
Collaborator

@iosmers iosmers commented Oct 17, 2025

1. The new EP MoE operator increases GPU memory usage, causing an OOM error when starting the 300B 128K model on 8 cards; roll back the MoE operator.
2. Fix the precision issue of the W4A8C8 model.

@paddle-bot

paddle-bot bot commented Oct 17, 2025

Thanks for your contribution!

@iosmers iosmers changed the title [XPU]fix w4a8 precision bug [XPU]Fix w4a8 precision bug && Roback MOE algo Oct 17, 2025
@iosmers iosmers changed the title [XPU]Fix w4a8 precision bug && Roback MOE algo [XPU]Fix w4a8 precision bug && rollback MOE algo Oct 17, 2025
@iosmers iosmers changed the title [XPU]Fix w4a8 precision bug && rollback MOE algo [XPU]Fix w4a8 precision bug && rollback moe algo Oct 17, 2025
  cache_v_scale = self.cache_quant_config.max_bound / cache_v_scale_tensor
- cache_k_out_scale = cache_k_scale_tensor / self.cache_quant_config.max_bound
- cache_v_out_scale = cache_v_scale_tensor / self.cache_quant_config.max_bound
+ cache_k_out_scale = cache_k_scale_tensor
Collaborator


Aren't cache_k_out_scale and cache_k_scale supposed to be reciprocals of each other?

Collaborator Author


In symmetric quantization, the block_attn operator actually expects the max value for cache_k_out_scale. We misunderstood this before, which caused garbled output.
/**
 * qkv shape: [token_num, (num_heads + 2 * kv_num_heads) * head_dim]
 * k_scales/v_scales value: 127 / max (type = TS)
 * k_scales_inv/v_scales_inv value:
 *   1. perchannel with zp: max / 127 (type = TS)
 *   2. perchannel without zp: max (type = float)
 **/
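A minimal NumPy sketch of the symmetric scale convention described above. This is an illustration, not the actual FastDeploy code: the dequantization formula `q * out_scale / max_bound`, the tensor shapes, and the variable names are assumptions inferred from the comment block, chosen so that passing the per-channel max (rather than max / 127) as the out scale reconstructs the original values.

```python
import numpy as np

# Assumption: int8 symmetric range, matching cache_quant_config.max_bound in the PR.
MAX_BOUND = 127.0

np.random.seed(0)
k = np.random.randn(4, 8).astype(np.float32)   # fake cache slice [tokens, channels]
k_max = np.abs(k).max(axis=0)                  # per-channel max

cache_k_scale = MAX_BOUND / k_max              # quant scale: 127 / max
cache_k_out_scale = k_max                      # the fix: pass max, not max / 127

# Quantize to int8.
q = np.clip(np.round(k * cache_k_scale), -127, 127).astype(np.int8)

# Dequantize as the operator is assumed to do it: q * out_scale / max_bound.
k_rec = q.astype(np.float32) * cache_k_out_scale / MAX_BOUND

# Rounding error is at most half a quantization step per channel.
assert np.allclose(k, k_rec, atol=float(k_max.max()) / MAX_BOUND)
```

Under this convention, using the reciprocal-style scale (max / 127) as the out scale would shrink every dequantized value by a factor of 127, which is consistent with the garbled output reported before the fix.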

Collaborator

@EmmonsCurse EmmonsCurse left a comment


LGTM

Collaborator

@hong19860320 hong19860320 left a comment


LGTM

@EmmonsCurse EmmonsCurse merged commit a64c040 into PaddlePaddle:develop Oct 17, 2025
13 of 15 checks passed