A-nnonymous
Contributor

@A-nnonymous commented Aug 18, 2025

PR Category

Operator Mechanism

PR Types

Performance

Description

MTP-related operator enhancements and implementations
This PR includes:

  • Implementation of the embedding_grad_add_to kernel, which accumulates a bfloat16 out_grad into main_grad in place. This saves the time of the separate elementwise cast, add, and memset kernels (~2.8 ms in DSV3) and reduces global memory usage.
  • Specialization of the cross_entropy_w_softmax forward op for bfloat16 logits. Type promotion is integrated into the kernel, which reduces I/O and saves the cast time (~1.5 ms total).
  • Implementation of the cross_entropy_with_softmax_bwd_w_downcast kernel, which performs the downcast inside the kernel to avoid a separate elementwise kernel and is further optimized with vectorization (saving ~3.2 ms total).
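For readers unfamiliar with the fused pattern, the forward promotion and backward downcast described above can be sketched in plain Python for a single row with a hard label. This is a reference for the assumed semantics only, not the CUDA kernels in this PR: `to_bf16`, `softmax_cross_entropy_fwd`, and `softmax_cross_entropy_bwd_downcast` are hypothetical names, Python floats stand in for float32, and the formula `(softmax - one_hot) * loss_grad` is the standard softmax cross-entropy gradient rather than something taken from the diff.

```python
import math
import struct


def to_bf16(x):
    """Round a finite float to the nearest bfloat16 value (hypothetical helper)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 bits that bfloat16 drops, then clear them.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def softmax_cross_entropy_fwd(logits_bf16, label):
    """Forward for one row: read bf16 logits, compute in higher precision."""
    row_max = max(logits_bf16)  # subtract the max for numerical stability
    exps = [math.exp(v - row_max) for v in logits_bf16]
    total = sum(exps)
    softmax = [e / total for e in exps]
    loss = -math.log(softmax[label])
    return softmax, loss


def softmax_cross_entropy_bwd_downcast(softmax, label, loss_grad):
    """Backward for one row: round each gradient to bf16 as it is stored."""
    return [
        to_bf16((p - (1.0 if i == label else 0.0)) * loss_grad)
        for i, p in enumerate(softmax)
    ]


# Assumed example inputs: three logits that are exactly representable in bf16.
logits = [to_bf16(v) for v in (1.0, 2.0, 3.0)]
softmax, loss = softmax_cross_entropy_fwd(logits, label=2)
grad = softmax_cross_entropy_bwd_downcast(softmax, label=2, loss_grad=1.0)
```

Doing the promotion on load and the rounding on store is what lets a fused kernel skip the separate cast kernels and the intermediate float32 tensors they would read and write.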

pcard-91067

paddle-bot bot commented Aug 18, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

codecov-commenter commented Aug 19, 2025

Codecov Report

❌ Patch coverage is 43.90244% with 23 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@d7133ee). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddle/phi/infermeta/ternary.cc | 0.00% | 15 Missing ⚠️ |
| ...ional/cross_entropy_with_softmax_bwd_w_downcast.py | 66.66% | 3 Missing ⚠️ |
| ...le/incubate/nn/functional/embedding_grad_add_to.py | 66.66% | 3 Missing ⚠️ |
| paddle/phi/infermeta/binary.cc | 66.66% | 2 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (43.90%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #74684   +/-   ##
==========================================
  Coverage           ?   43.90%           
==========================================
  Files              ?        5           
  Lines              ?       41           
  Branches           ?        0           
==========================================
  Hits               ?       18           
  Misses             ?       23           
  Partials           ?        0           
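
The patch coverage figure follows directly from the counts in the table above. A quick check in Python, using only the numbers reported here (variable names are illustrative):

```python
# Counts copied from the coverage diff above.
lines, hits, misses = 41, 18, 23
target = 90.0  # target patch coverage quoted in the report, in percent

# Patch coverage is the share of changed lines that the tests executed.
coverage = 100.0 * hits / lines
below_target = coverage < target
```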

@A-nnonymous
Contributor Author

/re-run all-failed

@A-nnonymous
Contributor Author

/re-run all-failed

@A-nnonymous
Contributor Author

/re-run all-failed

@A-nnonymous
Contributor Author

/re-run all-failed

@phlrain self-requested a review August 20, 2025 11:47
@A-nnonymous
Contributor Author

/re-run all-failed

Contributor

@XiaoguangHu01 left a comment

Please follow the custom operator guidelines and add a version implemented by composing the framework's basic APIs.

Comment on lines +1713 to +1723
- op : embedding_grad_add_to
  args : (Tensor token_indices, Tensor main_grad_, Tensor out_grad)
  output : Tensor(main_grad_out)
  infer_meta :
    func : UnchangedInferMeta
    param : [main_grad_]
  kernel :
    func : embedding_grad_add_to
    param : [token_indices, main_grad_, out_grad]
    data_type : main_grad_
  inplace : (main_grad_ -> main_grad_out)
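
As a reading aid for the op definition above, here is a minimal pure-Python sketch of the semantics it appears to describe. The behavior is an assumption inferred from the PR description and the argument names: each row of `out_grad` is added into the row of `main_grad_` selected by the matching entry of `token_indices`, and the buffer is updated in place. Nested lists stand in for tensors and Python floats for the float32 accumulator; this is not the kernel from the PR.

```python
def embedding_grad_add_to(token_indices, main_grad, out_grad):
    """Accumulate per-token gradient rows into main_grad in place (assumed semantics)."""
    for row, token in enumerate(token_indices):
        dst = main_grad[token]
        for col, g in enumerate(out_grad[row]):
            # The low-precision gradient is promoted as it is read, so no
            # separate cast, add, or zero-fill pass over main_grad is needed.
            dst[col] += g
    return main_grad  # same object, mirroring inplace : (main_grad_ -> main_grad_out)


# Assumed example: a vocabulary of 3 rows, hidden size 2, and 3 tokens,
# two of which hit the same row and are therefore summed.
main_grad = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
out_grad = [[0.5, 0.25], [1.0, 2.0], [0.5, 0.75]]
token_indices = [2, 0, 2]
result = embedding_grad_add_to(token_indices, main_grad, out_grad)
```

Repeated token ids are the case that matters: on a GPU those accumulations would need atomic adds or a prior reduction, which this sequential sketch does not model.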
Contributor

Please move these into fused_ops.yaml; they don't look like standard operators.

Contributor Author

Understood. I'll submit a fix PR today and run CI on it in parallel.

Contributor Author

#74811 is the follow-up fix PR; it is in progress.

@A-nnonymous
Contributor Author

/re-run all-failed

@A-nnonymous
Contributor Author

/re-run approval

@A-nnonymous
Contributor Author

/re-run Static-Check

@A-nnonymous
Contributor Author

/re-run all-failed

@zhangbo9674 merged commit 04c0f50 into PaddlePaddle:develop Aug 22, 2025
175 of 203 checks passed
Luckycheng222 pushed a commit to Luckycheng222/Paddle that referenced this pull request Aug 25, 2025
* stash

* Added embedd_grad_add_to kernel

* fix openblas git

* fix banner

* Specialized cross_entropy_w_softmax in bfloat16 logit circumstances

* Fix bugs

* Add cross_entropy_with_softmax_bwd_w_downcast

* Finish optest

* fix miscs

* Optimized kernel performance

* fix miscs

* bypass optest in some invalid environments.

* Fix corner case

* forbid dcu bf16 dtype.