
Conversation

zty-king (Contributor) commented Jun 19, 2025

PR Category

Auto Parallel

PR Types

Improvements

Description

  • Current problem

When use_flash_attention is set to false, running llama2_13b_hybrid_pp fails. The root cause is that the number of outputs a stage sends forward to the next stage does not match the number of grads it receives back from that stage during backward, so backward cannot be computed correctly.


  • Problem analysis

The parameters computed in the EmbeddingLayer need to be passed along during forward and used in the computation of each DecoderLayer. Note, however, that apart from hidden_states, the other parameters are, after the EmbeddingLayer computes them, only passed between layers as auxiliary inputs to the computation.


  • Solution
    When collecting each layer's outputs, filter out the parameters whose stop_grad is True, and adapt the related code accordingly: these parameters are only passed between layers and take no part in backward.
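The fix described above can be illustrated with a minimal, self-contained sketch. The FakeTensor class and all names below are invented for illustration; the real code operates on paddle.Tensor objects inside the pipeline stage:

```python
from dataclasses import dataclass

# Hypothetical stand-in for a framework tensor; it carries only the one
# field the filtering logic inspects. The actual PR works with paddle.Tensor.
@dataclass
class FakeTensor:
    name: str
    stop_gradient: bool

def filter_grad_required(outputs):
    # Outputs with stop_gradient=True (e.g. attention masks computed once
    # in the EmbeddingLayer) are still forwarded to the next stage, but
    # they are excluded from the set paired with incoming grads, so the
    # forward send count matches the backward recv count.
    return tuple(
        t for t in outputs
        if isinstance(t, FakeTensor) and not t.stop_gradient
    )

layer_outputs = (
    FakeTensor("hidden_states", stop_gradient=False),
    FakeTensor("attention_mask", stop_gradient=True),
    FakeTensor("position_ids", stop_gradient=True),
)
grad_required = filter_grad_required(layer_outputs)
print([t.name for t in grad_required])  # ['hidden_states']
```

Only hidden_states participates in backward; the auxiliary tensors keep flowing forward between stages without requiring a matching grad.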


paddle-bot bot commented Jun 19, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Jun 19, 2025
# We assume we always send to stage + 1
if not self.is_last:
    self.act_send_info[idx] = [self.stage_index + 1]
    if not outputs_meta[idx].stop_gradient:
Contributor: TensorMeta does not have this attribute.

Contributor Author: Done


paddle-ci-bot bot commented Jun 30, 2025

Sorry to inform you that ea80cdb's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@b5d6a16). Learn more about missing BASE report.

Additional details and impacted files
@@             Coverage Diff             @@
##             develop    #73459   +/-   ##
===========================================
  Coverage           ?   100.00%           
===========================================
  Files              ?         2           
  Lines              ?        12           
  Branches           ?         0           
===========================================
  Hits               ?        12           
  Misses             ?         0           
  Partials           ?         0           


@@ -710,9 +709,19 @@ def forward_one_chunk(
        flat_args = _flatten_args(input_args)
        flat_kwargs = _flatten_args(composite_kwargs)
        flatten_input_tensors = flat_args + flat_kwargs
        grad_required_output_tuple = tuple(
Contributor: The naming seems off. Starting with "grad" suggests it holds gradient data; requires_grad_output_tuple would be a better name.

Contributor Author: Done

            for out in output_tuple
            if isinstance(out, paddle.Tensor) and not out.stop_gradient
        )
        grad_required_flatten_input_tensors = [
Contributor: Same as above.

Contributor Author: Done
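Why the send and recv counts must match can be seen in miniature: backward pairs each grad received from the next stage with the forward output that produced it, so the filtering has to be applied consistently on both sides. A toy illustration follows; the Tensor class and pair_outputs_with_grads are invented for the example and are not the PR's actual code:

```python
class Tensor:
    # Minimal stand-in for paddle.Tensor with the one relevant field.
    def __init__(self, name, stop_gradient):
        self.name = name
        self.stop_gradient = stop_gradient

def pair_outputs_with_grads(outputs, grads):
    # Backward matches each incoming grad with a forward output; only
    # outputs that require grad take part in the pairing. If the counts
    # disagree (the bug this PR fixes), backward cannot proceed.
    kept = [t for t in outputs if not t.stop_gradient]
    if len(kept) != len(grads):
        raise ValueError(
            f"got {len(grads)} grads for {len(kept)} grad-requiring outputs"
        )
    return list(zip(kept, grads))

outs = [Tensor("hidden_states", False), Tensor("attention_mask", True)]
# The next stage sends back exactly one grad: the one for hidden_states.
pairs = pair_outputs_with_grads(outs, ["grad_hidden_states"])
print([(t.name, g) for t, g in pairs])  # [('hidden_states', 'grad_hidden_states')]
```

With the stop_gradient outputs filtered on the send side, the one grad that arrives pairs cleanly with the one output that needs it; without the filter, two outputs would wait on a single grad.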

zty-king (Contributor Author) commented Aug 3, 2025

/re-run all-failed

@zty-king zty-king changed the title from "Enhance stage support for parameters passed between layers with stop_grad=True" to "[Auto-parallel] Enhance stage support for parameters passed between layers with stop_grad=True" Aug 3, 2025
xuxinyi389 (Contributor) reviewed and left a comment:

LGTM

@xuxinyi389 xuxinyi389 merged commit 388eaa7 into PaddlePaddle:develop Aug 4, 2025
82 of 84 checks passed