refine eager backward #72816

Merged

Conversation

@wanghuancoder (Contributor) commented May 20, 2025

PR Category

Execute Infrastructure

PR Types

Improvements

Description

Paddle raises an error for the following code:

import paddle

a = paddle.ones([100])
a.stop_gradient = False
b = paddle.ones([100])
b.stop_gradient = False

c = a + b
d = c + c
e = d + d
# paddle_out_grads = paddle.autograd.backward([e], [paddle.ones([100])]) # right (this call works)
paddle_out_grads = paddle.autograd.backward([e,c,d], [paddle.ones([100]),paddle.ones([100]),paddle.ones([100])])

The error message is:

Traceback (most recent call last):
  File "/host_home/wanghuan29/Paddle/test_grad.py", line 13, in <module>
    paddle_out_grads = paddle.autograd.backward([e,c,d], [paddle.ones([100]),paddle.ones([100]),paddle.ones([100])])
  File "/usr/local/lib/python3.9/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/host_home/wanghuan29/Paddle2/build/python/paddle/base/wrapped_decorator.py", line 40, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/host_home/wanghuan29/Paddle2/build/python/paddle/base/framework.py", line 726, in __impl__
    return func(*args, **kwargs)
  File "/host_home/wanghuan29/Paddle2/build/python/paddle/autograd/backward_mode.py", line 140, in backward
    core.eager.run_backward(tensors, grad_tensors, retain_graph)
SystemError: (Fatal) Unable to find next node in the GradTensorHolder 
Trying to run Node without configuring its GradTensorHolder.
  [Hint: Expected node_input_buffer_iter != node_input_buffers_dict.end(), but received node_input_buffer_iter == node_input_buffers_dict.end().] (at /host_home/wanghuan29/Paddle2/paddle/fluid/eager/backward.cc:272)

The following code has the same problem:

import paddle

a = paddle.ones([100])
a.stop_gradient = False
b = paddle.ones([100])
b.stop_gradient = False

c = a + b
d = c + c
e = d + d
f, g = paddle.split(e, num_or_sections=2, axis=0)
h = f + f

paddle_out_grads = paddle.autograd.backward([h,g], [paddle.ones([50]),paddle.ones([50])])
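
For reference, the gradients that the chain rule gives for this split example can be worked out by hand. The sketch below uses plain Python lists instead of tensors; the PR text does not list these values, so they are derived here only from the graph shown above, on the assumption that a correct backward pass follows the ordinary chain rule.

```python
# Seeds passed to backward for h and g (50 elements each).
seed_h = [1.0] * 50
seed_g = [1.0] * 50

# h = f + f, so dh/df = 2.
grad_f = [2.0 * s for s in seed_h]
# g has no consumers, so its gradient is just its seed.
grad_g = list(seed_g)
# split along axis 0: the backward of split concatenates the piece gradients.
grad_e = grad_f + grad_g
# e = d + d and d = c + c each double the incoming gradient.
grad_d = [2.0 * x for x in grad_e]
grad_c = [2.0 * x for x in grad_d]
# c = a + b passes the gradient through unchanged to both inputs.
grad_a = list(grad_c)
grad_b = list(grad_c)

print(grad_a[0], grad_a[-1])  # 8.0 4.0
```

The first 50 elements of the gradient of a and b are 8 and the last 50 are 4, because only the first half of e feeds into h.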

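In the first case, the user supplies gradients for c and d, so the question is what to do with the gradients that also arrive at c and d through backpropagation. In practice, both torch.autograd.grad and paddle.grad sum the user-supplied gradient and the backpropagated gradient, so paddle.autograd.backward should do the same.
The second case is a bug in paddle.autograd.backward.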
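
The summing rule for the first case can be illustrated with a small pure-Python sketch. This is not Paddle's implementation: `Node`, `add`, and `backward` below are hypothetical helpers written for this illustration, working on scalars rather than tensors. The sketch seeds every root with its user-supplied gradient, then processes nodes only after all of their consumers have contributed, so a root that is also an interior node (c and d here) ends up with its seed plus the backpropagated gradient.

```python
from collections import defaultdict, deque


class Node:
    """A scalar value in a toy graph (hypothetical helper, not a Paddle class)."""

    def __init__(self, name, parents=()):
        self.name = name
        # Each entry is (parent_node, local_derivative).
        self.parents = list(parents)


def add(name, x, y):
    # out = x + y, so the local derivative is 1 for each input.
    return Node(name, [(x, 1.0), (y, 1.0)])


def backward(roots, seeds):
    """Return {node name: gradient}, summing seeds with backpropagated gradients."""
    grads = defaultdict(float)
    for node, seed in zip(roots, seeds):
        grads[node] += seed

    # Count how many consumer edges will feed a gradient into each node.
    in_degree = defaultdict(int)
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for parent, _ in node.parents:
            in_degree[parent] += 1
            stack.append(parent)

    # Process a node only once every consumer has contributed its gradient.
    ready = deque(n for n in seen if in_degree[n] == 0)
    while ready:
        node = ready.popleft()
        for parent, local in node.parents:
            grads[parent] += local * grads[node]
            in_degree[parent] -= 1
            if in_degree[parent] == 0:
                ready.append(parent)
    return {n.name: g for n, g in grads.items()}


# Same graph as the first example: c = a + b, d = c + c, e = d + d.
a, b = Node("a"), Node("b")
c = add("c", a, b)
d = add("d", c, c)
e = add("e", d, d)

only_e = backward([e], [1.0])
all_roots = backward([e, c, d], [1.0, 1.0, 1.0])
print(only_e["a"])     # 4.0
print(all_roots["d"])  # 3.0  (seed 1 + 2 from e)
print(all_roots["a"])  # 7.0  (c receives seed 1 + 2 * 3 from d)
```

Under this rule, passing c and d as extra roots changes the gradient of a from 4 to 7 per element, which is the behavior the description above attributes to torch.autograd.grad and paddle.grad.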

Both errors occur with paddle.autograd.backward but not with paddle.grad, because paddle.grad goes through PreparedForGeneralGrad, which applies optimizations such as pruning to the backward graph.

Pcard-67164

paddle-bot bot commented May 20, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (develop@9fd40f9). Learn more about missing BASE report.

Additional details and impacted files
@@             Coverage Diff             @@
##             develop    #72816   +/-   ##
===========================================
  Coverage           ?   100.00%           
===========================================
  Files              ?         1           
  Lines              ?         5           
  Branches           ?         0           
===========================================
  Hits               ?         5           
  Misses             ?         0           
  Partials           ?         0           


@JiabinYang (Contributor) left a comment

LGTM

@tianshuo78520a tianshuo78520a merged commit fbaad0c into PaddlePaddle:develop May 28, 2025
137 of 147 checks passed
LiYuRio pushed a commit to LiYuRio/Paddle that referenced this pull request May 29, 2025
* refine eager backward

* refine
LiYuRio added a commit that referenced this pull request Jun 3, 2025
* refine eager backward

* refine

Co-authored-by: wanghuancoder <[email protected]>
6 participants