[Accuracy diff No.43-44] Accuracy grid sample #74555

zhengshengning · 2025-08-12T03:31:09Z

PR Category

Execute Infrastructure

PR Types

Improvements

Description

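Accuracy problem description:

Both the forward and backward passes show accuracy problems with either mode="nearest" or mode="bilinear".

Root cause and analysis:

Comparing the paddle and torch implementations shows that, for mode="bilinear", torch uses std::floor() to compute the coordinate position, whereas paddle cpu calls floor() and then applies an additional round(). The second round() therefore has to be removed.
For mode="nearest", the paddle gpu implementation uses std::nearbyint() to compute the coordinate position, whereas paddle cpu uses round(), so the cpu path is aligned to std::nearbyint(). The code also called round() twice in succession, and the redundant call is removed. In addition, round() in the gpu GridSample3DCudaKernel is changed to nearbyint().
The bounds-check function IsInBound() should be given integer values rather than floating-point values, so the call sites now use static_cast.
Because "bilinear" and "nearest" share the GetGridPointValue function but compute positions with different rounding functions, two new functions are added: GetGridPointValue_nearest and Get3DGridPointValue_nearest.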
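
The difference between round() and std::nearbyint() only appears on exact .5 values: C's round() sends halfway cases away from zero, while std::nearbyint() follows the current rounding mode, which by default rounds halfway cases to the nearest even integer. The following is a minimal Python sketch of that difference, not the kernel code. `c_round` and `nearbyint_default` are hypothetical helper names introduced here for illustration, and the default FE_TONEAREST rounding mode is assumed.

```python
import math


def c_round(x: float) -> float:
    """Mimic C's round(): halfway cases are rounded away from zero."""
    magnitude = math.floor(abs(x))
    # The fractional part is computed exactly, so the comparison is safe.
    if abs(x) - magnitude >= 0.5:
        magnitude += 1
    return math.copysign(magnitude, x)


def nearbyint_default(x: float) -> float:
    """Mimic std::nearbyint() under FE_TONEAREST (assumed): ties go to even.

    Python's built-in round() uses the same ties-to-even rule for floats.
    """
    return float(round(x))


# Only exact halfway coordinates can select a different pixel index.
for coord in (0.5, 1.5, 2.5, -2.5, 2.4):
    print(coord, c_round(coord), nearbyint_default(coord))
```

In nearest mode, a grid coordinate that lands exactly on such a halfway value would select a different source pixel depending on which function is used, which is one way the cpu and gpu paths could disagree.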

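[4D accuracy test]

[5D accuracy test]

[float32 accuracy issue & experimental analysis -> reasonable]
The fix above corrects the grid sample computation flow. During testing, however, with the shapes paddle.nn.functional.grid_sample(Tensor([100, 1, 176, 176],"float32"), Tensor([100, 1, 37632, 2],"float32"), align_corners=False, ) and a grid tensor element equal to -0.2670455, the cpu backward result does not align with torch. This turned out to be caused by float32 rounding and is consistent with theoretical expectations. The experimental analysis follows: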

Code where the error arises:

auto factor = static_cast<T>((max_val + 1) * 0.5);
grid_slice_t.device(place) = (grid_slice_t + static_cast<T>(1)) * factor - static_cast<T>(0.5);

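When grid_slice_t = -0.2670455 and max_val = 175, the code above should in theory yield 63.999996, but with float32 the computed result is 64.000000. The code then uses floor() to compute the position: for this value torch gets ix = 63 while paddle cpu gets ix = 64, which produces the accuracy error. The obvious remedy is to promote the computation to double before calling floor(). This was tried, and with double the case grid_slice_t = -0.2670455 does align. For grid_slice_t = -0.41477278, however, the double result is 50.9999954 (50 after floor()), whereas torch computes 51.000000 (51 after floor()), so the results still do not align. The suspected cause is rounding in float32 arithmetic.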
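
The arithmetic above can be reproduced without paddle or torch. The sketch below emulates float32 by rounding every intermediate result to single precision with the standard `struct` module, and compares it with plain double arithmetic. It is an illustration under stated assumptions, not the kernel itself: `f32`, `unnormalize_f32` and `unnormalize_f64` are hypothetical names, each float32 operation is assumed to be rounded separately (no fused multiply-add), and the grid values are the decimal literals quoted in this description.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def unnormalize_f32(grid: float, max_val: int) -> float:
    """(grid + 1) * factor - 0.5 with every step rounded to float32."""
    factor = f32((max_val + 1) * 0.5)
    shifted = f32(f32(grid) + 1.0)
    scaled = f32(shifted * factor)
    return f32(scaled - 0.5)


def unnormalize_f64(grid: float, max_val: int) -> float:
    """The same formula evaluated entirely in double precision."""
    return (grid + 1.0) * ((max_val + 1) * 0.5) - 0.5


for grid in (-0.2670455, -0.41477278):
    single = unnormalize_f32(grid, 175)
    double = unnormalize_f64(grid, 175)
    # floor() turns a sub-ulp difference into a one-pixel index shift.
    print(grid, single, math.floor(single), double, math.floor(double))
```

Under these assumptions the float32 path rounds the first value up to exactly 64.0 while the double path stays just below 64, so floor() returns 64 and 63 respectively. Neither precision reproduces both torch indices quoted above (63 and 51), which is consistent with the conclusion that the remaining mismatch comes from float32 rounding rather than from the computation flow.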

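To further verify that the computation flow is correct, all the experimental data that exhibited the accuracy error was converted to float64. With float64, the paddle and torch results align exactly.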

import numpy

# Load the failing inputs and promote them to float64
x_np = numpy.load('/host_home/ningzhengsheng/src/PaddleAPITest/x.npy').astype('float64')
grid_np = numpy.load('/host_home/ningzhengsheng/src/PaddleAPITest/grid.npy').astype('float64')

Backward results with float64:

// paddle
20.981019201441335 21.869245293486983 12.764171000923856 -73.98225032614089 15.466676140834522 95.2787161154327

// torch
20.981019201441335 21.869245293486983 12.764171000923856 -73.98225032614089 15.466676140834522 95.2787161154327

pcard-67164

… accuracy_grid_sample

paddle-bot · 2025-08-12T03:31:15Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

codecov-commenter · 2025-08-12T06:04:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@6bd5e66). Learn more about missing BASE report.

Additional details and impacted files

@@             Coverage Diff             @@
##             develop    #74555   +/-   ##
===========================================
  Coverage           ?   100.00%           
===========================================
  Files              ?         3           
  Lines              ?       105           
  Branches           ?         0           
===========================================
  Hits               ?       105           
  Misses             ?         0           
  Partials           ?         0


… accuracy_grid_sample

wanghuancoder

LGTM

zhengshengning · 2025-08-15T11:07:30Z

/re-run all-failed

zhengshengning added 2 commits August 12, 2025 10:29

fix accuracy for grid_sample

133521e

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

1c110a0

… accuracy_grid_sample

zhengshengning changed the title ~~[Accuracy diff No.43、44] Accuracy grid sample~~ [PHI] Accuracy grid sample Aug 12, 2025

zhengshengning added 2 commits August 14, 2025 14:01

fix grid_sample accuracy

3261a0f

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

e9f2cdf

… accuracy_grid_sample

zhengshengning changed the title ~~[PHI] Accuracy grid sample~~ [Accuracy diff No.43-44] Accuracy grid sample Aug 14, 2025

zhengshengning added 2 commits August 15, 2025 12:50

fix grid_sample test

50d09f0

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

7ebd4cf

… accuracy_grid_sample

wanghuancoder approved these changes Aug 15, 2025

luotao1 approved these changes Aug 15, 2025

luotao1 mentioned this pull request Aug 15, 2025

[Open-source task] Paddle CPU/GPU Kernel accuracy issue rollout #72667

Open

DanielSun11 merged commit a593bf2 into PaddlePaddle:develop Aug 15, 2025
71 of 77 checks passed
