-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime/trace: frame pointer unwinding crash on arm64 during async preemption #63830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
That generated epilogue looks wrong to me even beyond this issue.
That second instruction should be This is for functions that are leaves but have a nonzero frame size. I think the other variants (nonleaf, and leaf+zeroframe) are correct. The nonleaf variant restores Here's a simple function that has the problem:
|
Change https://go.dev/cl/538635 mentions this issue: |
Actually, maybe that old code is correct, if misordered. The old |
Yes, I think that's correct @randall77. Fixing the ordering would be great! Aside: In non-leaf functions I'm seeing this prologue which gets the ordering right:
Arguably this could be used here as well? I'm saying arguably because we don't need to restore R30 for leaf functions. But if this instruction isn't more expensive than just restoring R29, perhaps having more consistent codgen would be a good thing? |
Is it possible to get this back ported? We have a few places where we need to build with go1.21 while it's still being supported. I manually tested that the patch cleanly applies against go1.21.6. But I don't know how to send a back port CL. Thanks 🙇 |
@gopherbot Please backport to 1.21. This causes crashes with tracing enabled on arm64. Though the FP code has been broken longer, only in 1.21 did tracing start to depend on the frame pointer. |
Backport issue(s) opened: #65449 (for 1.21). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases. |
Change https://go.dev/cl/560735 mentions this issue: |
…tore in epilogue For leaf but nonzero-frame functions. Currently we're not restoring it properly. We also need to restore it before popping the stack frame, so that the frame won't get clobbered by a signal handler in the meantime. For #63830 Fixes #65449 Needs a test, but I'm not at all sure how we would actually do that. Leaving for inspiration. Change-Id: I273a25f2a838f05a959c810145cccc5428eaf164 Reviewed-on: https://go-review.googlesource.com/c/go/+/538635 Reviewed-by: Cherry Mui <[email protected]> Reviewed-by: Eric Fang <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: David Chase <[email protected]> (cherry picked from commit c9888bd) Reviewed-on: https://go-review.googlesource.com/c/go/+/560735 TryBot-Bypass: Michael Knyszek <[email protected]> Reviewed-by: Keith Randall <[email protected]>
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
Used the runtime execution tracer with frame pointer unwinding enabled.
What did you expect to see?
No crashes.
What did you see instead?
Crashes when recording an event for async preemption. Example:
This happens infrequently (observed ~20 times over the course of a week, across a significant number of processes). I believe I have identified the cause. Essentially:
Code is generated for a method like golang.org/x/net/http2.(FrameHeader).Header (example generated code here) which does this:
If the async preemption signal is delivered after 4, but before 5 (a single instruction) then the async preemption signal handler will allocate a new stack frame, overwriting where the interrupted function's stack frame was. It will save all of the registers into that frame. But X29 will point somewhere in the old, overwritten frame, meaning it will point to an address containing the random contents of a register saved on the stack. In particular, it'll point to where the floating point registers are saved (ref).
I can reproduce this with the test program here: https://gist.github.com/nsrip-dd/d85ff0d05d2afa6ca0c12796e992ea91
I'm not sure what the right fix here is. Make sure the frame pointer is restored before incrementing the stack pointer at the end of the function, perhaps?
The text was updated successfully, but these errors were encountered: