cmd/compile: ppc64le broken by encourage inlining of functions with single-call bodies #28679
Comments
Odd.
The backtrace starts at a GC helper thread, then somehow jumps to the user's thread, at a nonsensical location. Here's goroutine 2's stack trace from a correct run (on linux):
The error isn't deterministic; sometimes it happens and sometimes it doesn't. |
I haven't been able to reproduce this error on any of our systems, across various distros. I tried both the full all.bash and just doing this: |
Our ppc64le builders run Debian jessie with kernel:
|
Check gdb versions as well, that is likely to be part of the issue. |
|
I ran it on a comparable system but with a more recent kernel:
gdb --version
I don't understand the point of setting GOMAXPROCS=2 and then -test.cpu=1,2,4. Setting GOMAXPROCS=2 but -test.cpu=4 seems wrong. Could something be getting messed up because of that?
What is the ppc64 system? It seems strange that it is not failing in this way. |
Could we get a newer distro and kernel on the ppc64le builder? This is the second time we've seen failures on the builder that don't happen on other distros. Debian 8 seems pretty old. I don't work with OSU much, so I don't know who your contact is for this system or how to make this request. |
Sorry, but that won't happen anytime soon. We have a backlog of builder work and this wouldn't be near the top of the priority list. I recommend just skipping GDB tests if the GDB or kernel is too old. /cc @dmitshur |
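The "skip GDB tests if the GDB is too old" suggestion could be sketched roughly as below. This is only an illustration in Python, not the check used by the Go runtime tests: the function names, the sample version strings, and the minimum version of 7.7 are all assumptions chosen for the example.

```python
import re


def parse_gdb_version(version_output):
    """Return (major, minor) parsed from the first line of `gdb --version`.

    Returns None when no dotted version number is found.
    """
    first_line = version_output.splitlines()[0] if version_output else ""
    # Use the last dotted number on the line: the parenthesised distro tag
    # (for example "(Debian 7.7.1+dfsg-5)") may contain a version as well.
    matches = re.findall(r"(\d+)\.(\d+)", first_line)
    if not matches:
        return None
    major, minor = matches[-1]
    return int(major), int(minor)


def should_skip_gdb_tests(version_output, minimum=(7, 7)):
    """Decide whether to skip; `minimum` is a hypothetical threshold."""
    version = parse_gdb_version(version_output)
    return version is None or version < minimum
```

A caller would feed it the captured output of `gdb --version` and skip the GDB tests when it returns True. An unparseable version is treated as "skip" here, which is a design choice of this sketch rather than anything stated in the thread.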
It would be good if we could upgrade at some point. It's not certain that this is the problem here; it's just a guess, since the test seems to work correctly in so many other places.
I built main.go, the program that gdb is running when the test fails, before and after the bad commit. One thing that looks suspicious is that, starting with the commit where it began failing, Fprintln is inlined into Println, so main no longer calls Println directly. And fmt.Println is what shows up in the backtrace in the output when it fails. So I'm wondering whether that is significant.
I don't know what has to happen to the DWARF, or whatever information gdb looks at, for it to get the backtrace right when inlining has happened, but maybe something causes gdb to look at uninitialized data, leading to random behavior? I also see that Println and Fprintln have variable argument lists, which is not common. |
I'm not sure what's happening on the ppc64le builder, but I can reproduce the exact same error output on any machine by doing this:
GOMAXPROCS=1 ./runtime.test -test.run=GdbPython$
I can also use gdb with the source for the program being run in the testcase, step through it, and see what's wrong. I don't think the compiler is doing anything wrong; rather, the testcase is different and behaves differently. I can try this test with an older compiler and get the same failure.
The problem occurs if forcegchelper is at the top of the stack for one of the goroutines and the goroutine's pc is at the start of the function. At this point the value of sp has not been set yet for forcegchelper, and since this is the initial function for a goroutine, I'm not sure what sp would be at that point.
So when we have this situation, and 'goroutine 2 bt' is invoked for a goroutine with this at the top of its stack, things go wrong after that, probably because the sp does not make sense at this point. If I break at the end and do goroutine 2 bt no errors occur. |
Change https://golang.org/cl/152540 mentions this issue: |
To see the reason for the above change, see the gdb documentation at https://sourceware.org/gdb/onlinedocs/gdb/Registers.html. The relevant passage, which follows an example that assigns to $sp, reads: "This is a way of removing one word from the stack, on machines where stacks grow downward in memory (most machines, nowadays). This assumes that the innermost stack frame is selected; setting $sp is not allowed when other stack frames are selected." |
@laboger
and the gdb frame command reports that the main.main frame is the current frame.
Your fix helps, thank you. |
After a recent change to runtime-gdb_test.go the ppc64le builder has had intermittent failures. The failures occur when trying to invoke the goroutineCmd function to display the backtrace for a selected goroutine. There is nothing wrong with the testcase but it seems to intermittently leave goroutines in a state where an error can occur.

The error message indicates that the problem occurs when trying to change the sp back to the original after displaying the stacktrace for the goroutine.

gdb.error: Attempt to assign to an unmodifiable value.

After some searching I found that this error message can happen if the sp register is changed when on a frame that is not the top-most frame. To fix the problem, frame 0 is selected before changing the value of sp. This fixes the problem in my reproducer environment, and hopefully will fix the problem on the builder.

Updates #28679

Change-Id: I329bc95b30f8c95acfb161b0d9cfdcbd917a1954
Reviewed-on: https://go-review.googlesource.com/c/152540
Run-TryBot: Lynn Boger <[email protected]>
Reviewed-by: Austin Clements <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
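The failure mode and the fix described in this commit message can be illustrated with a small simulation. This is a toy model, not the runtime's gdb script and not the real gdb Python API: FakeDebugger, FakeGdbError, and run_with_goroutine_sp are hypothetical names, and the only behavior modeled is the documented rule that sp can be assigned only while frame 0 is selected. That the backtrace command leaves a non-innermost frame selected is an assumption made for the example.

```python
class FakeGdbError(Exception):
    """Stand-in for gdb.error in this simulation."""


class FakeDebugger:
    """Toy debugger: sp is assignable only while frame 0 is selected."""

    def __init__(self, sp):
        self.sp = sp
        self.selected_frame = 0

    def select_frame(self, level):
        self.selected_frame = level

    def set_sp(self, value):
        if self.selected_frame != 0:
            raise FakeGdbError("Attempt to assign to an unmodifiable value.")
        self.sp = value


def run_with_goroutine_sp(dbg, goroutine_sp, command, select_frame_zero):
    """Point sp at a goroutine's stack, run command, then restore sp.

    With select_frame_zero=True, frame 0 is re-selected before the restore,
    mirroring the fix described above.
    """
    saved_sp = dbg.sp
    dbg.set_sp(goroutine_sp)
    try:
        command(dbg)
    finally:
        if select_frame_zero:
            dbg.select_frame(0)
        dbg.set_sp(saved_sp)


def leaves_outer_frame_selected(dbg):
    # Assumed behavior: the command ends with a non-innermost frame selected.
    dbg.select_frame(2)
```

Calling run_with_goroutine_sp with select_frame_zero=False and a command such as leaves_outer_frame_selected raises FakeGdbError during the restore and leaves sp pointing at the goroutine's stack. With select_frame_zero=True the restore succeeds.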
This is still failing freebsd-arm-paulzhol builds. Is it feasible to disable this test, as @bradfitz suggested earlier? Or is there something else that can be tried? |
Change https://golang.org/cl/155932 mentions this issue: |
@katiehockman, done, in https://golang.org/cl/155932 and tracked in new bug #29508 |
Updates #29508
Updates #28679

Change-Id: I19bc9f88aeb2b1f3e69856173a00c5a4d5ed3613
Reviewed-on: https://go-review.googlesource.com/c/155932
Run-TryBot: Brad Fitzpatrick <[email protected]>
Reviewed-by: Katie Hockman <[email protected]>
@laboger, is this still a problem or can this be closed? |
I haven't seen this problem in a while on ppc64x. I think it was left open due to similar issues on other platforms. I think it can be closed. |
Thanks. Closing. |
https://go-review.googlesource.com/c/go/+/147361 broke ppc64le.
https://build.golang.org/log/cee992f2c28faffaa1c9526349f4923f9cb16c83