runtime: fatal error: acquirep: invalid p state (AMD Opteron 6172 with 48 cores) #10240
Comments
Please run the program with the GOTRACEBACK=2 and GODEBUG=scheddetail=1 environment variables set and attach the output. Better yet, attach the output of several runs.
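For collecting several outputs, here is a minimal sketch of a small Go harness that runs a binary repeatedly with the two variables set and saves each run's combined output to its own file. The names `debugEnv` and `runOnce`, the `run-NN.log` file names, and the run count of 15 (borrowed from the reporter's bash loop) are assumptions made for illustration, not part of any Go tooling.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// debugEnv returns a copy of base with the two debugging variables
// appended. os/exec keeps the last value for a duplicated key, so
// these override any GOTRACEBACK or GODEBUG already present in base.
func debugEnv(base []string) []string {
	env := make([]string, 0, len(base)+2)
	env = append(env, base...)
	return append(env, "GOTRACEBACK=2", "GODEBUG=scheddetail=1")
}

// runOnce runs bin with the debugging environment and writes its
// combined stdout and stderr to logPath. The returned error is
// non-nil when the program crashed or exited with a non-zero status.
func runOnce(bin string, args []string, logPath string) error {
	cmd := exec.Command(bin, args...)
	cmd.Env = debugEnv(os.Environ())
	out, runErr := cmd.CombinedOutput()
	if err := os.WriteFile(logPath, out, 0o644); err != nil {
		return err
	}
	return runErr
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: collect <binary> [args...]")
		return
	}
	const runs = 15 // hypothetical run count, same as the reporter's loop
	failed := 0
	for i := 1; i <= runs; i++ {
		logPath := fmt.Sprintf("run-%02d.log", i)
		if err := runOnce(os.Args[1], os.Args[2:], logPath); err != nil {
			failed++
			fmt.Printf("run %d failed (%v), output in %s\n", i, err, logPath)
		}
	}
	fmt.Printf("%d of %d runs failed\n", failed, runs)
}
```

It could be invoked as `go run collect.go bin/sparselu-crash-mwe -n 201 -m 69`, where `collect.go` is a hypothetical file name; the logs of the failing runs are the ones worth attaching.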
Thanks for the detailed info!
CL https://golang.org/cl/10713 mentions this issue.
CL https://golang.org/cl/10791 mentions this issue.
Stack barriers assume that writes through pointers to frames above the current frame will get write barriers, and hence these frames do not need to be re-scanned to pick up these changes. For normal writes, this is true. However, there are places in the runtime that use typedmemmove to potentially write through pointers to higher frames (such as mapassign1). Currently, typedmemmove does not execute write barriers if the destination is on the stack. If there's a stack barrier between the current frame and the frame being modified with typedmemmove, and the stack barrier is not otherwise hit, it's possible that the garbage collector will never see the updated pointer and incorrectly reclaim the object.

Fix this by making heapBitsBulkBarrier (which lies behind typedmemmove and its variants) detect when the destination is in the stack and unwind stack barriers up to that point, forcing mark termination to later rescan the affected frame and collect these pointers.

Fixes #11084. Might be related to #10240, #10541, #10941, #11023, #11027 and possibly others.

Change-Id: I323d6cd0f1d29fa01f8fc946f4b90e04ef210efd
Reviewed-on: https://go-review.googlesource.com/10791
Reviewed-by: Russ Cox <[email protected]>
Issues #10240, #10541, #10941, #11023, #11027 and possibly others are indicating memory corruption in the runtime. One of the easiest places to both get corruption and detect it is in the allocator's free lists since they appear throughout memory and follow strict invariants. This commit adds a check when sweeping a span that its free list is sane and, if not, it prints the corrupted free list and panics. Hopefully this will help us collect more information on these failures.

Change-Id: I6d417bcaeedf654943a5e068bd76b58bb02d4a64
Reviewed-on: https://go-review.googlesource.com/10713
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Russ Cox <[email protected]>
Run-TryBot: Austin Clements <[email protected]>
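To illustrate what "the free list is sane" can mean, here is a toy sketch of such a check. It is not the runtime's code: the `span` type, its fields, and `checkFreeList` are hypothetical, slots are modeled as slice indices instead of raw pointers, and the function returns an error where the commit prints the list and panics. The invariants checked are that every entry lies inside the span, that the list has no cycle, and that its length matches the recorded free count.

```go
package main

import "fmt"

// span is a toy stand-in for an allocator span (not the runtime's mspan).
type span struct {
	next      []int // next[i] is the slot after slot i on the free list; -1 ends the list
	head      int   // first free slot, or -1 if no slot is free
	freeCount int   // number of slots the allocator believes are free
}

// checkFreeList walks the free list and reports the first violated invariant.
func checkFreeList(s span) error {
	seen := make([]bool, len(s.next))
	n := 0
	for i := s.head; i != -1; i = s.next[i] {
		// The range check runs before next[i] is read for this entry.
		if i < 0 || i >= len(s.next) {
			return fmt.Errorf("free list entry %d is outside the span (%d slots)", i, len(s.next))
		}
		if seen[i] {
			return fmt.Errorf("free list visits slot %d twice (cycle)", i)
		}
		seen[i] = true
		n++
	}
	if n != s.freeCount {
		return fmt.Errorf("free list has %d entries, span records %d free slots", n, s.freeCount)
	}
	return nil
}

func main() {
	good := span{next: []int{2, -1, 3, -1}, head: 0, freeCount: 3}    // 0 -> 2 -> 3
	outside := span{next: []int{2, -1, 7, -1}, head: 0, freeCount: 3} // 2 points at slot 7
	cycle := span{next: []int{2, -1, 0, -1}, head: 0, freeCount: 3}   // 2 points back at 0

	fmt.Println("good:", checkFreeList(good))
	fmt.Println("outside:", checkFreeList(outside))
	fmt.Println("cycle:", checkFreeList(cycle))
}
```

Because free lists are spread throughout the heap, a stray write is likely to land on one of them, which is why a cheap walk like this at sweep time can surface corruption close to where it happened.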
Hi @artjomsimon. We've fixed several memory corruption and lost write barrier issues in the runtime over the past few weeks. Please try to reproduce the problem with current master and reopen this issue if it's still happening. Thanks!
Hi everyone,
while experimenting with the language and trying to port a LU factorization benchmark from the Barcelona OpenMP Tasks Suite written in C, I got fatal error: acquirep: invalid p state on an AMD 48-core machine (Opteron 6172).
I've tried to strip the code to the bare minimum that triggers the error. That's a bit difficult, because the problem is highly nondeterministic, and once I remove code that I presume to be irrelevant to a race condition, the error no longer triggers. This version seems to trigger it quite reliably, in 50%-80% of the runs.
I've compiled it with go build (Go 1.4.2) and ran the executable in a for loop in bash:
for i in {1..15}; do bin/sparselu-crash-mwe -n 201 -m 69; done
Now, I'm aware that this isn't idiomatic Go at all; it's a quite literal transliteration of the C version. Nevertheless, I guess the Go runtime shouldn't crash like this.
Inlining or reducing the genmat() function to trivial cases seems to stop provoking the crash.
I can't reproduce this on an Intel 2-core CPU (i3/i5).
Am I doing something completely wrong here, or is this a legitimate runtime bug? How can I debug this further?
Thank you,
Artjom