runtime: crash with signal 0xb near runtime.sweepone #11027
Comments
CC @RLH @aclements It would be very helpful to have a reproducible test case.
If it's reliable (or fairly reliable), but you can't reduce it to a test case you can share, it would be helpful if you could bisect it to a bad commit. Another potentially useful thing to do would be to run with GODEBUG=efence=1, though that perturbs the memory layout enough that the crash conditions may not occur.
I've only seen this particular crash once in the past few days of running (#11023 is much more common), so I don't think I'll be able to bisect effectively. I tried running with GODEBUG=efence=1, but my program crashes with "fatal error: out of memory (stackalloc)" every minute or so and I haven't seen it crash with any other errors of interest (it crashed once with #11023). Is the memory exhaustion expected even on amd64? I'll see what I can do on a reduced test case.
That's a little surprising, but efence does change memory allocation patterns fairly substantially.
Thanks!
For reference, we've seen this twice on the build dashboard: 2015-05-07T21:08:29-9626561/nacl-amd64p32 I've also seen it once on a trybot: https://storage.googleapis.com/go-build-log/98c65eab/freebsd-amd64-gce101_4e0f3a2f.log (https://go-review.googlesource.com/#/c/10481)
CL https://golang.org/cl/10714 mentions this issue.
CL https://golang.org/cl/10713 mentions this issue.
CL https://golang.org/cl/10791 mentions this issue.
Stack barriers assume that writes through pointers to frames above the current frame will get write barriers, and hence these frames do not need to be re-scanned to pick up these changes. For normal writes, this is true. However, there are places in the runtime that use typedmemmove to potentially write through pointers to higher frames (such as mapassign1). Currently, typedmemmove does not execute write barriers if the destination is on the stack. If there's a stack barrier between the current frame and the frame being modified with typedmemmove, and the stack barrier is not otherwise hit, it's possible that the garbage collector will never see the updated pointer and incorrectly reclaim the object. Fix this by making heapBitsBulkBarrier (which lies behind typedmemmove and its variants) detect when the destination is in the stack and unwind stack barriers up to that point, forcing mark termination to later rescan the affected frame and collect these pointers. Fixes #11084. Might be related to #10240, #10541, #10941, #11023, #11027 and possibly others. Change-Id: I323d6cd0f1d29fa01f8fc946f4b90e04ef210efd Reviewed-on: https://go-review.googlesource.com/10791 Reviewed-by: Russ Cox <[email protected]>
Issues #10240, #10541, #10941, #11023, #11027 and possibly others are indicating memory corruption in the runtime. One of the easiest places to both get corruption and detect it is in the allocator's free lists since they appear throughout memory and follow strict invariants. This commit adds a check when sweeping a span that its free list is sane and, if not, it prints the corrupted free list and panics. Hopefully this will help us collect more information on these failures. Change-Id: I6d417bcaeedf654943a5e068bd76b58bb02d4a64 Reviewed-on: https://go-review.googlesource.com/10713 Reviewed-by: Keith Randall <[email protected]> Reviewed-by: Russ Cox <[email protected]> Run-TryBot: Austin Clements <[email protected]>
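To illustrate the kind of free-list sanity check the commit message above describes, here is a minimal sketch in Go. It is not the runtime's code: the `span` type, its fields, and `checkFreeList` are hypothetical names, and the list is modeled with a map from address to next address rather than with real pointers. The invariants checked (entries lie inside the span, sit on an object boundary, and the list is no longer than the span's object count) are assumptions about what "sane" could mean here.

```go
package main

import "fmt"

// span is a hypothetical model of a run of equal-sized objects.
// Addresses are plain integers; next maps a free object to the
// following free object, and a missing entry (0) ends the list.
type span struct {
	base     uintptr
	elemSize uintptr
	nelems   uintptr
	freeHead uintptr
	next     map[uintptr]uintptr
}

// checkFreeList walks the free list and returns an error describing
// the first violated invariant, or nil if the list looks sane.
func checkFreeList(s *span) error {
	limit := s.base + s.elemSize*s.nelems
	var n uintptr
	for p := s.freeHead; p != 0; p = s.next[p] {
		if p < s.base || p >= limit {
			return fmt.Errorf("free list entry %#x outside span [%#x, %#x)", p, s.base, limit)
		}
		if (p-s.base)%s.elemSize != 0 {
			return fmt.Errorf("free list entry %#x not on a %d-byte object boundary", p, s.elemSize)
		}
		n++
		if n > s.nelems {
			return fmt.Errorf("free list has more than %d entries (possible cycle)", s.nelems)
		}
	}
	return nil
}

func main() {
	s := &span{
		base: 0x1000, elemSize: 16, nelems: 4, freeHead: 0x1000,
		next: map[uintptr]uintptr{0x1000: 0x1020, 0x1020: 0x1030},
	}
	fmt.Println(checkFreeList(s)) // intact list

	// Simulate a stray write clobbering one link.
	s.next[0x1020] = 0x1031
	fmt.Println(checkFreeList(s)) // reports the misaligned entry
}
```

Because free lists are threaded through memory and obey simple rules like these, a stray write that lands on one is likely to break an invariant and be caught at the next sweep.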
Currently, when shrinkstack computes whether the halved stack allocation will have enough room for the stack, it accounts for the stack space that's actively in use but fails to leave extra room for the stack guard space. As a result, *if* the minimum stack size is small enough or the guard large enough, it may shrink the stack and leave less than enough room to run nosplit functions. If the next function called after the stack shrink is a nosplit function, it may overflow the stack without noticing and overwrite non-stack memory. We don't think this is happening under normal conditions right now. The minimum stack allocation is 2K and the guard is 640 bytes. The "worst case" stack shrink is from 4K (4048 bytes after stack barrier array reservation) to 2K (2016 bytes after stack barrier array reservation), which means the largest "used" size that will qualify for shrinking is 4048/4 - 8 = 1004 bytes. After copying, that leaves 2016 - 1004 = 1012 bytes of available stack, which is significantly more than the guard space. If we were to reduce the minimum stack size to 1K or raise the guard space above 1012 bytes, the logic in shrinkstack would no longer leave enough space. It's also possible to trigger this problem by setting firstStackBarrierOffset to 0, which puts stack barriers in a debug mode that steals away *half* of the stack for the stack barrier array reservation. Then, the largest "used" size that qualifies for shrinking is (4096/2)/4 - 8 = 504 bytes. After copying, that leaves (2096/2) - 504 = 8 bytes of available stack; much less than the required guard space. This causes failures like those in issue #11027 because func gc() shrinks its own stack and then immediately calls casgstatus (a nosplit function), which overflows the stack and overwrites a free list pointer in the neighboring span. However, since this seems to require the special debug mode, we don't think it's responsible for issue #11027. 
To forestall all of these subtle issues, this commit modifies shrinkstack to correctly account for the guard space when considering whether to halve the stack allocation. Change-Id: I7312584addc63b5bfe55cc384a1012f6181f1b9d Reviewed-on: https://go-review.googlesource.com/10714 Reviewed-by: Keith Randall <[email protected]> Reviewed-by: Russ Cox <[email protected]>
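The normal-case arithmetic in the commit message above can be checked with a short Go sketch. The function names are hypothetical, and the "a quarter of the usable size, minus 8 bytes" rule is taken directly from the message's own numbers rather than from the runtime source; the sizes 4048, 2016 and 640 are the ones quoted there.

```go
package main

import "fmt"

// maxUsedForShrink returns the largest in-use stack size that still
// qualifies for halving, following the commit message's arithmetic
// (a quarter of the usable size, minus 8 bytes).
func maxUsedForShrink(oldUsable int) int {
	return oldUsable/4 - 8
}

// headroomAfterShrink returns the bytes left free once the in-use
// portion has been copied into the smaller stack.
func headroomAfterShrink(newUsable, used int) int {
	return newUsable - used
}

// safeToShrink reports whether the worst-case shrink still leaves at
// least guard bytes for nosplit functions.
func safeToShrink(oldUsable, newUsable, guard int) bool {
	used := maxUsedForShrink(oldUsable)
	return headroomAfterShrink(newUsable, used) >= guard
}

func main() {
	used := maxUsedForShrink(4048)
	left := headroomAfterShrink(2016, used)
	fmt.Println(used, left)                     // 1004 1012
	fmt.Println(safeToShrink(4048, 2016, 640))  // true: current guard fits
	fmt.Println(safeToShrink(4048, 2016, 1013)) // false: guard above 1012 no longer fits
}
```

This only reproduces the 4K-to-2K case with the default reservation; the debug-mode figures depend on how much of the stack the barrier array takes and are not modeled here.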
Hi @aclements - I ran my app with 8fa1a69 (from 17 Jun 2015) for a few days and did not observe any crashes. It looks like this is resolved. Thanks!
I have a process that receives data over a few hundred concurrent TCP connections and writes them to files. It's been crashing on recent versions of tip (it was stable on 1.4.1).