In rare cases on linux/amd64 race builds we've seen crashes that look like:
fatal: morestack on gsignal
signal: trace/breakpoint trap (core dumped)
The root cause is signal delivery on a sigaltstack allocated very close to the g0 stack. When cgo is enabled, mstart estimates the g0 stack bounds (cgo side), but this is only a rough estimate, and the g0 stack.lo may actually be beyond the end of the real g0 stack.
On signal delivery, adjustSignalStack may then incorrectly determine that the signal was delivered on the g0 stack. Since the overlap is likely to be very close to g0 stack.lo, functions in signal handling have a high probability of "running out of stack space" and calling morestack. Boom.
Here's one example of overlap I captured:
Our SP on sigtrampgo entry: 0x7f99841fe328
sigaltstack from sigcontext: [0x7f99841ef000, 0x7f99841ff000)
g0 stack from gp.m.g0.stack: [0x7f99841fded8, 0x7f99849fdad8)
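To make the overlap concrete, here is a minimal sketch that checks the three captured values against each other. The `span` type and the `overlap` helper are hypothetical names introduced for illustration; they are not runtime code, and the check is only a simplified stand-in for whatever adjustSignalStack actually does.

```go
package main

import "fmt"

// span is a half-open address range [lo, hi), matching the notation above.
// It is a hypothetical helper type, not a runtime type.
type span struct{ lo, hi uint64 }

// contains reports whether address p lies inside the range.
func (s span) contains(p uint64) bool { return s.lo <= p && p < s.hi }

// overlap returns the intersection of a and b, or false if they are disjoint.
func overlap(a, b span) (span, bool) {
	lo, hi := a.lo, a.hi
	if b.lo > lo {
		lo = b.lo
	}
	if b.hi < hi {
		hi = b.hi
	}
	if lo >= hi {
		return span{}, false
	}
	return span{lo, hi}, true
}

func main() {
	// Values copied from the capture above.
	sp := uint64(0x7f99841fe328)
	alt := span{0x7f99841ef000, 0x7f99841ff000} // sigaltstack from sigcontext
	g0 := span{0x7f99841fded8, 0x7f99849fdad8}  // gp.m.g0.stack (estimated)

	fmt.Println("SP in sigaltstack:", alt.contains(sp))
	fmt.Println("SP in g0 stack:   ", g0.contains(sp))
	if ov, ok := overlap(alt, g0); ok {
		fmt.Printf("overlap: [%#x, %#x), %d bytes\n", ov.lo, ov.hi, ov.hi-ov.lo)
	}
	fmt.Println("SP above g0 stack.lo:", sp-g0.lo, "bytes")
}
```

With these values the SP falls inside both ranges, the two ranges overlap by 4392 bytes, and the SP sits only 1104 bytes above g0 stack.lo, which is why a bounds check against the g0 stack leaves very little apparent headroom.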
mstart contains a fudge factor of 1024 to try to address this inaccuracy, but checking against pthread_attr_getstack indicates that the mstart SP is actually 9616 bytes below the top of the stack. (That may be off by one page (4096 bytes); I need to double check. Either way, it is > 1024 bytes.)
pthread_attr_getstack can provide accurate stack bounds. However, I'm not convinced we can use it portably. For example, glibc's implementation looks like it always succeeds, but NetBSD's appears to be able to return NULL for the stack address.
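The arithmetic behind that mismatch can be sketched as follows. This is a simplified model under stated assumptions, not the runtime's code: it assumes the estimate is computed as `sp - size + fudge`, that the thread stack size is 8 MiB, and that the g0 stack.hi from the capture above is the SP that mstart observed. The function names are hypothetical.

```go
package main

import "fmt"

// estimateLo models the rough lower bound: treat sp as the top of the stack,
// subtract the stack size, and add the fudge factor back as slack.
// This formula is an assumption about how the estimate is formed.
func estimateLo(sp, size, fudge uint64) uint64 { return sp - size + fudge }

// shortfall returns how many bytes the estimated lo can land below the real
// lo when sp is really depth bytes below the top of the stack.
func shortfall(depth, fudge uint64) uint64 {
	if depth <= fudge {
		return 0 // the fudge factor covers the inaccuracy
	}
	return depth - fudge
}

func main() {
	const (
		size  = 8 << 20 // assumed 8 MiB thread stack
		fudge = 1024    // fudge factor mentioned above
		depth = 9616    // measured via pthread_attr_getstack (may be off by a page)
	)
	sp := uint64(0x7f99849fdad8) // g0 stack.hi from the capture, assumed to be mstart's SP

	fmt.Printf("estimated lo: %#x\n", estimateLo(sp, size, fudge))
	fmt.Println("bytes past the fudge factor:", shortfall(depth, fudge))
}
```

Under these assumptions the model reproduces the captured g0 stack.lo (0x7f99841fded8) and puts it 8592 bytes below where the real stack would end, which is consistent with the sigaltstack landing inside the estimated range.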
cc @cherrymui @aclements