runtime: morestack on gsignal signal: trace/breakpoint trap due to g0 stack misattribution

In rare cases on linux/amd64 race builds we've seen crashes that look like:

```
fatal: morestack on gsignal

signal: trace/breakpoint trap (core dumped)
```

The root cause is signal delivery on a sigaltstack allocated very close to the g0 stack. When cgo is enabled, `mstart` [estimates the g0 stack bounds](https://cs.opensource.google/go/go/+/master:src/runtime/proc.go;l=1249-1257;drc=3d40895e36e5f16654fa6b75f7fdf59edb18d2e0) ([cgo side](https://cs.opensource.google/go/go/+/master:src/runtime/cgo/gcc_linux_amd64.c;l=72;drc=1d78139128d6d839d7da0aeb10b3e51b6c7c0749)), but this is only a rough estimate, and the g0 `stack.lo` may actually lie below the real low end of the g0 stack, in memory that belongs to something else.

On signal delivery, `adjustSignalStack` may then [incorrectly determine](https://cs.opensource.google/go/go/+/master:src/runtime/signal_unix.go;l=478;drc=b0b0d9828308368e9fbd59ec5de55801f568f720) that the signal was delivered on the g0 stack. Since the overlap is likely to be very close to g0 `stack.lo`, functions in signal handling have a high probability of "running out of stack space" and calling `morestack`. Boom.

Here's one example of overlap I captured:

- Our SP on `sigtrampgo` entry: `0x7f99841fe328`
- sigaltstack from sigcontext: `[0x7f99841ef000, 0x7f99841ff000)`
- g0 stack from `gp.m.g0.stack`: `[0x7f99841fded8, 0x7f99849fdad8)`
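
To make the overlap concrete, here is a small sketch that plugs the captured addresses into a half-open range check of the form `lo <= sp < hi`. This is a simplified model, not the runtime's code: the `span` type and its `contains` and `headroom` methods are hypothetical names introduced for illustration only.

```go
package main

import "fmt"

// span is a half-open address range [lo, hi). Hypothetical helper type,
// not a runtime type.
type span struct{ lo, hi uint64 }

// contains reports whether address p falls inside the range.
func (s span) contains(p uint64) bool { return p >= s.lo && p < s.hi }

// headroom returns the number of bytes between p and the low bound,
// i.e. how far a downward-growing stack could still grow before hitting lo.
func (s span) headroom(p uint64) uint64 { return p - s.lo }

// Values captured in the report above.
var (
	sigSP    = uint64(0x7f99841fe328)
	altStack = span{0x7f99841ef000, 0x7f99841ff000}
	g0Stack  = span{0x7f99841fded8, 0x7f99849fdad8}
)

func main() {
	// The SP is inside both ranges, so a check against the g0 bounds alone
	// would attribute the signal to the g0 stack.
	fmt.Println(altStack.contains(sigSP), g0Stack.contains(sigSP)) // true true

	// Size of the region claimed by both ranges.
	fmt.Println(altStack.hi - g0Stack.lo) // 4392

	// Apparent space left above the estimated g0 stack.lo.
	fmt.Println(g0Stack.headroom(sigSP)) // 1104
}
```

With these numbers the signal handler appears to start only 1104 bytes above the estimated g0 `stack.lo`, which fits the "running out of stack space" behavior described above.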

`mstart` contains a fudge factor of 1024 bytes to try to address this inaccuracy, but checking against `pthread_attr_getstack` indicates that the `mstart` SP is actually 9616 bytes below the top of the stack. That figure may be off by one page (4096 bytes); I need to double check. Either way, it is more than 1024 bytes.
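
The arithmetic can be sketched as follows. The sketch assumes the estimate has the form `hi = SP at mstart` and `lo = hi - size + 1024`; that form is an assumption here, though it is consistent with the captured g0 range, which spans 8 MiB minus 1024 bytes. It also assumes the real stack is exactly `size` bytes. The names `estimateG0` and `loShortfall` are hypothetical.

```go
package main

import "fmt"

// fudge is the 1024-byte allowance mentioned in the report.
const fudge = 1024

// estimateG0 models the bounds estimate under the assumptions stated above:
// hi is taken from the SP at mstart entry, and lo is derived from the size.
func estimateG0(sp, size uint64) (lo, hi uint64) {
	hi = sp
	lo = hi - size + fudge
	return lo, hi
}

// loShortfall returns how many bytes the estimated lo lies below the real
// low bound when sp is actually gap bytes below the real top of the stack.
// A positive result means the estimate claims memory outside the real stack.
func loShortfall(sp, size, gap uint64) int64 {
	estLo, _ := estimateG0(sp, size)
	trueLo := sp + gap - size
	return int64(trueLo) - int64(estLo)
}

func main() {
	const sp, size = 0x7f99849fdad8, 8 << 20

	lo, hi := estimateG0(sp, size)
	fmt.Printf("[%#x, %#x)\n", lo, hi) // [0x7f99841fded8, 0x7f99849fdad8)

	fmt.Println(loShortfall(sp, size, 9616))      // 8592
	fmt.Println(loShortfall(sp, size, 9616-4096)) // 4496
}
```

Under these assumptions the estimated `stack.lo` falls 8592 bytes below the real low bound, or 4496 bytes if the measurement is one page too large. In both cases the fudge factor does not cover the gap.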

cc @cherrymui @aclements 
