Skip to content

Cannot longjmp from signal handler to recover from SIG{SEGV,ILL,...} on macOS #44501

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
alexcrichton opened this issue Feb 22, 2021 · 10 comments
Closed

Comments

@alexcrichton
Copy link

alexcrichton commented Feb 22, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16 darwin/amd64

Does this issue reproduce with the latest release?

Yes (downloaded 1.16 from the weebsite)

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/acrichton/Library/Caches/go-build"
GOENV="/Users/acrichton/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/acrichton/code/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/acrichton/code/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.16"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/acrichton/code/gorepro/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/b0/wd3mrtcj36l61jkqrzpdjds00000gn/T/go-build2517283257=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Given these files:

`go.mod`
module repro

go 1.16
`main.go`
package main

// #include "lib.h"
import "C"
import "fmt"

func main() {
        C.setup_signal_handler()
        for i := 0; ; i++ {
                if i%1000 == 0 {
                        fmt.Printf("%d\n", i)
                }
                C.do_trap()
        }
}
`lib.h`
#ifndef __lib_h
void setup_signal_handler(void);
void do_trap(void);
#endif
`lib.c`
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#if 0
#define platform_jmp_buf sigjmp_buf
#define platform_setjmp(a) sigsetjmp(a, 0)
#define platform_longjmp(a, v) siglongjmp(a, v)
#else
#define platform_jmp_buf jmp_buf
#define platform_setjmp(a) setjmp(a)
#define platform_longjmp(a, v) longjmp(a, v)
#endif

struct sigaction PREV;
static int MY_TRAP = 0;
static platform_jmp_buf JMP;

static void signal_handler(int signal, siginfo_t *info, void *data) {
  if (MY_TRAP) {
    MY_TRAP = 0;
    platform_longjmp(JMP, 1);
    assert(0);
  }

  // If we don't handle this delegate to the previous handler.
  if (PREV.sa_flags & SA_SIGINFO) {
    PREV.sa_sigaction(signal, info, data);
  } else if (PREV.sa_handler == SIG_DFL || PREV.sa_handler == SIG_IGN) {
    sigaction(signal, &PREV, NULL);
  } else {
    PREV.sa_handler(signal);
  }
}

void setup_signal_handler(void) {
  struct sigaction handler;
  handler.sa_sigaction = signal_handler;
  handler.sa_flags = SA_SIGINFO | SA_NODEFER | SA_ONSTACK;
  sigemptyset(&handler.sa_mask);
  int rc = sigaction(SIGSEGV, &handler, &PREV);
  assert(rc == 0);
}

void do_trap(void) {
  assert(MY_TRAP == 0);
  if (platform_setjmp(JMP)) {
    assert(MY_TRAP == 0);
  } else {
    MY_TRAP = 1;
    *(int*) 3 = 5;
    assert(0);
  }
}

I compiled this via go build. I then ran the resulting binary with ./gorepro. I changed the #if 0 to #if 1 as well and looked at the results.

What did you expect to see?

I expected both binaries to succeed, regardless of #if 0 or #if 1

What did you see instead?

The Go binary would also crash eventually (sometimes taking more time than others). The #if 1 case using sigsetjmp to recover would often crash much faster than the #if 0 case. In the #if 0 case, however, this wouldstill eventually crash.

Here's an example of the crashes:

`#if 0` - using `setjmp`
fatal: morestack on g0
SIGTRAP: trace trap
PC=0x4063b22 m=0 sigcode=1

goroutine 0 [idle]:
runtime: unexpected return pc for runtime.abort called from 0xc000009ee0
stack: frame={sp:0xc000009a68, fp:0xc000009a70} stack=[0x7ffeefb80758,0x7ffeefbff7c0)

runtime.abort()
        /usr/local/Cellar/go/1.15.8/libexec/src/runtime/asm_amd64.s:860 +0x2

goroutine 1 [syscall]:
runtime.cgocall(0x40aa030, 0xc00005bf10, 0xc000000001)
        /usr/local/Cellar/go/1.15.8/libexec/src/runtime/cgocall.go:133 +0x5b fp=0xc00005bee0 sp=0xc00005bea8 pc=0x400465b
main._Cfunc_do_trap()
        _cgo_gotypes.go:41 +0x45 fp=0xc00005bf10 sp=0xc00005bee0 pc=0x40a9ec5
main.main()
        /Users/acrichton/code/gorepro/main.go:13 +0x2f fp=0xc00005bf88 sp=0xc00005bf10 pc=0x40a9f6f
runtime.main()
        /usr/local/Cellar/go/1.15.8/libexec/src/runtime/proc.go:204 +0x209 fp=0xc00005bfe0 sp=0xc00005bf88 pc=0x4035329
runtime.goexit()
        /usr/local/Cellar/go/1.15.8/libexec/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00005bfe8 sp=0xc00005bfe0 pc=0x4063d01

rax    0x17
rbx    0xc000009a40
rcx    0x41723a0
rdx    0x0
rdi    0x2
rsi    0xc0000099e0
rbp    0xc000009aa0
rsp    0xc000009a68
r8     0x41723a0
r9     0x834d92209e0071b5
r10    0xc000009a40
r11    0x206
r12    0x834d92209e0071b5
r13    0xa
r14    0x200
r15    0x38
rip    0x4063b22
rflags 0x202
cs     0x2b
fs     0x0
gs     0x0
`#if 1` - using `sigsetsjmp`
signal 16 received but handler not on signal stack
fatal error: non-Go code set up signal handler without SA_ONSTACK flag

runtime stack:
runtime: unexpected return pc for runtime.sigtramp called from 0xc000058d88
stack: frame={sp:0xc000058910, fp:0xc000058920} stack=[0xc000050840,0xc000058c40)
000000c000058810:  000000c000058830  000000000404bfc5 <runtime.sigNotOnStack+133>
000000c000058820:  00000000040d5839  0000000000000039
000000c000058830:  000000c000058888  000000000404afa5 <runtime.adjustSignalStack+645>
000000c000058840:  0000000000000010  000000c000058870
000000c000058850:  000000c000058898  0000000004469000
000000c000058860:  000000000465ffff  000000c0000588b0
000000c000058870:  000000c000002000  0000000000008000
000000c000058880:  0000000000000001  000000c000058900
000000c000058890:  000000000404ac11 <runtime.sigtrampgo+369>  000000c000000010
000000c0000588a0:  0000000004166540  000000c0000588c0
000000c0000588b0:  000000c000058900  000000000401a513 <runtime.(*mcentral).cacheSpan+403>
000000c0000588c0:  0000000000000000  0000000000000000
000000c0000588d0:  0000000000000000  0000000000000000
000000c0000588e0:  0000000000000000  000000c000000180
000000c0000588f0:  000000c000058d88  000000c000058df0
000000c000058900:  000000c000058950  000000000406a393 <runtime.sigtramp+51>
000000c000058910: <000000c000000010 !000000c000058d88
000000c000058920: >000000c000058df0  000000c000058df0
000000c000058930:  c68bf6b46e1b0a17  000000000000000a
000000c000058940:  0000000000000200  0000000000000000
000000c000058950:  000000c000058960  00007fff20365d7d
000000c000058960:  000000c000058ed0  000000c000058a10
000000c000058970:  0000000000000000  0000000000000000
000000c000058980:  00000000000000de  0000000000000003
000000c000058990:  0000000004166540  0000000000000000
000000c0000589a0:  00000000040a92b0  004189374bc6a7ee
000000c0000589b0:  0000000000000001  0000000000000000
000000c0000589c0:  000000c000058ed0  000000c000058ea8
000000c0000589d0:  00000000ffffffff  0000000000000000
000000c0000589e0:  00000000fffffffb  0000000000000246
000000c0000589f0:  000000c000018090  000000000000000a
000000c000058a00:  0000000000000200  0000000000000000
000000c000058a10:  0000000004007db6 <runtime.cgocall+54>  0000000000000202
runtime.throw(0x40d5839, 0x39)
        /usr/local/go/src/runtime/panic.go:1117 +0x72
runtime.sigNotOnStack(0x10)
        /usr/local/go/src/runtime/signal_unix.go:918 +0x85
runtime.adjustSignalStack(0xc000000010, 0x4166540, 0xc0000588c0, 0xc000058900)
        /usr/local/go/src/runtime/signal_unix.go:509 +0x285
runtime.sigtrampgo(0xc000000010, 0xc000058d88, 0xc000058df0)
        /usr/local/go/src/runtime/signal_unix.go:449 +0x171

In Go 1.15 I saw different crashes with sigsetjmp than with setjmp, but in Go 1.16 it looks like both crash in the same manner.

edit: turns out I was testing the wrong binary, sigsetjmp and setjmp do indeed have separate crash signatures.


For some background this was originally reported as bytecodealliance/wasmtime-go#60. Wasmtime is a WebAssembly runtime where WebAssembly traps translate to ud2 on x86_64 platforms, raising a SIGILL. We were seeing trouble after recently switching from setjmp to sigsetjmp but I've seen crashes for quite some time even using setjmp (as seen here). I've tried to reduce this to not having a whole WebAssembly runtime and instead just having one C file to poke around. The C mirrors what the runtime currently does in a rough manner.

@cherrymui
Copy link
Member

Try setting GODEBUG=asyncpreemptoff=1. If a preemption signal occurs in the middle of a longjmp, it would not surprise me if something weird is going to happen.

(I'm not sure this kind of use of signal handler are supported by the Go runtime.)

@alexcrichton
Copy link
Author

Thanks for the tip! I also realized when testing that that I was testing the wrong program from earlier so I edited the description to say that setjmp and sigsetjmp have different crashes.

Testing out asyncpreemptoff=1 it looks like that does indeed fix the crash.

In that case is this something that's officially unsupported? Do we have no recourse to implement this form of a WebAssembly runtime in Go? Or are there perhaps other sycalls/etc we can do to protect against this?

@cherrymui
Copy link
Member

Thanks for the reply. Good to know it works.

In that case is this something that's officially unsupported?

By "this", you mean asyncpreemptoff=1, or longjmp-from-signal-handler? For the former, it is a documented debug flag, so it should do what it says. Technically debug flags are not subject to compatibility, so it may change in future releases, or not.

For the latter, I actually don't know. The Go runtime internally does a lot of stack switches. It would not be surprising if another context switch mechanism (e.g. setjmp/longjmp) works together with it. It may work, or not, I don't know. If it can be supported without much extra complexity, I think we could probably support it.

Or are there perhaps other sycalls/etc we can do to protect against this?

I guess you could block preemption signal (SIGURG) in the code. Perhaps other signals as well, e.g. profiling signals (SIGPROF).

@alexcrichton
Copy link
Author

Ah yes sorry to clarify I mean the longjmp or siglongjmp-from-signal-handler and whether that's supported. I'd personally like to know more about it and what's causing the crash and what we can do to prevent it. While we could block some signals on entry/exit into wasm it seems better to figure out a more precise solution if possible. For example why doesn't this crash happen on Linux when it happens on macOS?

@cherrymui
Copy link
Member

(Without looking into the detail) My guess it could be due to something weird of the longjmp and sigaltstack implementation on macOS (possibly bug). It seems the kernel delivers the signal on the user stack while we have sigaltstack set and the signal handler is registered with SA_ONSTACK. I can reproduce this in C.

#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#if 1
#define platform_jmp_buf sigjmp_buf
#define platform_setjmp(a) sigsetjmp(a, 0)
#define platform_longjmp(a, v) siglongjmp(a, v)
#else
#define platform_jmp_buf jmp_buf
#define platform_setjmp(a) setjmp(a)
#define platform_longjmp(a, v) longjmp(a, v)
#endif

struct sigaction PREV;
static int MY_TRAP = 0;
static platform_jmp_buf JMP;

static void signal_handler(int signal, siginfo_t *info, void *data) {
  int x;
  if (MY_TRAP && (signal == SIGSEGV)) {
    MY_TRAP = 0;
    printf("SIGSEGV sp=%p\n", &x);
    platform_longjmp(JMP, 1);
    assert(0);
  }

  // If we don't handle this delegate to the previous handler.
  if (PREV.sa_flags & SA_SIGINFO) {
    PREV.sa_sigaction(signal, info, data);
  } else if (PREV.sa_handler == SIG_DFL || PREV.sa_handler == SIG_IGN) {
    sigaction(signal, &PREV, NULL);
  } else {
    PREV.sa_handler(signal);
  }
}

void setup_signal_handler(void) {
  struct sigaction handler;
  handler.sa_sigaction = signal_handler;
  handler.sa_flags = SA_SIGINFO | SA_NODEFER | SA_ONSTACK;
  sigemptyset(&handler.sa_mask);
  int rc = sigaction(SIGSEGV, &handler, &PREV);
  assert(rc == 0);
}

void do_trap(void) {
  assert(MY_TRAP == 0);
  if (platform_setjmp(JMP)) {
    assert(MY_TRAP == 0);
  } else {
    MY_TRAP = 1;
    *(int*) 3 = 5;
    assert(0);
  }
}

int main() {
  int i;
  stack_t sigstk;
  sigstk.ss_sp = malloc(SIGSTKSZ);
  printf("sigaltstack at %p, sp=%p\n", sigstk.ss_sp, &i);
  sigstk.ss_size = SIGSTKSZ;
  sigstk.ss_flags = 0;
  sigaltstack(&sigstk, 0);
  setup_signal_handler();
  for (i = 0; ; i++) {
    printf("%d\n", i);
    do_trap();
  }
}

On macOS,

sigaltstack at 0x7fb21c500000, sp=0x7ffee30a1988
0
SIGSEGV sp=0x7fb21c51fa84
1
SIGSEGV sp=0x7ffee30a13d4
2
SIGSEGV sp=0x7ffee30a13d4
3
SIGSEGV sp=0x7ffee30a13d4
4
SIGSEGV sp=0x7ffee30a13d4

At the first iteration the signal is delivered on the sigaltstack, but afterwards it is delivered on the main stack.

Whereas on Linux,

sigaltstack at 0x558ea53e62a0, sp=0x7ffd929429ec
0
SIGSEGV sp=0x558ea53e7d6c
1
SIGSEGV sp=0x558ea53e7d6c
2
SIGSEGV sp=0x558ea53e7d6c
3
SIGSEGV sp=0x558ea53e7d6c
4
SIGSEGV sp=0x558ea53e7d6c

The signal is always delivered on the sigaltstack.

The Go runtime requires signals be delivered on the sigaltstack, and it crashes when it is delivered on the user stack.

@alexcrichton
Copy link
Author

Oh dear, signals not being delivered on the sigaltstack does indeed sound like a bug in macOS itself! Either that or intended behavior of sigsetjmp/siglongjmp, unsure.

I'd imagine that explains at least some of this, but if setjmp and longjmp are used (changing to #if 0) then this it looks like sigaltstack is indeed used to deliver all the signals, even on macOS, but it still crashes in Go?

@alexcrichton
Copy link
Author

According to macOS's published sources siglongjmp namely differs from longjmp in that it doesn't call a function called _sigunaltstack which is changing the kernel's idea of whether or not the sigaltstack is used. This means that the problem with using siglongjmp to exit the signal handler is that on the delivery of the next signal the kernel thinks the sigaltstack is still in use and delivers it on the same stack (which happens to be the main user stack).

That at least explains the problem with using sig{set,long}jmp, but I think we can work around that locally (we use the sig versions since they're much faster than the other versions as they don't save/restore signal masks, which we don't need). The workaround seems to be to actually return from the signal handle (so the _sigunaltstack function is automatically called) but updating the ucontext_t parameter to resume in a "landing pad" which serves to only longjmp.

Implementing that technique appears to fix the issue with Go as well, I'm unable to see any more runtime errors. Notably with this lib.c I'm not seeing any crashes in Go:

lib.c
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#if 1
#define platform_jmp_buf sigjmp_buf
#define platform_setjmp(a) sigsetjmp(a, 0)
#define platform_longjmp(a, v) siglongjmp(a, v)
#else
#define platform_jmp_buf jmp_buf
#define platform_setjmp(a) setjmp(a)
#define platform_longjmp(a, v) longjmp(a, v)
#endif

struct sigaction PREV;
static int MY_TRAP = 0;
static platform_jmp_buf JMP;

static void do_jump(void) {
    platform_longjmp(JMP, 1);
    assert(0);
}

static void signal_handler(int signal, siginfo_t *info, void *data) {
  if (MY_TRAP) {
    MY_TRAP = 0;
    ((ucontext_t*)data)->uc_mcontext->__ss.__rip = (uint64_t) do_jump;
    ((ucontext_t*)data)->uc_mcontext->__ss.__rsp -= 8;
    return;
  }

  // If we don't handle this delegate to the previous handler.
  if (PREV.sa_flags & SA_SIGINFO) {
    PREV.sa_sigaction(signal, info, data);
  } else if (PREV.sa_handler == SIG_DFL || PREV.sa_handler == SIG_IGN) {
    sigaction(signal, &PREV, NULL);
  } else {
    PREV.sa_handler(signal);
  }
}

void setup_signal_handler(void) {
  struct sigaction handler;
  handler.sa_sigaction = signal_handler;
  handler.sa_flags = SA_SIGINFO | SA_NODEFER | SA_ONSTACK;
  sigemptyset(&handler.sa_mask);
  int rc = sigaction(SIGSEGV, &handler, &PREV);
  assert(rc == 0);
}

void do_trap(void) {
  assert(MY_TRAP == 0);
  if (platform_setjmp(JMP)) {
    assert(MY_TRAP == 0);
  } else {
    MY_TRAP = 1;
    *(int*) 3 = 5;
    assert(0);
  }
}

While it still seems odd that the usage of {long,set}jmp would cause the Go crashes above I'll probably work to implement this fix which will at least make the original issue moot on my end.

@ianlancetaylor
Copy link
Contributor

Calling longjmp to get out of a signal handler is unsafe. A signal can occur at any time. If the Go program happens to be manipulating some runtime data structure or holding a runtime lock at the time that the signal occurs, and the signal handler calls longjmp to jump over the Go code, then the program will fail. Your program is failing without GODEBUG=asyncpreemptoff=1 because by default Go programs generate a lot of signals, making this failure more likely. But the failure can still happen on any signal.

I'm going to close this issue because I think the request is impossible to implement in Go. Please comment if you disagree.

@alexcrichton
Copy link
Author

I don't think it's correct to say that longjmp out of a signal handler is always unsafe. It's certainly quite frequently unsafe, and I agree that for async signals or longjmp at arbitrary points it is indeed unsafe, but my use case is a JIT compiler using longjmp to recover from faults in wasm. This means that 100% of the code between the longjmp invocations is controlled by the JIT runtime and signals are filtered so we don't try to longjmp from anything arbitrary, only known locations in JIT code that can fault (and with known signals).

This is not an issue about Go trying to longjmp. It's also not about dealing with Go code at all, the longjmp never crosses Go code and only crosses JIT code. I tried to make this clear in the example above (since the longjmp is entirely self-contained and doesn't cross Go at all). I didn't think it would be too useful have an example with the whole JIT compiler.

FWIW I'm having pretty poor experience reporting issues here having them be quickly deemed not bugs in Go. I tried to do my fair share in reducing, investigating, and documenting how to reproduce this issue. I did so in a different issue which was also deemed not a bug in Go, then was later fixed. I feel like I'm just being brushed off here as if I can't possibly know what I'm doing with signals, and I'm not really sure how to productively work with that.

@ianlancetaylor
Copy link
Contributor

My apologies for my incorrect assumption. I jumped from the fact that setting GODEBUG=asyncpreemptoff=1 fixed the problem to assuming that your code was doing something wrong.

Still, this doesn't seem to be a bug in the Go runtime, and I don't see anything that the Go runtime could do to fix it.

@golang golang locked and limited conversation to collaborators Feb 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants