runtime: crash when C library resets sigaltstack/sigaction settings #7227
Labels
FrozenDueToAge
OS-Darwin
Suggested
13:15 /tmp $ GOTRACEBACK=2 go run main.go 2014/01/31 13:16:12 here fatal error: runtime: stack split during syscall runtime stack: runtime.throw(0x42740cc) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/panic.c:464 +0x69 runtime.newstack() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/stack.c:261 +0x6c3 runtime.morestack() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/asm_amd64.s:225 +0x61 goroutine 4 [stack split]: runtime: unexpected return pc for runtime.sighandler called from 0x7fff8aee85aa runtime.sighandler(0x6, 0x0, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/signal_amd64.c:43 fp=0x83bcea0 created by github.com/niemeyer/qml.Init /Users/quarnster/code/go/src/github.com/niemeyer/qml/qml.go:58 +0xa0 goroutine 1 [chan receive]: runtime.park(0x404db20, 0xc210038110, 0x427322d) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1342 +0x66 runtime.chanrecv(0x40f6960, 0xc2100380c0, 0x83a1e08, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/chan.c:354 +0x50b runtime.chanrecv1(0x40f6960, 0xc2100380c0) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/chan.c:446 +0x38 github.com/niemeyer/qml.gui(0xc210048140) /Users/quarnster/code/go/src/github.com/niemeyer/qml/bridge.go:69 +0xbd github.com/niemeyer/qml.(*Common).Create(0xc21000a310, 0x0, 0x1a, 0x8211190) /Users/quarnster/code/go/src/github.com/niemeyer/qml/qml.go:636 +0x117 main.main() /private/tmp/main.go:33 +0x28a runtime.main() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:220 +0x11f runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1394 goroutine 2 [syscall]: runtime.notetsleepg(0x83bff60, 0xdf8475800) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/lock_sema.c:254 +0x71 runtime.MHeap_Scavenger() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/mheap.c:463 +0xa3 runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1394 created by runtime.main /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:179 goroutine 
3 [syscall]: runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1394 exit status 2 13:15 /tmp $ go version go version go1.2 darwin/amd64 |
No strace on OSX, but attached output of "sudo dtruss -f -a -s ./main". sigaltstack does indeed appear to be called. Unfortunately, if I read the output correctly, it's called by Apple's system library CoreText. Attachments:
|
And if it's of any use, I'm on 10.9.1 and: 17:14 ~/code/go/src/github.com/niemeyer/qml $ git log -1 commit 9c937b147a7ce9fc9560593fa7d6e2ae49d8203b Author: Gustavo Niemeyer <[email protected]> Date: Wed Jan 29 16:17:41 2014 -0200 Use pre-resolved propIndex when setting property. 17:14 ~/code/go/src/github.com/niemeyer/qml $ brew info qt5 gcc48 | grep /usr /usr/local/Cellar/qt5/5.0.2 (2899 files, 119M) /usr/local/Cellar/qt5/5.2.0 (5482 files, 172M) * .app bundles were installed to /usr/local/Cellar/qt5/5.2.0 (or libexec). /usr/local/Cellar/gcc48/4.8.1 (965 files, 90M) * 17:14 ~/code/go/src/github.com/niemeyer/qml $ go version go version go1.2 darwin/amd64 17:14 ~/code/go/src/github.com/niemeyer/qml $ |
So if sigaltstack is to blame, I guess that makes this issue a dup of 4216 (and 5287), which has been closed with an "unfortunate" label. Is there really nothing that can be done? Would using kqueue help in any way? From what I understand, kqueue can be used to "catch" signals without "owning" the signal handler: http://doc.geoffgarside.co.uk/kqueue/signal.html. |
I think SIGABRT is a red herring, here's one for SYS_OPEN: 014/02/01 13:11:54 start 2014/02/01 13:11:54 6 <nil> 2014/02/01 13:11:54 hello 2014/02/01 13:11:54 i: 0 fatal error: runtime: stack split during syscall runtime stack: runtime.throw(0x427a1ec) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/panic.c:464 +0x69 runtime.newstack() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/stack.c:261 +0x6c3 runtime.morestack() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/asm_amd64.s:225 +0x61 goroutine 5 [stack split]: runtime: unexpected return pc for runtime.sighandler called from 0x7fff8aee85aa runtime.sighandler(0x2, 0x0, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/signal_amd64.c:43 fp=0x6cbae70 created by github.com/niemeyer/qml.Init /Users/quarnster/code/go/src/github.com/niemeyer/qml/qml.go:58 +0xa0 goroutine 1 [chan receive]: github.com/niemeyer/qml.gui(0xc210049140) /Users/quarnster/code/go/src/github.com/niemeyer/qml/bridge.go:69 +0xbd github.com/niemeyer/qml.(*Common).Create(0xc21000a310, 0x0, 0x1a, 0x6b11190) /Users/quarnster/code/go/src/github.com/niemeyer/qml/qml.go:636 +0x117 main.main() /private/tmp/main.go:57 +0x29c goroutine 3 [syscall]: os/signal.loop() /Users/quarnster/code/3rdparty/go/src/pkg/os/signal/signal_unix.go:21 +0x1e created by os/signal.init·1 /Users/quarnster/code/3rdparty/go/src/pkg/os/signal/signal_unix.go:27 +0x31 goroutine 4 [syscall]: runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1394 goroutine 6 [chan receive]: main.blah() /private/tmp/main.go:19 +0xdb created by main.main /private/tmp/main.go:56 +0x27d exit status 2 I changed the code like this: func (t *tt) Test() { cmd := exec.Command("echo", "hello") p, _ := cmd.StdoutPipe() cmd.Start() data := make([]byte, 1024) log.Println(p.Read(data)) log.Println(string(data)) log.Println("But not here") } The reading of the pipe, printing the result and the string converted data always succeeds. 
The "But not here" is seldom printed and we get a stack trace like the above instead, other times it is printed but then the program doesn't resume. If I change it to: func (t *tt) Test() { for i := 0; i < 1000000; i++ { log.Println("But not here", i) } } It runs just fine. |
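For reference, the Test method quoted above can be laid out as a standalone Go program. This is only a sketch: it drops the qml binding and the tt receiver, adds error handling and a final Wait (neither is in the original snippet), and the name readEcho is made up for illustration. On its own it is not expected to reproduce the crash, since no C library replaces the SIGCHLD handler here.

```go
package main

import (
	"fmt"
	"io"
	"os/exec"
)

// readEcho is a hypothetical helper: it starts `echo hello`, reads
// everything the child writes to stdout, then waits for it to exit.
func readEcho() (string, error) {
	cmd := exec.Command("echo", "hello")
	p, err := cmd.StdoutPipe()
	if err != nil {
		return "", err
	}
	if err := cmd.Start(); err != nil {
		return "", err
	}
	// Read before calling Wait, because Wait closes the pipe.
	data, err := io.ReadAll(p)
	if err != nil {
		return "", err
	}
	// The child's exit is what raises SIGCHLD in the parent.
	if err := cmd.Wait(); err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	out, err := readEcho()
	fmt.Printf("%q %v\n", out, err)
}
```

The interesting moment for this issue is the child's exit, which is when the parent receives SIGCHLD.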
Sorry, I was referring to the 2, which is indeed SIGINT; I don't know where I got SYS_WRITE from. GDB 7.6.2 has been close to useless for me, but LLDB appears to do proper symbol lookup for backtraces and can set breakpoints in, for example, sigaction. With the help of lldb I've been able to track down that the signal we are actually receiving is neither SIGINT nor SIGABRT but SIGCHLD, raised when the child process exits, which makes a lot more sense. The specifics of registers, call stack and reason for stopping differ from run to run, but I think the important thing to note is that the address Go complains about, "runtime: unexpected return pc for runtime.sighandler called from 0x7fff8aee85aa", matches up with "libsystem_platform.dylib`_sigtramp", and the argument confusion reported in the Go traceback (or the EXC_BAD_ACCESS reported in the attached lldb.txt) is likely a side effect of mismatched signal handler expectations. Guess what the one and only sigaction backtrace caught by lldb leads to?
https://qt.gitorious.org/qt/qt/source/663b742ca8b289e6456facf8b6a8ca18a4157fb7:src/corelib/io/qprocess_unix.cpp#L224 It's not the only one *possible* though:
17:24 ~/code/3rdparty/qt5/qtbase/src $ ag sigaction
corelib/io/qprocess_unix.cpp
123:static struct sigaction qt_sa_old_sigchld_handler;
124:static void qt_sa_sigchld_sigaction(int signum, siginfo_t *info, void *context)
139: volatile struct sigaction *vsa = &qt_sa_old_sigchld_handler;
142: void (*oldAction)(int, siginfo_t *, void *) = vsa->sa_sigaction;
218: struct sigaction action;
222: ::sigaction(SIGCHLD, NULL, &action);
223: action.sa_sigaction = qt_sa_sigchld_sigaction;
227: ::sigaction(SIGCHLD, &action, &qt_sa_old_sigchld_handler);
249: struct sigaction currentAction;
250: ::sigaction(SIGCHLD, 0, &currentAction);
251: if (currentAction.sa_sigaction == qt_sa_sigchld_sigaction) {
252: ::sigaction(SIGCHLD, &qt_sa_old_sigchld_handler, 0);
1447: struct sigaction noaction;
1450: ::sigaction(SIGPIPE, &noaction, 0);
1489: struct sigaction noaction;
1492: ::sigaction(SIGPIPE, &noaction, 0);
1500: struct sigaction noaction;
1503: ::sigaction(SIGPIPE, &noaction, 0);
corelib/kernel/qcore_unix_p.h
156: struct sigaction noaction;
159: ::sigaction(SIGPIPE, &noaction, 0);
corelib/kernel/qcrashhandler.cpp
413: struct sigaction SignalAction;
417: sigaction(SIGSEGV, &SignalAction, NULL);
418: sigaction(SIGBUS, &SignalAction, NULL);
corelib/kernel/qfunctions_nacl.cpp
118:int sigaction(int, const struct sigaction *, struct sigaction *)
corelib/kernel/qfunctions_nacl.h
79:int sigaction(int sig, const struct sigaction * act, struct sigaction * oact);
testlib/qtestcase.cpp
2009: struct sigaction act;
2010: memset(&act, 0, sizeof(struct sigaction));
2012: sigaction(signum, &act, NULL);
2024: struct sigaction act;
2038: struct sigaction oldact;
2041: sigaction(fatalSignals[i], &act, &oldact);
2047: sigaction(fatalSignals[i], &oldact, 0);
2059: struct sigaction act;
2063: struct sigaction oldact;
2068: sigaction(i, &act, &oldact);
2072:
sigaction(i, &oldact, 0); If I understand https://qt.gitorious.org/qt/qt/source/663b742ca8b289e6456facf8b6a8ca18a4157fb7:src/corelib/io/qprocess_unix.cpp#L131 correctly, it tries to be good and call the old handler. Backtrace for __sigaltstack comes from some code in CarbonCore making use of setjmp: (lldb) b __sigaltstack Breakpoint 2: where = libsystem_kernel.dylib`__sigaltstack, address = 0x0000000000015c58 (lldb) r Process 54319 launched: './main' (x86_64) Process 54319 stopped * thread #1: tid = 0xf6a1b, 0x00007fff8ea9fc58 libsystem_kernel.dylib`__sigaltstack, queue = 'com.apple.main-thread, stop reason = breakpoint 2.1 frame #0: 0x00007fff8ea9fc58 libsystem_kernel.dylib`__sigaltstack libsystem_kernel.dylib`__sigaltstack: -> 0x7fff8ea9fc58: movl $33554485, %eax 0x7fff8ea9fc5d: movq %rcx, %r10 0x7fff8ea9fc60: syscall 0x7fff8ea9fc62: jae 0x7fff8ea9fc6c ; __sigaltstack + 20 (lldb) bt * thread #1: tid = 0xf6a1b, 0x00007fff8ea9fc58 libsystem_kernel.dylib`__sigaltstack, queue = 'com.apple.main-thread, stop reason = breakpoint 2.1 frame #0: 0x00007fff8ea9fc58 libsystem_kernel.dylib`__sigaltstack frame #1: 0x00007fff8aee852c libsystem_platform.dylib`setjmp + 48 (lldb) b setjmp Breakpoint 4: where = libsystem_platform.dylib`setjmp, address = 0x00007fff8aee84fc (lldb) r There is a running process, kill it and restart?: [Y/n] y Process 54322 launched: './main' (x86_64) Process 54322 stopped * thread #1: tid = 0xf6a72, 0x00007fff8aee84fc libsystem_platform.dylib`setjmp, queue = 'com.apple.main-thread, stop reason = breakpoint 4.1 frame #0: 0x00007fff8aee84fc libsystem_platform.dylib`setjmp libsystem_platform.dylib`setjmp: -> 0x7fff8aee84fc: pushq %rdi 0x7fff8aee84fd: movl $1, %edi 0x7fff8aee8502: xorq %rsi, %rsi 0x7fff8aee8505: subq $16, %rsp (lldb) bt * thread #1: tid = 0xf6a72, 0x00007fff8aee84fc libsystem_platform.dylib`setjmp, queue = 'com.apple.main-thread, stop reason = breakpoint 4.1 frame #0: 0x00007fff8aee84fc libsystem_platform.dylib`setjmp frame #1: 
0x00007fff8ec0f208 CarbonCore`_sfInvokeFlipper + 71 frame #2: 0x00007fff8ec0eabb CarbonCore`CoreEndianFlipData + 121 frame #3: 0x00007fff8ec0e983 CarbonCore`GetResourcePtrCommon + 506 frame #4: 0x00007fff8ec11b28 CarbonCore`RMGetIndexedResource + 42 frame #5: 0x00007fff8d56ceec libFontParser.dylib`TResourceForkFileReference::GetIndexedResource(unsigned int, unsigned int, short*, unsigned long*, unsigned char*) const + 54 frame #6: 0x00007fff8d56ce76 libFontParser.dylib`TResourceFileDataReference::TResourceFileDataReference(TResourceForkSurrogate const&, unsigned int, unsigned int) + 158 frame #7: 0x00007fff8d56cd66 libFontParser.dylib`TResourceFileDataSurrogate::TResourceFileDataSurrogate(TResourceForkSurrogate const&, unsigned int, unsigned int) + 66 frame #8: 0x00007fff8d56655a libFontParser.dylib`TFont::CreateFontEntitiesForFile(char const*, bool, TSimpleArray<TFont*>&, bool, short, char const*) + 598 frame #9: 0x00007fff8d565e1f libFontParser.dylib`FPFontCreateFontsWithPath + 253 frame #10: 0x00007fff87b4e1f4 libCGXType.A.dylib`create_private_data_with_path + 19 frame #11: 0x00007fff8b833569 CoreGraphics`CGFontCreateFontsWithPath + 40 frame #12: 0x00007fff8b83317e CoreGraphics`CGFontCreateFontsWithURL + 383 frame #13: 0x00007fff866145dc CoreText`CreateFontWithFontURL(__CFURL const*, bool) + 60 frame #14: 0x00007fff866143fb CoreText`TCGFontCache::CopyFont(__CFURL const*, bool) const + 91 frame #15: 0x00007fff86614225 CoreText`TBaseFont::CopyNativeFont() const + 69 frame #16: 0x00007fff866141a6 CoreText`TBaseFont::CopyGraphicsFont() const + 26 frame #17: 0x00007fff86613ebc CoreText`TBaseFont::CopyTable(unsigned int) const + 188 frame #18: 0x00007fff86617b91 CoreText`TBaseFont::GetCmapTable() const + 57 frame #19: 0x00007fff866179ca CoreText`TBaseFont::GetUnicodeEncoding() const + 58 frame #20: 0x00007fff866178ff CoreText`TBaseFont::GetGlyphsForCharacterRange(CFRange, unsigned short*) const + 39 frame #21: 0x00007fff86626ed2 
CoreText`TBMPDataCachePage::TBMPDataCachePage(TBaseFont const&, unsigned short) + 132 frame #22: 0x00007fff86626df4 CoreText`TBMPDataCache::PageForCharacter(unsigned short) const + 108 frame #23: 0x00007fff86626c35 CoreText`CTFontGetGlyphsForCharacters + 255 frame #24: 0x00007fff84e2423a AppKit`-[__NSFontTypefaceInfo _latin1MappingTableWithPlatformFont:hasKernPair:] + 336 frame #25: 0x00007fff84e24064 AppKit`-[NSFont _latin1MappingTable:] + 125 frame #26: 0x00007fff854587fa AppKit`+[NSStringDrawingTextStorage _fastDrawString:attributes:length:inRect:graphicsContext:baselineRendering:usesFontLeading:usesScreenFont:typesetterBehavior:paragraphStyle:lineBreakMode:boundingRect:padding:scrollable:baselineOffset:] + 895 frame #27: 0x00007fff84f41c92 AppKit`_NSStringDrawingCore + 1495 frame #28: 0x00007fff8506cbee AppKit`-[NSString(NSStringDrawing) drawInRect:withAttributes:] + 183 frame #29: 0x0000000006f586ef libqcocoa.dylib`QCoreTextFontDatabase::QCoreTextFontDatabase() + 1071 frame #30: 0x0000000006f07161 libqcocoa.dylib`QCocoaIntegration::QCocoaIntegration() + 49 frame #31: 0x0000000006f06004 libqcocoa.dylib`QCocoaIntegrationPlugin::create(QString const&, QStringList const&) + 148 frame #32: 0x0000000004e06306 QtGui`QPlatformIntegrationFactory::create(QString const&, QStringList const&, int&, char**, QString const&) + 198 frame #33: 0x0000000004e0fcc9 QtGui`QGuiApplicationPrivate::createPlatformIntegration() + 1257 frame #34: 0x0000000004e108ab QtGui`QGuiApplicationPrivate::createEventDispatcher() + 27 frame #35: 0x0000000005a37638 QtCore`QCoreApplication::init() + 104 frame #36: 0x0000000005a375b7 QtCore`QCoreApplication::QCoreApplication(QCoreApplicationPrivate&) + 39 frame #37: 0x0000000004e0d968 QtGui`QGuiApplication::QGuiApplication(int&, char**, int) + 200 frame #38: 0x00000000040029fe main`newGuiApplication + 46 frame #39: 0x0000000004066864 main`runtime.asmcgocall + 84 Anyway, I'm rambling. 
SIGCHLD certainly would be considered a signal that can be moved to kqueue or similar, right? Attachments:
|
FYI, on FreeBSD SIGCHLD is one of the signals not forwarded via the kqueue signal filter (there is no mention of this for OS X). However, and perhaps more correctly, the PROC filter can be used instead. http://www.freebsd.org/cgi/man.cgi?query=kqueue https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/kqueue.2.html Also for reference, the golang-dev topic is: https://groups.google.com/forum/#!topic/golang-dev/RTW3e8OvofQ |
FYI, stubbing out the problematic functions by monkey-patching appears to work as a workaround on darwin and linux (amd64): http://play.golang.org/p/Fl2eAdMmEL The issue never showed up on linux in the first place; however, issue #5287 did, and it is "fixed" by applying a monkey patch for sigaltstack like the one in the linked playground snippet. So perhaps creating stubs that return some sensible error code would be the quickest placeholder solution for this issue and others like it until (or perhaps if?) a more satisfying solution surfaces. |
What of using Go in the context of issue #256 and issue #2790? |
If somebody can propose how to fix this, that would be great. But there are three signals that the Go runtime must catch itself in order to correctly implement the Go language spec: SIGSEGV, SIGBUS, SIGFPE. Those signals are delivered synchronously to the thread that triggered them, and Go relies on that so that it knows where the problem arose. We can't use kqueue for those signals, because we would not be able to deliver them to the correct goroutine. Go code normally runs on a very small stack, too small for the signal handler. Therefore, Go uses sigaltstack to install an alternate signal stack so that there is enough stack space to handle the signal. So when Go code calls C code that disables the alternate stack, and one of those signals arises, the program is going to fail. As an aside, why would a C library disable the signal alternate stack? What kind of sense does that make? If the library is expected to be called by other programs, it sure seems like a bug in the library. Anyhow, if anybody can propose a fix that meets the constraints, please do so. But it's not enough to simply suggest kqueue. That is not the solution to this problem. Labels changed: added repo-main. |
> But there are three signals that the Go runtime must catch itself in order to correctly implement the Go language spec: SIGSEGV, SIGBUS, SIGFPE. And thus SIGCHLD which is the signal we receive here must make the program crash? Also, please point me to where in the language spec these requirements are mentioned as I'm having trouble locating them. > As an aside, why would a C library disable the signal alternate stack? What kind of sense does that make? If the library is expected to be called by other programs, it sure seems like a bug in the library. Is that what it does though? I think it calls setjmp which in turn calls sigaltstack. |
>> But there are three signals that the Go runtime must catch itself in order to correctly implement the Go language spec: SIGSEGV, SIGBUS, SIGFPE. > > And thus SIGCHLD which is the signal we receive here must make the program crash? > > Also, please point me to where in the language spec these requirements are mentioned as I'm having trouble locating them. You know, I'm actually trying to help here. I started out by saying "If somebody can propose how to fix this, that would be great." Then I explained the problem as accurately as I can. Throwing your hands up in the air and implying that things should be other than as they are may make you feel better but it does not advance us toward a solution. To answer your questions: obviously SIGCHLD should not make the program crash. Crashing the program is not the intent. The language spec requirements on the three signals I mentioned can be found at http://golang.org/ref/spec#Arithmetic_operators and http://golang.org/ref/spec#Address_operators. Look for the reference to a run-time panic. In order to implement those language requirements efficiently, the current implementation relies on catching the signals I mentioned. >> As an aside, why would a C library disable the signal alternate stack? What kind of sense does that make? If the library is expected to be called by other programs, it sure seems like a bug in the library. > > Is that what it does though? I think it calls setjmp which in turn calls sigaltstack. In that case setjmp is the C library function in question. In GNU glibc setjmp does not call sigaltstack; why would it? I agree that your backtrace appears to show setjmp calling sigaltstack, but I don't understand why it would do so. The point of the setjmp function has nothing to do with sigaltstack. I found one possible source of the Darwin setjmp function at http://www.opensource.apple.com/source/Libc/Libc-825.40.1/x86_64/sys/_setjmp.s , but it does not call sigaltstack. 
You showed the first four instructions of setjmp in comment #20; it might be interesting to see the whole function. |
Thank you for your patience and please pardon the lack of mine. I've been able to create a smaller standalone program that triggers this issue and which does not invoke sigaltstack (and not setjmp neither), so I believe that path is a red herring. Please see http://play.golang.org/p/d4YCDD7Qa6 which in short overrides SIGCHLD via sigaction, but also calls into the old signal handler from the new one. Output of running that piece of code on my machine is: 17:38 ~/code/go/src/7227 $ go version go version devel +83227883e5d0 Thu Mar 13 19:04:00 2014 +0400 darwin/amd64 17:38 ~/code/go/src/7227 $ GOTRACEBACK=2 go run 7227.go here I'm evil fatal error: runtime: stack split during syscall runtime stack: runtime.throw(0x41abf5e) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/panic.c:519 +0x69 runtime.newstack() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/stack.c:641 +0x7a3 runtime.morestack() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/asm_amd64.s:228 +0x61 goroutine 16 [stack split]: syscall.Syscall6(0x7, 0x188, 0xc208021bd4, 0x0, 0xc2080423f0, ...) /Users/quarnster/code/3rdparty/go/src/pkg/syscall/asm_darwin_amd64.s:41 +0x5 fp=0xc208021b38 syscall.wait4(0x188, 0xc208021bd4, 0x0, 0xc2080423f0, 0x4021f62, ...) /Users/quarnster/code/3rdparty/go/src/pkg/syscall/zsyscall_darwin_amd64.go:32 +0x95 fp=0xc208021b98 syscall.Wait4(0x188, 0xc208021c1c, 0x0, 0xc2080423f0, 0xc208016280, ...) 
/Users/quarnster/code/3rdparty/go/src/pkg/syscall/syscall_bsd.go:129 +0x76 fp=0xc208021be0 os.(*Process).wait(0xc208050320, 0x0, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/os/exec_unix.go:22 +0x121 fp=0xc208021c98 os.(*Process).Wait(0xc208050320, 0xc208050000, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/os/doc.go:45 +0x39 fp=0xc208021cc0 os/exec.(*Cmd).Wait(0xc208052140, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/os/exec/exec.go:320 +0x1bd fp=0xc208021d78 os/exec.(*Cmd).Run(0xc208052140, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/os/exec/exec.go:237 +0x78 fp=0xc208021da8 os/exec.(*Cmd).CombinedOutput(0xc208052140, 0x0, 0x0, 0x0, 0x0, ...) /Users/quarnster/code/3rdparty/go/src/pkg/os/exec/exec.go:364 +0x269 fp=0xc208021e28 main.main() /Users/quarnster/code/go/src/7227/7227.go:54 +0x237 fp=0xc208021f50 runtime.main() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:243 +0x11a fp=0xc208021fa8 runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 fp=0xc208021fb0 created by _rt0_go /Users/quarnster/code/3rdparty/go/src/pkg/runtime/asm_amd64.s:97 +0x120 goroutine 17 [runnable]: runtime.MHeap_Scavenger() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/mheap.c:507 runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 created by runtime.main /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:203 goroutine 18 [runnable]: bgsweep() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/mgc0.c:1891 runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 created by runtime.gc /Users/quarnster/code/3rdparty/go/src/pkg/runtime/mgc0.c:2179 goroutine 19 [runnable]: os/signal.loop() /Users/quarnster/code/3rdparty/go/src/pkg/os/signal/signal_unix.go:19 runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 created by os/signal.init·1 /Users/quarnster/code/3rdparty/go/src/pkg/os/signal/signal_unix.go:27 +0x32 goroutine 17 [syscall]: 
runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 goroutine 20 [runnable]: os/exec.func·004(0xc2080500c0) /Users/quarnster/code/3rdparty/go/src/pkg/os/exec/exec.go:283 runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 created by os/exec.(*Cmd).Start /Users/quarnster/code/3rdparty/go/src/pkg/os/exec/exec.go:285 +0x8db exit status 2 17:38 ~/code/go/src/7227 $ In case setjmp is still of interest, the disassembly of it on my machine matches this version exactly: http://www.opensource.apple.com/source/Libc/Libc-825.40.1/x86_64/sys/setjmp.s |
Cleaning up unneeded imports just to make the repro code shorter: http://play.golang.org/p/sS2J9w-0E2 |
Thanks for the test case. The signal handler that you install 1) does not set sa_mask; 2) does not set SA_ONSTACK. When I set both of those, the code works fine. http://play.golang.org/p/1idbOzCTUu As I described above, Go requires that signals be delivered on the alternate signal stack set up by sigaltstack. I forgot to say that when using sigaction you need to use the SA_ONSTACK flag. When C code sets up a signal handler that does not use SA_ONSTACK, and when that signal is delivered to a thread that is running Go code, the program can crash. I don't know how to fix this. The Go code does not in general have a large enough stack to run a signal handler. |
Hmm, your amended version doesn't run here: the error I'm receiving changed for darwin amd64, while darwin 386 is still the same. BTW, I tried the version in #35 on Linux, which worked fine for both amd64 and 386 with go 1.2. But ignoring that for now, how much stack space would be required to check if we've got enough stack space?
20:59 ~/code/go/src/7227 $ cat 7227.go
package main

/*
#include <signal.h>
#include <stdio.h>
#include <string.h>

struct sigaction old_action;

void evil_c_sig_handler(int signum, siginfo_t *info, void *context) {
	// First do my own stuff
	printf("I'm evil\n");
	// Then call the old signal handler
	volatile struct sigaction *vsa = &old_action;
	if (old_action.sa_flags & SA_SIGINFO) {
		void (*oldAction)(int, siginfo_t *, void *) = vsa->sa_sigaction;
		if (oldAction)
			oldAction(signum, info, context);
	} else {
		void (*oldAction)(int) = vsa->sa_handler;
		if (oldAction && oldAction != SIG_IGN)
			oldAction(signum);
	}
}

void test() {
	struct sigaction action;
	sigaction(SIGCHLD, NULL, &action);
	memset(&action, 0, sizeof action);
	sigfillset(&action.sa_mask);
	action.sa_sigaction = evil_c_sig_handler;
	action.sa_flags = SA_NOCLDSTOP | SA_SIGINFO | SA_ONSTACK;
	sigaction(SIGCHLD, &action, &old_action);
}
*/
import "C"

import (
	"os/exec"
)

func main() {
	C.test()
	for {
		cmd := exec.Command("echo", "hello")
		cmd.CombinedOutput()
		cmd.Wait()
	}
}
21:00 ~/code/go/src/7227 $ GOTRACEBACK=2 go run 7227.go
I'm evil
runtime: newstack framesize=0x0 argsize=0x20 sp=0xc20800baa0 stack=[0xc208039000, 0xc208039fa8] morebuf={pc:0x7fff87db85aa sp:0xc20800bab0 lr:0x0} sched={pc:0x4016ca0 sp:0xc20800baa8 lr:0x0 ctxt:0x0} runtime: split stack overflow: 0xc20800baa0 < 0xc208039000 fatal error: runtime: split stack overflow runtime stack: runtime.throw(0x40fc59b) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/panic.c:519 +0x69 runtime.newstack() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/stack.c:627 +0x1eb runtime.morestack() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/asm_amd64.s:228 +0x61 goroutine
16 [stack split]: syscall.Syscall6(0x7, 0x1303, 0xc208039c84, 0x0, 0xc20804c000, ...) /Users/quarnster/code/3rdparty/go/src/pkg/syscall/asm_darwin_amd64.s:41 +0x5 fp=0xc208039be8 syscall.wait4(0x1303, 0xc208039c84, 0x0, 0xc20804c000, 0x401fae2, ...) /Users/quarnster/code/3rdparty/go/src/pkg/syscall/zsyscall_darwin_amd64.go:32 +0x95 fp=0xc208039c48 syscall.Wait4(0x1303, 0xc208039ccc, 0x0, 0xc20804c000, 0xc2080161e0, ...) /Users/quarnster/code/3rdparty/go/src/pkg/syscall/syscall_bsd.go:129 +0x76 fp=0xc208039c90 os.(*Process).wait(0xc208048320, 0x0, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/os/exec_unix.go:22 +0x121 fp=0xc208039d48 os.(*Process).Wait(0xc208048320, 0xc208048000, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/os/doc.go:45 +0x39 fp=0xc208039d70 os/exec.(*Cmd).Wait(0xc20804a140, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/os/exec/exec.go:320 +0x1bd fp=0xc208039e28 os/exec.(*Cmd).Run(0xc20804a140, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/os/exec/exec.go:237 +0x78 fp=0xc208039e58 os/exec.(*Cmd).CombinedOutput(0xc20804a140, 0x0, 0x0, 0x0, 0x0, ...) 
/Users/quarnster/code/3rdparty/go/src/pkg/os/exec/exec.go:364 +0x269 fp=0xc208039ed8 main.main() /Users/quarnster/code/go/src/7227/7227.go:49 +0xc0 fp=0xc208039f50 runtime.main() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:243 +0x11a fp=0xc208039fa8 runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 fp=0xc208039fb0 created by _rt0_go /Users/quarnster/code/3rdparty/go/src/pkg/runtime/asm_amd64.s:97 +0x120 goroutine 17 [syscall]: runtime.notetsleepg(0x41c8f68, 0xdf8475800) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/lock_sema.c:263 +0x71 runtime.MHeap_Scavenger() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/mheap.c:531 +0xa3 runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 created by runtime.main /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:203 goroutine 18 [GC sweep wait]: runtime.park(0x40127e0, 0x4115248, 0x40fd1ac) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1370 +0x89 runtime.parkunlock(0x4115248, 0x40fd1ac) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1386 +0x3b bgsweep() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/mgc0.c:1910 +0xc2 runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 created by runtime.gc /Users/quarnster/code/3rdparty/go/src/pkg/runtime/mgc0.c:2179 goroutine 17 [syscall]: runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 exit status 2 21:00 ~/code/go/src/7227 $ 21:01 ~/code/go/src/7227 $ CGO_ENABLED=1 GOARCH=386 GOTRACEBACK=2 go run 7227.go I'm evil fatal error: runtime: stack split during syscall runtime stack: runtime.throw(0x40cd9de) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/panic.c:519 +0x5f runtime.newstack() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/stack.c:641 +0x631 runtime.morestack() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/asm_386.s:246 +0x5e goroutine 16 [stack split]: syscall.Syscall6(0x139f, 0x4169e0c, 0x0, 0x151280f0, 0x0, 
...) /Users/quarnster/code/3rdparty/go/src/pkg/syscall/asm_darwin_386.s:41 +0x5 fp=0x4169dbc syscall.wait4(0x139f, 0x4169e0c, 0x0, 0x151280f0, 0x401cf0d, ...) /Users/quarnster/code/3rdparty/go/src/pkg/syscall/zsyscall_darwin_386.go:32 +0x85 fp=0x4169df0 syscall.Wait4(0x139f, 0x4169e30, 0x0, 0x151280f0, 0x1512c2c0, ...) /Users/quarnster/code/3rdparty/go/src/pkg/syscall/syscall_bsd.go:129 +0x68 fp=0x4169e14 os.(*Process).wait(0x1512a170, 0x0, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/os/exec_unix.go:22 +0xd9 fp=0x4169e78 os.(*Process).Wait(0x1512a170, 0x0, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/os/doc.go:45 +0x38 fp=0x4169e8c os/exec.(*Cmd).Wait(0x15102820, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/os/exec/exec.go:320 +0x142 fp=0x4169ef4 os/exec.(*Cmd).Run(0x15102820, 0x0, 0x0) /Users/quarnster/code/3rdparty/go/src/pkg/os/exec/exec.go:237 +0x6a fp=0x4169f0c os/exec.(*Cmd).CombinedOutput(0x15102820, 0x0, 0x0, 0x0, 0x0, ...) /Users/quarnster/code/3rdparty/go/src/pkg/os/exec/exec.go:364 +0x1d5 fp=0x4169f5c main.main() /Users/quarnster/code/go/src/7227/7227.go:49 +0x95 fp=0x4169f9c runtime.main() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:243 +0xfa fp=0x4169fd0 runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 fp=0x4169fd4 created by _rt0_go /Users/quarnster/code/3rdparty/go/src/pkg/runtime/asm_386.s:101 +0xf7 goroutine 17 [syscall]: runtime.notetsleepg(0x4178fa0, 0xf8475800, 0xd) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/lock_sema.c:263 +0x66 runtime.MHeap_Scavenger() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/mheap.c:531 +0xbc runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 created by runtime.main /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:203 goroutine 18 [GC sweep wait]: runtime.park(0x4010c00, 0x40ddd00, 0x40cdeac) /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1370 +0x76 runtime.parkunlock(0x40ddd00, 0x40cdeac) 
/Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1386 +0x39 bgsweep() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/mgc0.c:1910 +0xad runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 created by runtime.gc /Users/quarnster/code/3rdparty/go/src/pkg/runtime/mgc0.c:2179 goroutine 17 [syscall]: runtime.goexit() /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1446 exit status 2 |
Go code doesn't need any stack space to check whether there is enough stack space, but it does need to know which stack it is running on. For your program in comment #35, the crash occurs because the signal handler expects to be running on the signal stack, so it compares the stack pointer with the stack guard for that stack (m->gsignal->stackguard0, where m is a thread-local variable). In some cases the signal is not delivered on the signal stack, and the overflow check fails. The crash then occurs when the stack split code sees (in effect) that it is running on a stack that is not expected to overflow. I'm not sure why my program in comment #36 fails on Darwin; it does run successfully on GNU/Linux. |
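The delivery behaviour described in this comment can be observed outside the Go runtime. The following is a minimal C sketch, not the runtime's code: it registers an alternate signal stack, installs a handler with or without SA_ONSTACK, and records whether a local variable in the handler lies inside the alternate stack. The names `delivered_on_altstack`, `probe_handler`, and the 64 KiB stack size are assumptions made for illustration.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static char *alt_base;                      /* start of the alternate stack */
static size_t alt_size;                     /* its size in bytes */
static volatile sig_atomic_t on_alt = -1;   /* result written by the handler */

/* Hypothetical handler: checks which stack it is running on. */
static void probe_handler(int sig) {
    volatile char probe;                    /* lives on the handler's stack */
    uintptr_t p = (uintptr_t)&probe;
    uintptr_t lo = (uintptr_t)alt_base;
    (void)sig;
    on_alt = (p >= lo && p < lo + alt_size);
}

/* Returns 1 if SIGUSR1 was handled on the alternate stack, 0 if it was
 * handled on the ordinary stack, and -1 if setup failed. */
int delivered_on_altstack(int use_onstack) {
    struct sigaction sa;

    if (alt_base == NULL) {
        stack_t ss;
        alt_size = 64 * 1024;               /* assumed size, above MINSIGSTKSZ */
        alt_base = malloc(alt_size);
        if (alt_base == NULL)
            return -1;
        memset(&ss, 0, sizeof ss);
        ss.ss_sp = alt_base;
        ss.ss_size = alt_size;
        ss.ss_flags = 0;
        if (sigaltstack(&ss, NULL) != 0)
            return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = use_onstack ? SA_ONSTACK : 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    on_alt = -1;
    raise(SIGUSR1);                         /* handler runs before raise returns */
    return on_alt;
}
```

Calling `delivered_on_altstack(1)` and then `delivered_on_altstack(0)` from the same thread shows the two cases: once a handler is reinstalled without SA_ONSTACK, it runs on whatever stack the thread was using when the signal arrived, which is the situation the overflow check in the comment above does not expect.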
Stepping through #36 I can see that the value in (%rcx) compared to %rsp is 0xfffffffffffffade = -1314, which would be the StackPreempt value, and the backtrace according to lldb is:

  * frame #0: 0x0000000004016cb0 7227`runtime.sighandler
    frame #1: 0x00007fff87db85aa libsystem_platform.dylib`_sigtramp + 26
    frame #2: 0x000000000402b6af 7227`syscall.Syscall6 + 47

I presume this means that the signal handler is incorrectly running on the same stack as the syscall, since the syscall's stack guard was set to the preempt value via runtime.entersyscall.

Curiously I found an even simpler repro sample: http://play.golang.org/p/__3tlxhdB9

The println prints all structs as having the same value, with sa_flags=67, and on my system SA_ONSTACK is 0x1, so I don't know what to make of this. |
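The two numbers in this comment can be checked with a few lines of C. This is only an arithmetic sketch: the function names are made up for illustration, and the flag values other than SA_ONSTACK (which the comment gives as 0x1) are the Darwin values as I understand them from its `<sys/signal.h>`, hard-coded here as an assumption so the code builds on any platform.

```c
#include <stdint.h>

/* Darwin sa_flags bits, hard-coded for illustration (assumed values,
 * except SA_ONSTACK = 0x1, which the comment above states). */
enum {
    DARWIN_SA_ONSTACK = 0x0001,
    DARWIN_SA_RESTART = 0x0002,
    DARWIN_SA_SIGINFO = 0x0040
};

/* Reinterpret a 64-bit register value as a two's-complement signed
 * number without relying on implementation-defined conversions. */
int64_t guard_as_signed(uint64_t raw) {
    if (raw <= (uint64_t)INT64_MAX)
        return (int64_t)raw;
    return -(int64_t)(~raw) - 1;
}

/* Nonzero if the SA_ONSTACK bit is set in a Darwin sa_flags value. */
int has_onstack(unsigned flags) {
    return (flags & DARWIN_SA_ONSTACK) != 0;
}

/* Bits left over after removing the three flags named above. */
unsigned unexplained_bits(unsigned flags) {
    return flags & ~(unsigned)(DARWIN_SA_ONSTACK | DARWIN_SA_RESTART |
                               DARWIN_SA_SIGINFO);
}
```

Under those assumed values, 67 (0x43) decomposes into SA_SIGINFO, SA_RESTART and SA_ONSTACK with nothing left over, so the flags visible through sigaction already request the alternate stack, which is consistent with the flags not being where the difference lies.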
Oh, I see, it's because Go makes the syscall directly and expects its own trampoline entry point. The struct provided to the libc sigaction function does not have the sa_tramp field, which likely results in the trampoline being reset to whatever the libc implementation uses. pkg/runtime/os_linux.c does this in runtime·setsig:

    if(fn == runtime·sighandler)
        fn = (void*)runtime·sigtramp;

Thus Linux doesn't have an issue with chained signal handlers, as the trampoline is what's set as the function pointer when the user queries the current function via sigaction. |
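For readers unfamiliar with the chaining pattern mentioned here, the following C sketch shows what a well-behaved library typically does: query the current disposition with sigaction, keep its mask and flags (including SA_ONSTACK), install its own handler, and forward to the saved one. All function and variable names are hypothetical. The sketch goes through libc only, so it cannot reproduce the Darwin sa_tramp difference described above; it only illustrates the query-and-chain step.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

static struct sigaction previous;            /* disposition found by the guest */
static volatile sig_atomic_t host_calls, guest_calls;

/* Stands in for the handler the host installed first. */
static void host_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)info; (void)ctx;
    host_calls++;
}

/* Guest handler: does its own work, then forwards to the saved handler. */
static void guest_handler(int sig, siginfo_t *info, void *ctx) {
    guest_calls++;
    if (previous.sa_flags & SA_SIGINFO) {
        if (previous.sa_sigaction != NULL)
            previous.sa_sigaction(sig, info, ctx);
    } else if (previous.sa_handler != SIG_DFL &&
               previous.sa_handler != SIG_IGN) {
        previous.sa_handler(sig);
    }
}

int install_host(int sig) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = host_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    return sigaction(sig, &sa, NULL);
}

int install_guest(int sig) {
    struct sigaction sa;
    if (sigaction(sig, NULL, &previous) != 0) /* query without changing */
        return -1;
    sa = previous;                            /* keep sa_mask and sa_flags */
    sa.sa_sigaction = guest_handler;
    sa.sa_flags |= SA_SIGINFO;
    return sigaction(sig, &sa, NULL);
}

/* Returns 1 if the currently installed action still has SA_ONSTACK set. */
int guest_keeps_onstack(int sig) {
    struct sigaction cur;
    if (sigaction(sig, NULL, &cur) != 0)
        return -1;
    return (cur.sa_flags & SA_ONSTACK) != 0;
}
```

Copying the previous `struct sigaction` before overwriting the handler is the design choice that matters: a library that builds a fresh struct with `sa_flags = 0` silently drops SA_ONSTACK, which is one way the settings in this issue's title get reset.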
When Go is the host environment there's the monkey-patch workaround (#22) that's been proven to work for at least Qt and Python. When the host environment is not Go, and Go is loaded into the process to extend a host that might or might not require its own set of signal handlers, it gets trickier. While such extensions are traditionally compiled into a shared library, and those are the glasses I've been wearing, it seems to me that it'd be much more robust to simply run the host and the plugin in separate processes communicating via some form of RPC shim. That would certainly make it much clearer exactly who owns a SIGSEGV, SIGBUS or SIGFPE, and the program(s) would be able to make better decisions on how to handle them. RPC works for me, so I'm fine with whatever approach is taken here (including none). For reference, I wrote a little Python script for lldb (https://gist.github.com/quarnster/9992193) to dump info when sigaction is called, to see what Python, Java and Ruby do for a simple "hello world" program.
Python does:

Installed a new signal handler for signal: SIGPIPE: ignore
Installed a new signal handler for signal: SIGXFSZ: ignore
Installed a new signal handler for signal: SIGINT: (void (*)(int)) __sa_handler = 0x00000001000b6043 (Python`___lldb_unnamed_function1745$$Python)

Java does:

Installed a new signal handler for signal: SIGUSR2: (void (*)(int)) __sa_handler = 0x00000001033b2fca (libjvm.dylib`SR_handler(int, __siginfo*, __darwin_ucontext*))
Installed a new signal handler for signal: SIGSEGV: (void (*)(int)) __sa_handler = 0x00000001033b3431 (libjvm.dylib`signalHandler(int, __siginfo*, void*))
Installed a new signal handler for signal: SIGPIPE: (void (*)(int)) __sa_handler = 0x00000001033b3431 (libjvm.dylib`signalHandler(int, __siginfo*, void*))
Installed a new signal handler for signal: SIGBUS: (void (*)(int)) __sa_handler = 0x00000001033b3431 (libjvm.dylib`signalHandler(int, __siginfo*, void*))
Installed a new signal handler for signal: SIGILL: (void (*)(int)) __sa_handler = 0x00000001033b3431 (libjvm.dylib`signalHandler(int, __siginfo*, void*))
Installed a new signal handler for signal: SIGFPE: (void (*)(int)) __sa_handler = 0x00000001033b3431 (libjvm.dylib`signalHandler(int, __siginfo*, void*))
Installed a new signal handler for signal: SIGXFSZ: (void (*)(int)) __sa_handler = 0x00000001033b3431 (libjvm.dylib`signalHandler(int, __siginfo*, void*))
Installed a new signal handler for signal: SIGHUP: (void (*)(int)) __sa_handler = 0x00000001033b122b (libjvm.dylib`UserHandler(int, void*, void*))
Installed a new signal handler for signal: SIGINT: (void (*)(int)) __sa_handler = 0x00000001033b122b (libjvm.dylib`UserHandler(int, void*, void*))
Installed a new signal handler for signal: SIGTERM: (void (*)(int)) __sa_handler = 0x00000001033b122b (libjvm.dylib`UserHandler(int, void*, void*))
Installed a new signal handler for signal: SIGQUIT: (void (*)(int)) __sa_handler = 0x00000001033b122b (libjvm.dylib`UserHandler(int, void*, void*))

Ruby does:

Installed a new signal handler for signal: SIGINT: (void (*)(int)) __sa_handler = 0x00000001000c30fb (libruby.2.0.0.dylib`___lldb_unnamed_function2046$$libruby.2.0.0.dylib)
Installed a new signal handler for signal: SIGHUP: (void (*)(int)) __sa_handler = 0x00000001000c30fb (libruby.2.0.0.dylib`___lldb_unnamed_function2046$$libruby.2.0.0.dylib)
Installed a new signal handler for signal: SIGQUIT: (void (*)(int)) __sa_handler = 0x00000001000c30fb (libruby.2.0.0.dylib`___lldb_unnamed_function2046$$libruby.2.0.0.dylib)
Installed a new signal handler for signal: SIGTERM: (void (*)(int)) __sa_handler = 0x00000001000c30fb (libruby.2.0.0.dylib`___lldb_unnamed_function2046$$libruby.2.0.0.dylib)
Installed a new signal handler for signal: SIGALRM: (void (*)(int)) __sa_handler = 0x00000001000c30fb (libruby.2.0.0.dylib`___lldb_unnamed_function2046$$libruby.2.0.0.dylib)
Installed a new signal handler for signal: SIGUSR1: (void (*)(int)) __sa_handler = 0x00000001000c30fb (libruby.2.0.0.dylib`___lldb_unnamed_function2046$$libruby.2.0.0.dylib)
Installed a new signal handler for signal: SIGUSR2: (void (*)(int)) __sa_handler = 0x00000001000c30fb (libruby.2.0.0.dylib`___lldb_unnamed_function2046$$libruby.2.0.0.dylib)
Installed a new signal handler for signal: SIGBUS: (void (*)(int)) __sa_handler = 0x00000001000c3b9b (libruby.2.0.0.dylib`___lldb_unnamed_function2054$$libruby.2.0.0.dylib)
Installed a new signal handler for signal: SIGSEGV: (void (*)(int)) __sa_handler = 0x00000001000c3bb7 (libruby.2.0.0.dylib`___lldb_unnamed_function2055$$libruby.2.0.0.dylib)
Installed a new signal handler for signal: SIGPIPE: ignore
Installed a new signal handler for signal: SIGINT: ignore |
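A rough in-process counterpart to the lldb script's output is to ask the kernel for each signal's current disposition after a library has initialised. The C sketch below is an assumption-laden illustration, not the script from the gist: `disposition`, `disposition_name`, `dump_dispositions`, and the list of signals are made up here, and it reports only the final state rather than each sigaction call as it happens.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdio.h>

enum { DISP_UNKNOWN = -1, DISP_DEFAULT = 0, DISP_IGNORE = 1, DISP_HANDLER = 2 };

/* Classify the current disposition of one signal without changing it.
 * On Linux and Darwin sa_handler and sa_sigaction share storage, so
 * comparing sa_handler covers both forms. */
int disposition(int sig) {
    struct sigaction cur;
    if (sigaction(sig, NULL, &cur) != 0)
        return DISP_UNKNOWN;
    if (cur.sa_handler == SIG_IGN)
        return DISP_IGNORE;
    if (cur.sa_handler == SIG_DFL)
        return DISP_DEFAULT;
    return DISP_HANDLER;
}

const char *disposition_name(int d) {
    switch (d) {
    case DISP_DEFAULT: return "default";
    case DISP_IGNORE:  return "ignore";
    case DISP_HANDLER: return "handler";
    default:           return "unknown";
    }
}

/* Print one line per signal of interest, noting whether SA_ONSTACK is set. */
void dump_dispositions(FILE *out) {
    static const struct { int sig; const char *name; } sigs[] = {
        { SIGSEGV, "SIGSEGV" }, { SIGBUS, "SIGBUS" }, { SIGFPE, "SIGFPE" },
        { SIGILL, "SIGILL" },   { SIGPIPE, "SIGPIPE" }, { SIGINT, "SIGINT" },
        { SIGUSR1, "SIGUSR1" }, { SIGUSR2, "SIGUSR2" }
    };
    size_t i;
    for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++) {
        struct sigaction cur;
        int onstack = 0;
        if (sigaction(sigs[i].sig, NULL, &cur) == 0)
            onstack = (cur.sa_flags & SA_ONSTACK) != 0;
        fprintf(out, "%s: %s%s\n", sigs[i].name,
                disposition_name(disposition(sigs[i].sig)),
                onstack ? " (SA_ONSTACK)" : "");
    }
}
```

Calling `dump_dispositions(stderr)` before and after loading a library gives a quick view of which signals it took over and whether it kept SA_ONSTACK.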
CL https://golang.org/cl/18102 mentions this issue. |