-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime/cgo: crash when invoking Go callback from C library destructor if running under race detector #59711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Exit hooks are hard to implement in Go. See the "CAREFUL" comment on Moreover, exit hooks are historically a source of complexity and problems in other languages. See the C++ movement over the years of So, no, we aren't going to add exit hooks. The exact failure that you are encountering is not clear to me. It looks like the call is happening on the wrong stack, but I don't see why that would be. is there a way that we can recreate the problem? It may indeed be impossible to fix, but let's try to pin down what is happening. |
Yes, see this part of the issue: |
Thanks. Sorry for missing that. Because the race detector is enabled, The global destructor then turns around and calls back into Go. But the Go runtime is not expecting that. Normally a callback into Go occurs either on a thread created by Go that has called into C which is then calling back into Go, or a thread not created by Go, which is calling a Go function. In this case, though, we have a thread created by Go that has not called into C, at least not in the usual way, so the Go callback runs into a thread not prepared to receive it. Hence the crash, which is basically a sanity check. Working on a patch. |
Change https://go.dev/cl/486615 mentions this issue: |
Thanks for the patch and the explanations! A little question: why destructors are invoked under race detector, but seems they're not without it? At least in my experiments. BTW I think you can simplify the test in the patch a bit by removing destructorFn and registerDestructor and just hardcoding GoDestructorCallback call (I added them because I used shared library). Also, probably it would make sense to print OK from GoDestructorCallback? So that the test would pass only if the destructor is really called. |
With the current toolchain, if you call On this general topic see #20713.
The single program testprogcgo includes a bunch of different tests. Which one is run depends on the command line argument. I copied your registration because I didn't want the destructor to always do a Go callback, as that might confuse an unrelated test.
As noted above the destructor is not always called. |
@ianlancetaylor I see, thanks! |
Change https://go.dev/cl/487635 mentions this issue: |
Change https://go.dev/cl/487955 mentions this issue: |
This is the AIX and Solaris equivalent of CL 269378. On AIX and Solaris, where we use libc for syscalls, when the runtime exits, it calls the libc exit function, which may call back into user code, such as invoking functions registered with atexit. In particular, it may call back into Go. But at this point, the Go runtime is already exiting, so this wouldn't work. On non-libc platforms we use exit syscall directly, which doesn't invoke any callbacks. Use _exit on AIX and Solaris to achieve the same behavior. Test is TestDestructorCallback. For #59711 Change-Id: I666f75538d3e3d8cf3b697b4c32f3ecde8332890 Reviewed-on: https://go-review.googlesource.com/c/go/+/487635 Run-TryBot: Ian Lance Taylor <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Cherry Mui <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]>
Also preserve the PC/SP in reentersyscall when doing lock ranking. The test is TestDestructorCallbackRace with the staticlockranking experiment enabled. For #59711 Change-Id: I87ac1d121ec0d399de369666834891ab9e7d11b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/487955 Run-TryBot: Ian Lance Taylor <[email protected]> Reviewed-by: Michael Knyszek <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]>
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
Hi,
I was debugging a crash in Go bindings for a C++ library (with pure C public interface). The crash was reproducing in a specific use-case:
The crash was reproducing only when the tests were running with race detector enabled. Without race detector, it seems that global C++ destructors are not running at all.
I've prepared a minimal example that can reproduce it: https://github.com/gavv/cgo_racedetector_crash
In TestExample, we call example() and exit. example() just registers Go callback in libfoo.so. libfoo has a destructor function, which invokes the registered callback. When running under race detector, you can see a crash:
Here is full backtrace: https://gist.github.com/gavv/586a727c54b85a151994f061ea9fbf7f
Another backtrace: https://gist.github.com/gavv/8e0e5fc20ee34b10a19223bcde64036c
The crash happens when cgocallback invokes cgocallbackg, and it invokes exitsyscall; exitsyscall meets something bad and crashes.
What did you expect to see?
What are possible solutions for this:
Tell the user of the bindings that they must explicitly deinitialize everything before exiting program, otherwise crashes are possible. For now that's the only real option for us. I don't like it because it's really easy to forget closing some long-living objects (when you typically don't rely on defer), and if this will cause segfaults in user programs, it would be really annoying.
Modify the C++ library to avoid invocation of user-registered callbacks in global destructors. That's what I plan to do, however it's not a complete fix for the problem. C++ lib also runs various background threads, and those threads may attempt to invoke registered callbacks too, and they don't know that Go runtime is already being deinitialized.
If Go had public API for exit hooks, we could use it to properly deinitialize native objects or at least notify the C++ library to stop using registered callbacks. For me this seem like the most clean solution. Are there any plans to make exit hooks API public?
If Go was invoking atexit handlers before deinitializing runtime (or whatever bad happens here), we could use them from the C++ side to set a flag that we should stop using callbacks. It would be partial solution (we still can't clearly deinitialize everything), but it would fix the crash.
runtime/cgo: call C exit to allow global destructors/atexit to run #20713
cmd/cgo: atexit handlers not run #5948
Maybe it's possible to fix or work around this on cgo side: either fix crash if it's not really legitimate or maybe make callbacks no-op if they're called too late (arguable).
I guess this issue can be classified both as a bug report, or a feature request, or a wontfix, depending on the point of view.
From bindings maintainer point of view, I would put it like this:
Thanks.
The text was updated successfully, but these errors were encountered: