-
Notifications
You must be signed in to change notification settings - Fork 18k
plugin: SIGSEGV with concurrent GC during plugin load #17455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Your preferable solution is possible, but a bit involved. The module data is put on the linked list by runtime.addmoduledata, called from an ELF .init_array. It's called with the C calling convention and doesn't have the Go machinery setup to call progToPointerMask. I think the easiest thing to do would be to have addmoduledata put the module on a pending linked list, and then have plugin_lastmoduleinit initialize gccdatamask/gccbssmask, then move the module to the main linked list. But a (somewhat depressing) question before that: if we are concurrently reading/writing moduledata.next, does that mean we should be atomically loading/storing it? If so, then I think while I'll keep the pending data structure used by addmouledata a linked list, the list of active modules should turn into a global slice, so all the sites were we iterate through modules only gets hit with a single atomic load. |
Oh and thanks for analyzing the crash. When I wrote plugin_lastmoduleinit I saw the gcdatamask/gcbssmask uses in mbitmap.go and assumed late initialization was fine because there were no active data pointers yet. But I completely missed the data root loading in gcMarkRootPrepare. |
Ah, no wonder I couldn't find it. Guru needs to learn assembly. :)
That seems reasonable. Do you even need the list? It looks like there can only be one pending module load right now (enforced by pluginsMu and assumed by lastmoduleinit.)
Well, we need to be doing something. I suspect what you proposed above would be fine on x86 as long as the compiler isn't doing any real memory reordering. On a weak machine or if the compiler gets more aggressive with reordering, there are three hazards:
These could be solved by always accessing the list with atomics, but that's pretty annoying. Hazard 1 could be solved by making just the store to Your global slice idea may be the way to go for 2 and 3 (and may address 1 by side effect), though note that you can't read or write a slice value atomically (since it's three words), so it can't actually be a slice, per se. You could "manually" store the pointer and length, though that would require two (carefully ordered) loads at each loop. Perhaps the best thing would be to store it as a pointer to an array with the convention that there's always a nil pointer after the last For completeness, a few other solutions come to mind, but I think I like the array the best:
|
Thanks for the thoughts. There are two uses of the moduledata linked list and addmoduledata. In -buildmode=plugin there only ever is one pending module, but in -buildmode=shared all modules are loaded by ld.so before program initialization, so in that case there are any number of pending modules. We could separate the two paths, but I would like to minimize architecture-dependent code in cmd/link, and the code to call addmoduledata is very arch-dependent. I've cut a CL that I believe introduces an array to take care of this, and run all.bash a few times on darwin. I don't have a better testing strategy for this, as I haven't been able to replicate the crash. |
Ah, okay.
I can reproduce it 100% of the time on linux/amd64. I'm happy to test. |
CL https://golang.org/cl/32357 mentions this issue. |
What version of Go are you using (
go version
)?Current master (86b2f29)
What operating system and processor architecture are you using (
go env
)?What did you do?
cd misc/cgo/testplugin GOGC=1 ./test.bash
What did you expect to see?
PASS
What did you see instead?
This is crashing in the GC's data segment scan. The problem is almost certainly that the plugin's moduledata has been linked into the module data list, but md.gcdatamask hasn't been initialized yet, so if a concurrent GC tries to scan the data segment of the module at just the wrong time, it will crash. I wasn't able to figure out where it gets linked into the list, but I can think of two potential solutions:
The first is preferable if possible, since attempting to stop the world will delay plugin loading during a concurrent GC cycle until the cycle ends.
The text was updated successfully, but these errors were encountered: