mark - world not stopped while calling C functions #785
I added

printf("gp=%p, status=%d != g=%p, status=%d\n", gp, gp->status, g, g->status);

before the throw in mgc0.c:mark, modified runtime.c to do

printf("\npanic g=%X PC=%X\n", g, (uint64)(uintptr)&unused);

and modified proc.c to do

printf("\ngoroutine %d @ %p [%d]:\n", g->goid, g, g->status);

There seems to be some kind of race condition, because I see:

gp=0x2aea7ab40140, status=2 != g=0x2aea7ab40dc0, status=2
throw: mark - world not stopped

but by the time the stack traces are printed, gp's status is Grunnable (1):

panic g=0x2aea7ab40dc0 PC=0x2aea7ab44fa0
throw+0x3e /home/alberts/go/src/pkg/runtime/runtime.c:73
	throw(0xffffffff, 0x46a306)
mark+0x12d /home/alberts/go/src/pkg/runtime/mgc0.c:152
	mark()
gc+0x1ff /home/alberts/go/src/pkg/runtime/mgc0.c:319
	gc(0x436410, 0x2aea00000020)
mallocgc+0x1ca /home/alberts/go/src/pkg/runtime/malloc.c:95
	mallocgc(0x2aea00000000, 0x0, 0x0, 0x0, 0x2aea7ab45090, ...)
mal+0x36 /home/alberts/go/src/pkg/runtime/malloc.c:236
	mal(0x20, 0x100000000)
runtime.mal+0x1b /home/alberts/go/src/pkg/runtime/runtime1.c:7
	runtime.mal(0x20, 0x280200000006, 0x41648f, 0x2aea00000020)
<our go functions>
goexit /home/alberts/go/src/pkg/runtime/proc.c:145
	goexit()

goroutine 4 @ 0x2aea7ab40140 [1]:
gosched+0x4e /home/alberts/go/src/pkg/runtime/proc.c:542
	gosched()
mallocgc+0x322 /home/alberts/go/src/pkg/runtime/malloc.c:34
	mallocgc(0x2aaacac4c580, 0x0, 0xaf021780763dbd00, 0x2000000000, 0x432c25, ...)
mal+0x36 /home/alberts/go/src/pkg/runtime/malloc.c:236
	mal(0x20, 0x100000000)
runtime.mal+0x1b /home/alberts/go/src/pkg/runtime/runtime1.c:7
	runtime.mal(0x20, 0x280200000006, 0x41648f, 0x2aaa00000020)
<our go functions>
goexit /home/alberts/go/src/pkg/runtime/proc.c:145
	goexit() |
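The throw fires in the stack-marking loop of mgc0.c:mark, which, judging by the printf output, expects that no goroutine other than the current one (g, the goroutine running the collector) is in the running state while the world is stopped. The Go sketch below is a simplified model of that check, not the runtime's code: the `G` type, `worldStopped`, and the status constants are hypothetical stand-ins, with Grunnable = 1 taken from the report and Grunning = 2 assumed from the `status=2` values printed above.

```go
package main

import "fmt"

// Status models a goroutine state. The numbering is an assumption chosen to
// match the values printed in the report (Grunnable = 1, Grunning = 2).
type Status int

const (
	Gidle Status = iota
	Grunnable
	Grunning
)

// G is a hypothetical, stripped-down goroutine descriptor.
type G struct {
	id     int
	status Status
}

// worldStopped models the invariant checked while marking stacks: apart from
// cur (the goroutine running the collector), no goroutine may be running.
// It returns the first offending goroutine, if any.
func worldStopped(all []*G, cur *G) (*G, bool) {
	for _, gp := range all {
		if gp != cur && gp.status == Grunning {
			return gp, false
		}
	}
	return nil, true
}

func main() {
	cur := &G{id: 1, status: Grunning}
	other := &G{id: 4, status: Grunning} // still running on another thread
	all := []*G{cur, other}

	if gp, ok := worldStopped(all, cur); !ok {
		fmt.Printf("goroutine %d status=%d: mark - world not stopped\n", gp.id, gp.status)
	}

	// A moment later the other goroutine yields, so a second look finds it
	// runnable: the state seen by the check is already stale.
	other.status = Grunnable
	_, ok := worldStopped(all, cur)
	fmt.Println("stopped on second look:", ok)
}
```

This mirrors the sequence in the report: goroutine 4 was seen as running when the check fired, and was already runnable (parked in gosched) by the time its traceback was printed.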
Owner changed to [email protected]. Status changed to Accepted. |
No, we aren't doing any callbacks. The only special thing I can think of is that the time spent in the C function calls varies a lot depending on the data we are processing, which might produce some unusual orderings in which the goroutines complete. I've been trying to build something you can reproduce by replacing our C code with C code that does random sleeps, but no luck yet. |
While I was investigating this bug, I wondered whether some state in the scheduler was getting corrupted. I found this comment in runtime.h: "every C file linked into a Go program must include runtime.h so that the C compiler knows to avoid other uses of these registers. the Go compilers know to avoid them." Is this going to be a problem when using cgo to call already-compiled C code in third-party libraries? |
Maybe this fix is also needed for amd64? http://code.google.com/p/go/source/detail?r=c5e89a52d8a8ace22a5f1d382d597f3dd8700dbe# |
That fix shouldn't be necessary, because on amd64 we keep the per-thread state in registers (R14 and R15) instead of in memory. Restoring the registers on the way out of the signal handler will restore the state. However, if your program is getting signals while the C code is running, that would explain the bad behavior: the Go runtime assumes its own private data is in R14 and R15, but that's not true when calling into C. Is your code taking signals while running C code? |
I think this bug is related to this one: https://golang.org/issue/886, because the fix I introduced in issue #886 also changes the behaviour of cgo calls. Looking at "runtime/cgocall.c" now, it appears that cgo uses the same locking mechanism as runtime.LockOSThread(), so it is very likely that this is exactly the same bug. Short description: more than one thread is running while the GC is active, so the thread-local memory cache (the mcache), which is not thread-safe, is used by more than one thread at once. A race condition. |
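The failure mode described above, a per-thread cache used by two threads at once, can be illustrated with a toy free list. The sketch below is a hypothetical model, not the runtime's mcache: `mcache`, `object`, and the split `peek`/`unlink` steps are invented so that one bad interleaving can be replayed deterministically instead of relying on a real data race.

```go
package main

import "fmt"

// object is a hypothetical free-list node standing in for a small heap block.
type object struct {
	id   int
	next *object
}

// mcache is a toy, unsynchronized cache: a singly linked free list that is
// only correct while a single thread uses it.
type mcache struct {
	free *object
}

// newCache builds a free list holding objects 1..n, with 1 at the head.
func newCache(n int) *mcache {
	c := &mcache{}
	for i := n; i >= 1; i-- {
		c.free = &object{id: i, next: c.free}
	}
	return c
}

// Allocation is split into two steps so that an interleaving can be replayed:
// read the current head, then unlink it.
func (c *mcache) peek() *object    { return c.free }
func (c *mcache) unlink(o *object) { c.free = o.next }

// serial allocates twice from one thread: each call completes before the next.
func serial() (a, b *object) {
	c := newCache(3)
	a = c.peek()
	c.unlink(a)
	b = c.peek()
	c.unlink(b)
	return a, b
}

// racy replays one bad schedule between two threads sharing the cache: both
// read the head before either unlinks it, so both receive the same object.
func racy() (a, b *object) {
	c := newCache(3)
	a = c.peek() // thread 1 reads the head
	b = c.peek() // thread 2 reads the same head
	c.unlink(a)  // thread 1 unlinks it
	c.unlink(b)  // thread 2 unlinks the block it believes it owns
	return a, b
}

func main() {
	a, b := serial()
	fmt.Println("serial:", a.id, b.id) // serial: 1 2
	a, b = racy()
	fmt.Println("racy:", a.id, b.id) // racy: 1 1
}
```

Because the cache is meant to be owned by a single thread, it has no locking; once a second thread runs while the GC is active, that ownership assumption breaks and the same block can be handed out twice.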
nsf, thanks for pointing out the connection to this bug. I think that the fix to issue #886 may well have fixed this bug too. Albert, can you try updating to tip and see whether you still see the problem in your programs? Thanks. Russ |
This issue was closed.