Skip to content

Commit f6bff1d

Browse files
committed
runtime: fix undead arguments in cgocall
From the garbage collector's perspective, time can move backwards in cgocall. However, in the midst of this time warp, the pointer arguments to cgocall can go from dead back to live. If a stack growth happens while they're dead and then a GC happens when they become live again, GC can crash with a bad heap pointer. Specifically, the sequence that leads to a panic is: 1. cgocall calls entersyscall, which saves the PC and SP of its call site in cgocall. Call this PC/SP "X". At "X" both pointer arguments are live. 2. cgocall calls asmcgocall. Call the PC/SP of this call "Y". At "Y" neither pointer argument is live. 3. asmcgocall calls the C code, which eventually calls back into the Go code. 4. cgocallbackg remembers the saved PC/SP "X" in some local variables, calls exitsyscall, and then calls cgocallbackg1. 5. The Go code causes a stack growth. This stack unwind sees PC/SP "Y" in the cgocall frame. Since the arguments are dead at "Y", they are not adjusted. 6. The Go code returns to cgocallbackg1, which calls reentersyscall with the recorded saved PC/SP "X", so "X" gets stashed back into gp.syscallpc/sp. 7. GC scans the stack. It sees there's a saved syscall PC/SP, so it starts the traceback at PC/SP "X". At "X" the arguments are considered live, so it scans them, but since they weren't adjusted, the pointers are bad, so it panics. This issue started as of commit ca4089a, when the compiler stopped marking arguments as live for the whole function. Since this is a variable liveness issue, fix it by adding KeepAlive calls that keep the arguments live across this whole time warp. The existing issue7978 test has all of the infrastructure for testing this except that it's currently up to chance whether a stack growth happens in the callback (it currently only happens on the linux-amd64-noopt builder, for example). Update this test to force a stack growth, which causes it to fail reliably without this fix. Fixes #17785. Change-Id: If706963819ee7814e6705693247bcb97a6f7adb8 Reviewed-on: https://go-review.googlesource.com/33710 Reviewed-by: Brad Fitzpatrick <[email protected]> Reviewed-by: Keith Randall <[email protected]>
1 parent 3f0f24d commit f6bff1d

File tree

2 files changed

+30
-1
lines changed

2 files changed

+30
-1
lines changed

misc/cgo/test/issue7978.go

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -88,9 +88,20 @@ func issue7978wait(store uint32, wait uint32) {
8888

8989
//export issue7978cb
9090
func issue7978cb() {
91+
// Force a stack growth from the callback to put extra
92+
// pressure on the runtime. See issue #17785.
93+
growStack(64)
9194
issue7978wait(3, 4)
9295
}
9396

97+
func growStack(n int) int {
98+
var buf [128]int
99+
if n == 0 {
100+
return 0
101+
}
102+
return buf[growStack(n-1)]
103+
}
104+
94105
func issue7978go() {
95106
C.issue7978c((*C.uint32_t)(&issue7978sync))
96107
issue7978wait(7, 8)

src/runtime/cgocall.go

Lines changed: 19 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -120,13 +120,31 @@ func cgocall(fn, arg unsafe.Pointer) int32 {
120120
// foreign code.
121121
//
122122
// The call to asmcgocall is guaranteed not to
123-
// split the stack and does not allocate memory,
123+
// grow the stack and does not allocate memory,
124124
// so it is safe to call while "in a system call", outside
125125
// the $GOMAXPROCS accounting.
126+
//
127+
// fn may call back into Go code, in which case we'll exit the
128+
// "system call", run the Go code (which may grow the stack),
129+
// and then re-enter the "system call" reusing the PC and SP
130+
// saved by entersyscall here.
126131
entersyscall(0)
127132
errno := asmcgocall(fn, arg)
128133
exitsyscall(0)
129134

135+
// From the garbage collector's perspective, time can move
136+
// backwards in the sequence above. If there's a callback into
137+
// Go code, GC will see this function at the call to
138+
// asmcgocall. When the Go call later returns to C, the
139+
// syscall PC/SP is rolled back and the GC sees this function
140+
// back at the call to entersyscall. Normally, fn and arg
141+
// would be live at entersyscall and dead at asmcgocall, so if
142+
// time moved backwards, GC would see these arguments as dead
143+
// and then live. Prevent these undead arguments from crashing
144+
// GC by forcing them to stay live across this time warp.
145+
KeepAlive(fn)
146+
KeepAlive(arg)
147+
130148
endcgo(mp)
131149
return errno
132150
}

0 commit comments

Comments
 (0)