runtime/trace: "preempted" StateTransition sometimes has Stack of single zeroed StackFrame #68090

Open
rhysh opened this issue Jun 20, 2024 · 11 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@rhysh
Contributor

rhysh commented Jun 20, 2024

Go version

go version devel go1.23-477ad7dd51 Thu Jun 20 16:46:54 2024 +0000 darwin/arm64

Output of go env in your module/workspace:

$ go env -changed

$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/rhysh/Library/Caches/go-build'
GOENV='/Users/rhysh/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/rhysh/work/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/rhysh/work'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='devel go1.23-477ad7dd51 Thu Jun 20 16:46:54 2024 +0000'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/rhysh/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/pw/d_qmtcrd3vs0890gvmrq8qx80000gn/T/go-build2075620848=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

$ go test net/http -run='^$' -bench='BenchmarkClientServerParallel/64/h1' -benchtime=100ms -trace=/tmp/trace
goos: darwin
goarch: arm64
pkg: net/http
cpu: Apple M1
BenchmarkClientServerParallel/64/h1-8               1748             64048 ns/op           22199 B/op        131 allocs/op
--- BENCH: BenchmarkClientServerParallel/64/h1-8
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59008->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59007->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59010->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59013->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59014->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59015->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59016->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59017->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59018->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59020->127.0.0.1:58995: read: connection reset by peer
        ... [output truncated]
PASS
ok      net/http        0.404s
$ go tool trace -d=1 /tmp/trace | grep -B 2 -A 2 '@ 0x0' | head -n 30
M=6163918848 P=0 G=25 StateTransition Time=775708360289024 Resource=Goroutine(25) Reason="preempted" GoID=25 Running->Runnable
TransitionStack=
         @ 0x0
                :0

Stack=
         @ 0x0
                :0

--
M=6163918848 P=0 G=26 StateTransition Time=775708372081984 Resource=Goroutine(26) Reason="preempted" GoID=26 Running->Runnable
TransitionStack=
         @ 0x0
                :0

Stack=
         @ 0x0
                :0

--
M=6162198528 P=7 G=2881 StateTransition Time=775708382371201 Resource=Goroutine(2881) Reason="preempted" GoID=2881 Running->Runnable
TransitionStack=
         @ 0x0
                :0

Stack=
         @ 0x0
                :0

--

What did you see happen?

Some StateTransition Events include a Stack and StateTransition.Stack that are not equal to NoStack, but which also don't contain a stack from the Event's goroutine. Instead, they yield a single zeroed StackFrame (PC of 0x0, Line of 0, File and Func of "").

I've only seen this on Running->Runnable transitions, with Reason="preempted".

It's also present in go1.22.4.
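
A consumer of trace data could flag the shape described above with a check like the following. This is a minimal sketch: the `frame` type and `isZeroedStack` helper are hypothetical stand-ins defined only for this example, mirroring the four fields named in the report rather than using the trace package's own types.

```go
package main

import "fmt"

// frame is a hypothetical stand-in for the four fields the report names
// (PC, Func, File, Line). It is not the trace package's StackFrame type.
type frame struct {
	PC   uint64
	Func string
	File string
	Line uint64
}

// isZeroedStack reports whether frames is exactly one all-zero frame,
// the shape that go tool trace prints as "@ 0x0" followed by ":0".
func isZeroedStack(frames []frame) bool {
	return len(frames) == 1 && frames[0] == (frame{})
}

func main() {
	zeroed := []frame{{}}
	real := []frame{{
		PC:   0x100bdfdb7,
		Func: "runtime.gopark",
		File: "/usr/local/go/src/runtime/proc.go",
		Line: 424,
	}}
	// An empty stack is a different case from a single zeroed frame.
	fmt.Println(isZeroedStack(zeroed), isZeroedStack(real), isZeroedStack(nil))
	// prints "true false false"
}
```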

Here's the sort of stack I'd expect to see from that execution trace's view of goroutines 25, 26, and 2881:

$ go tool trace -d=1 /tmp/trace | sed -n -e '/G=25 StateTransition/,/^M/ p' | head -n 50
[snip]
M=6162198528 P=4 G=25 StateTransition Time=775708359254528 Resource=Goroutine(25) Reason="system goroutine wait" GoID=25 Running->Waiting
TransitionStack=
        runtime.gopark @ 0x100bdfdb7
                /usr/local/go/src/runtime/proc.go:424
        runtime.gcBgMarkWorker @ 0x100b8808b
                /usr/local/go/src/runtime/mgc.go:1363

Stack=
        runtime.gopark @ 0x100bdfdb7
                /usr/local/go/src/runtime/proc.go:424
        runtime.gcBgMarkWorker @ 0x100b8808b
                /usr/local/go/src/runtime/mgc.go:1363

[snip]
$ go tool trace -d=1 /tmp/trace | sed -n -e '/G=26 StateTransition/,/^M/ p' | head -n 50
[snip]
M=8191703744 P=2 G=26 StateTransition Time=775708359184320 Resource=Goroutine(26) Reason="system goroutine wait" GoID=26 Running->Waiting
TransitionStack=
        runtime.gopark @ 0x100bdfdb7
                /usr/local/go/src/runtime/proc.go:424
        runtime.gcBgMarkWorker @ 0x100b8808b
                /usr/local/go/src/runtime/mgc.go:1363

Stack=
        runtime.gopark @ 0x100bdfdb7
                /usr/local/go/src/runtime/proc.go:424
        runtime.gcBgMarkWorker @ 0x100b8808b
                /usr/local/go/src/runtime/mgc.go:1363

[snip]
$ go tool trace -d=1 /tmp/trace | sed -n -e '/G=2881 StateTransition/,/^M/ p' | head -n 50
M=6163345408 P=4 G=2881 StateTransition Time=775708380563968 Resource=Goroutine(2881) Reason="sync" GoID=2881 Running->Waiting
TransitionStack=
        sync.(*Mutex).Lock @ 0x100bf30ff
                /usr/local/go/src/sync/mutex.go:92
        sync.(*Pool).pinSlow @ 0x100bf3094
                /usr/local/go/src/sync/pool.go:227
        sync.(*Pool).pin @ 0x100bf301b
                /usr/local/go/src/sync/pool.go:220
        sync.(*Pool).Get @ 0x100bf2d6f
                /usr/local/go/src/sync/pool.go:135
        fmt.newPrinter @ 0x100c4dff3
                /usr/local/go/src/fmt/print.go:152
        fmt.Fprintf @ 0x100c4e467
                /usr/local/go/src/fmt/print.go:223
        net/http.(*Request).write @ 0x100e4aacb
                /usr/local/go/src/net/http/request.go:680
        net/http.(*persistConn).writeLoop @ 0x100e73df7
                /usr/local/go/src/net/http/transport.go:2522

Stack=
        sync.(*Mutex).Lock @ 0x100bf30ff
                /usr/local/go/src/sync/mutex.go:92
        sync.(*Pool).pinSlow @ 0x100bf3094
                /usr/local/go/src/sync/pool.go:227
        sync.(*Pool).pin @ 0x100bf301b
                /usr/local/go/src/sync/pool.go:220
        sync.(*Pool).Get @ 0x100bf2d6f
                /usr/local/go/src/sync/pool.go:135
        fmt.newPrinter @ 0x100c4dff3
                /usr/local/go/src/fmt/print.go:152
        fmt.Fprintf @ 0x100c4e467
                /usr/local/go/src/fmt/print.go:223
        net/http.(*Request).write @ 0x100e4aacb
                /usr/local/go/src/net/http/request.go:680
        net/http.(*persistConn).writeLoop @ 0x100e73df7
                /usr/local/go/src/net/http/transport.go:2522

[snip]

What did you expect to see?

I expected the stack to be trace.NoStack when no stack was available, or for the stack to contain PC/Func/File/Line corresponding to code that the goroutine had on its stack. I should not see PC of 0x0.

CC @mknyszek @golang/runtime

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jun 20, 2024
@mknyszek
Contributor

I have a suspicion as to how this is happening, but not a complete picture yet.

The problematic transition you point out also happens from a thread stack. The two cases where such a transition may appear are gopreempt_m and goyield_m. The former is only called from newstack while the latter is only called from goyield which is on the sema path (for a direct handoff).

I suspect that in one of these paths, gp.sched is still empty somehow, though I'm not sure how that's possible. That's at least a case where traceStack may produce a length-1 buffer with a single zero PC in it.

@mknyszek
Contributor

#68093 may be related, but I suspect not.

@mknyszek
Contributor

mknyszek commented Jun 20, 2024

Here's a simple reproducer:

package main

import (
	"log"
	"os"
	"runtime"
	"runtime/trace"
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

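	// trace.Start returns an error if tracing is already enabled.
	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}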

	go func() {
		for {
			// Non-stop preemption points.
			g()
		}
	}()

	runtime.GC()
	trace.Stop()
}

//go:noinline
func g() {

}

It disproves my theory about gp.sched being empty. Also, funnily enough, the goroutine created in main always shows up fine -- it's the GC mark worker that's the problem.

@mknyszek
Copy link
Contributor

mknyszek commented Jun 20, 2024

OK, I figured it out. It's that the stack trace has exactly 1 frame in it, but the skip count is also 1.

In the reproducer, the victim goroutine is the GC mark worker (just like in the original post) and when I lower the skip count from 1 to 0, I see:

M=683710 P=2 G=68 StateTransition Time=253871909721536 Resource=Goroutine(68) Reason="preempted" GoID=68 Running->Runnable
TransitionStack=
        runtime.gcMarkDone @ 0x416865
                /usr/local/google/home/mknyszek/work/go-1/src/runtime/mgc.go:824

Stack=
        runtime.gcMarkDone @ 0x416865
                /usr/local/google/home/mknyszek/work/go-1/src/runtime/mgc.go:824

Unfortunately, there's still the question as to why the bottom frame in the mark worker isn't showing up.
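
The skip-count arithmetic described in this comment can be pictured with a toy model. This is only a sketch: `applySkip` is a hypothetical helper written for illustration, not the runtime's traceStack logic, and the PC value is taken from the output above purely as sample data.

```go
package main

import "fmt"

// applySkip is a toy model, not runtime code: it drops the first skip
// entries from a slice of program counters.
func applySkip(pcs []uintptr, skip int) []uintptr {
	if skip < 0 {
		skip = 0
	}
	if skip >= len(pcs) {
		return nil // every captured frame was skipped
	}
	return pcs[skip:]
}

func main() {
	oneFrame := []uintptr{0x416865} // a stack with exactly one frame

	fmt.Println(len(applySkip(oneFrame, 1))) // prints 0: nothing left to symbolize
	fmt.Println(len(applySkip(oneFrame, 0))) // prints 1: the single frame survives
}
```

With one captured frame and a skip of 1, no program counters remain, which is the situation in which a later stage has nothing valid to report.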

@joedian joedian added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 25, 2024
@griesemer griesemer added this to the Go1.24 milestone Jun 26, 2024
@gopherbot gopherbot modified the milestones: Go1.24, Go1.25 Feb 11, 2025
@rhysh
Contributor Author

rhysh commented May 8, 2025

The behavior of the reproducer you posted doesn't seem the same as the original failure I see in the net/http benchmark: in my testing, it doesn't include "0x0" frames.

In my debugging so far it looks like there are approximately two paths to getting a stack that consists of a single 0x0 frame. The behavior of makeTraceFrames is involved with at least one of them: it uses CallersFrames, including with an empty pcs slice, and uses the Frame (via Next) without confirming that it's valid.
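
The hazard with an empty pcs slice can be shown with the public runtime API alone. The sketch below uses a hypothetical `firstFrame` helper and does not reproduce makeTraceFrames; it only shows that `Next` on an empty input hands back a zero `Frame`, which a caller that ignores validity would then record.

```go
package main

import (
	"fmt"
	"runtime"
)

// firstFrame is a hypothetical helper: it returns whatever the first call
// to Next yields, without checking that the frame is valid.
func firstFrame(pcs []uintptr) (runtime.Frame, bool) {
	frames := runtime.CallersFrames(pcs)
	return frames.Next()
}

func main() {
	// With no program counters there is nothing to iterate, so Next
	// returns a zero Frame and more == false.
	f, more := firstFrame(nil)
	fmt.Printf("PC=%#x Function=%q File=%q Line=%d more=%v\n",
		f.PC, f.Function, f.File, f.Line, more)
}
```

The zero `Frame` has PC 0, empty Function and File, and Line 0, matching the fields of the zeroed StackFrame in the report.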

Sometimes the problem is apparent as soon as traceStack runs, like when it returns nstk==1 with a pcBuf[0] (skip) value of 1. Below is a traceback from one of those (I've been sprinkling in throws to feel my way around). It looks like a goroutine that is just coming into existence, and doesn't quite have its own call stack yet.

@mknyszek, maybe something there is enough of a hint that you can immediately solve the puzzle. But I plan to keep hacking on this until the Go 1.25 freeze sets in (and maybe the fix is small enough to accept at this point in the release cycle).

runtime stack:
runtime.throw({0x1049f54d7?, 0x16ba668b8?})
	/usr/local/go/src/runtime/panic.go:1089 +0x34 fp=0x16ba66830 sp=0x16ba66800 pc=0x104637634
runtime.traceStack(0x14000080008?, 0x14000080008?, 0x1)
	/usr/local/go/src/runtime/tracestack.go:143 +0x360 fp=0x16ba66cd0 sp=0x16ba66830 pc=0x10462d9d0
runtime.traceLocker.stack(...)
	/usr/local/go/src/runtime/traceevent.go:59
runtime.traceLocker.GoStop({0x14000080008?, 0x1?}, 0x2)
	/usr/local/go/src/runtime/traceruntime.go:464 +0x68 fp=0x16ba66d50 sp=0x16ba66cd0 pc=0x10462cb58
runtime.traceLocker.GoPreempt(...)
	/usr/local/go/src/runtime/traceruntime.go:459
runtime.goschedImpl(0x1400010c1c0, 0x1?)
	/usr/local/go/src/runtime/proc.go:4197 +0x88 fp=0x16ba66da0 sp=0x16ba66d50 pc=0x10460b6c8
runtime.gopreempt_m(...)
	/usr/local/go/src/runtime/proc.go:4233
runtime.newstack()
	/usr/local/go/src/runtime/stack.go:1075 +0x2bc fp=0x16ba66ed0 sp=0x16ba66da0 pc=0x10461ca7c
runtime.morestack()
	/usr/local/go/src/runtime/asm_arm64.s:392 +0x70 fp=0x16ba66ed0 sp=0x16ba66ed0 pc=0x10463da10

goroutine 7209 gp=0x1400010c1c0 m=4 mp=0x14000080008 [running]:
runtime.goexit1()
	/usr/local/go/src/runtime/proc.go:4333 +0xdc fp=0x140004c17d0 sp=0x140004c17d0 pc=0x10460bf7c
runtime.goexit({})
	/usr/local/go/src/runtime/asm_arm64.s:1269 +0x8 fp=0x140004c17d0 sp=0x140004c17d0 pc=0x10463fc78
created by net.(*netFD).connect in goroutine 7208
	/usr/local/go/src/net/fd_unix.go:106 +0x274

@mknyszek
Contributor

mknyszek commented May 9, 2025

> But I plan to keep hacking on this until the Go 1.25 freeze sets in (and maybe the fix is small enough to accept at this point in the release cycle).

I'll try to take a closer look today, but before that I'll just say that bug fixes are generally fair game for the freeze, even if they're for old bugs (we just prioritize new bugs).

@rhysh
Contributor Author

rhysh commented May 9, 2025

Thanks. I don't have an estimate of how invasive the fix may be, so I'll keep aiming for pre/early freeze.

I see that goexit1 is responsible for cleaning up goroutines as they exit. It looks like the (self-inflicted) crash I described above is a case where: 1/ there was a goroutine, 2/ it had functions on its call stack that correspond to that goroutine's specific work (starting with net.(*netFD).connect.func2), 3/ it got a preemption request, and 4/ the next cooperative scheduling point it reached was the stack growth check at the start of runtime.goexit1.

So the execution trace is trying to describe a real preemption event, about a real goroutine, that really doesn't have any (user-level) functions on its call stack.

That's an unusual preemption event! Given that it exists, I'm not sure how to describe it in ways that won't be a surprise to consumers of execution trace data. Maybe we could make it not happen at all: There's very little in goexit1 (race detector, execution tracer, mcall(goexit0)). Maybe we could mark it nosplit and have it hold the M during its function calls. That sounds fragile.

This doesn't explain the @ 0x0 stacks for gcBgMarkWorker and net/http.(*persistConn).writeLoop goroutines (ones that often idle with a single "real" function on their call stack). If there's a common cause, I'd like to find it before formally proposing any partial fixes.

@rhysh
Contributor Author

rhysh commented May 9, 2025

I'm debugging this mainly on darwin/arm64, so the following details are about that platform. (I've re-confirmed that I also see occasional Reason="preempted" events with @ 0x0 frames on linux/amd64.)

There's some disagreement between runtime/tracestack.go's fpTracebackPCs (used for the execution tracer) and the general-purpose traceback code.

Preemption events are generated while running on the system stack, which requires special backtracing behavior.

It looks like fpTracebackPCs skips the second frame from the top of the user goroutine stack: It identifies the top frame via gp.sched.pc. It then tries to identify the next frame by looking a word above where gp.sched.bp points, but finds the third frame rather than the second.

I've included an example below, via a crash I added in traceStack.
pcBuf[0] (skip count) is 1
pcBuf[1] (gp.sched.pc) is 0x1042c6fb0 which resolves to runtime.selectgo /usr/local/go/src/runtime/select.go:122
pcBuf[2] (the first via gp.sched.bp) is 0x10453c498 which resolves to net/http.(*Transport).dialConn.gowrap3 /usr/local/go/src/net/http/transport.go:1945
pcBuf[3] (walking the fp/bp list) is 0x1042efd64 which resolves to runtime.goexit /usr/local/go/src/runtime/asm_arm64.s:1268
pcBuf[4] (walking the fp/bp list) is 0x0

There should be a frame for net/http.(*persistConn).writeLoop in between pcBuf[1] and pcBuf[2], but fpTracebackPCs does not find it. The call stack from the crash looks correct to me, and includes net/http.(*persistConn).writeLoop:

runtime stack:
runtime.throw({0x1046a65f0?, 0x1042dcd40?})
	/usr/local/go/src/runtime/panic.go:1089 +0x34 fp=0x16c2a2820 sp=0x16c2a27f0 pc=0x1042e7724
runtime.traceStack(0x14000081008?, 0x16c2a2ce8?, 0x1)
	/usr/local/go/src/runtime/tracestack.go:144 +0x544 fp=0x16c2a2cd0 sp=0x16c2a2820 pc=0x1042ddbb4
runtime.traceLocker.stack(...)
	/usr/local/go/src/runtime/traceevent.go:59
runtime.traceLocker.GoStop({0x14000081008?, 0x1042d1a04?}, 0x2)
	/usr/local/go/src/runtime/traceruntime.go:464 +0x68 fp=0x16c2a2d50 sp=0x16c2a2cd0 pc=0x1042dcb58
runtime.traceLocker.GoPreempt(...)
	/usr/local/go/src/runtime/traceruntime.go:459
runtime.goschedImpl(0x14001fae700, 0x1?)
	/usr/local/go/src/runtime/proc.go:4197 +0x88 fp=0x16c2a2da0 sp=0x16c2a2d50 pc=0x1042bb6c8
runtime.gopreempt_m(...)
	/usr/local/go/src/runtime/proc.go:4233
runtime.newstack()
	/usr/local/go/src/runtime/stack.go:1075 +0x2bc fp=0x16c2a2ed0 sp=0x16c2a2da0 pc=0x1042cca7c
runtime.morestack()
	/usr/local/go/src/runtime/asm_arm64.s:392 +0x70 fp=0x16c2a2ed0 sp=0x16c2a2ed0 pc=0x1042edb00

goroutine 11028 gp=0x14001fae700 m=13 mp=0x14000081008 [running]:
runtime.selectgo(0x140016a1f38?, 0x140016a1ee0?, 0x0?, 0x0?, 0x2?, 0x1?)
	/usr/local/go/src/runtime/select.go:122 +0x10f0 fp=0x140016a1ea0 sp=0x140016a1ea0 pc=0x1042c6fb0
net/http.(*persistConn).writeLoop(0x14001795560)
	/usr/local/go/src/net/http/transport.go:2597 +0x94 fp=0x140016a1fb0 sp=0x140016a1ea0 pc=0x10453f234
net/http.(*Transport).dialConn.gowrap3()
	/usr/local/go/src/net/http/transport.go:1945 +0x28 fp=0x140016a1fd0 sp=0x140016a1fb0 pc=0x10453c498
runtime.goexit({})
	/usr/local/go/src/runtime/asm_arm64.s:1268 +0x4 fp=0x140016a1fd0 sp=0x140016a1fd0 pc=0x1042efd64
created by net/http.(*Transport).dialConn in goroutine 9558
	/usr/local/go/src/net/http/transport.go:1945 +0x1164
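
The symptom described above (a missing second frame) can be pictured with a toy frame-pointer walk. This is only an illustration under stated assumptions: the `record` layout, the addresses, and the `toyStack` and `walk` helpers are invented for the example and are not the runtime's fpTracebackPCs. Starting the walk one link too far along the chain is one way to reproduce the observed output; the thread does not establish that this is the actual cause.

```go
package main

import "fmt"

// record is a toy frame record: the caller's frame pointer plus a symbolic
// return address into the caller. The layout is invented for illustration.
type record struct {
	callerFP uint64
	retPC    string
}

// toyStack models the goroutine from the traceback above with made-up addresses.
func toyStack() map[uint64]record {
	return map[uint64]record{
		0x100: {callerFP: 0x200, retPC: "writeLoop"}, // record in selectgo's frame
		0x200: {callerFP: 0x300, retPC: "gowrap3"},   // record in writeLoop's frame
		0x300: {callerFP: 0, retPC: "goexit"},        // record in gowrap3's frame
	}
}

// walk follows the frame-pointer chain from fp, collecting return addresses.
func walk(mem map[uint64]record, fp uint64) []string {
	var out []string
	for fp != 0 {
		r, ok := mem[fp]
		if !ok {
			break
		}
		out = append(out, r.retPC)
		fp = r.callerFP
	}
	return out
}

func main() {
	mem := toyStack()
	top := []string{"selectgo"} // the top frame comes from the saved PC

	// Walk from the top frame's own record: every caller is found.
	fmt.Println(append(top, walk(mem, 0x100)...))
	// prints "[selectgo writeLoop gowrap3 goexit]"

	// Walk from the next record along the chain: writeLoop is never seen.
	fmt.Println(append(top, walk(mem, 0x200)...))
	// prints "[selectgo gowrap3 goexit]"
}
```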

@mknyszek
Contributor

mknyszek commented May 9, 2025

> So the execution trace is trying to describe a real preemption event, about a real goroutine, that really doesn't have any (user-level) functions on its call stack.

Agreed, this does seem like it's trying to preempt an exiting goroutine. I was gonna respond that newly-created goroutines always start with a stack, so they shouldn't morestack when they start executing (except, I guess, if the first call frame is very large -- maybe we deal with that already though).

> Given that it exists, I'm not sure how to describe it in ways that won't be a surprise to consumers of execution trace data.

That's a good question. I wonder if we should just show goexit1? I know we special-case goexit, but it's weird that goexit1 doesn't show up. Could it be a result of the frame-skip bug you found in your latest reply?

> There should be a frame for net/http.(*persistConn).writeLoop in between pcBuf[1] and pcBuf[2], but fpTracebackPCs does not find it.

Nice find on the disagreement between the two! Looks like yes, we're starting traceback from the wrong point. I'll have to take a closer look to try to understand what the 'right' point is.

I suspect that this is probably the reason why there's a frame missing from the mark background worker, after I fix the skip count, in my example, above. (I do think the skip count is probably at least one reason things are wrong. I spot-checked the counts when changing out the tracer, but I wasn't super thorough.)

@rhysh
Contributor Author

rhysh commented May 9, 2025

Running with GODEBUG=tracefpunwindoff=1 is a partial fix. We could call that "bug 1".

Adding //go:nosplit and an acquirem/releasem pair to goexit1 is an additional partial fix. We could call that "bug 2a".

Following those, I see that goroutines can experience preemption before their gowrapN function finishes running. It's not about needing the stack to grow, it's that the non-user-visible functions that run at the very start (gowrap) and end (goexit1) of a goroutine's life don't completely execute in a nosplit context and so may observe preemption.

> Given that it exists, I'm not sure how to describe it in ways that won't be a surprise to consumers of execution trace data.

> That's a good question. I wonder if we should just show goexit1? I know we special-case goexit, but it's weird that goexit1 doesn't show up. Could it be a result of the frame-skip bug you found in your latest reply?

The bottom of the stack is usually goexit, below a call to the function that the code provided to the go keyword. When that user-visible function returns / exits, goexit makes a call to goexit1 to do the cleanup. The goexit1 function is only on the call stack for a moment, at the end of the goroutine's life. So no, it's not a frame-skip issue: the frame we'd like to see (because the user knows it as "what kind of goroutine is this") is no longer there.

The generated gowrap functions do some setup at the start of a new goroutine's life. The function that the user provided to the go keyword isn't running yet, so we can't show that frame. And the gowrap function is a wrapper (named after the function that contains the go keyword), so we don't show it. And it's not a nosplit context, so it can observe preemption requests (even if the runtime provided the goroutine with a large-enough initial stack). This is "bug 2b".

Maybe we say that a goroutine really does have no calls on its call stack at the start and end of its life, and report that as a zero-length call stack rather than as a call stack with a single pc=0x0 frame.
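
One way to picture the zero-length alternative is a symbolization loop that never emits a frame it has not validated. The sketch below is an assumption about what such a guard could look like using the public runtime API; `symbolize` and `currentPCs` are hypothetical helpers, not the trace package's makeTraceFrames.

```go
package main

import (
	"fmt"
	"runtime"
)

// symbolize is a hypothetical helper. For empty input it returns a
// zero-length result instead of a single zeroed frame.
func symbolize(pcs []uintptr) []runtime.Frame {
	if len(pcs) == 0 {
		return nil
	}
	var out []runtime.Frame
	frames := runtime.CallersFrames(pcs)
	for {
		f, more := frames.Next()
		if f.PC != 0 { // PC == 0 means Next returned a zero Frame
			out = append(out, f)
		}
		if !more {
			break
		}
	}
	return out
}

// currentPCs captures the calling goroutine's stack, starting at currentPCs itself.
func currentPCs() []uintptr {
	pcs := make([]uintptr, 16)
	n := runtime.Callers(1, pcs) // 1 skips the frame for runtime.Callers
	return pcs[:n]
}

func main() {
	fmt.Println(len(symbolize(nil)))              // prints 0
	fmt.Println(len(symbolize(currentPCs())) > 0) // prints true
}
```

Whether consumers of the trace format would prefer an empty stack over some placeholder frame is the open design question in this thread; the sketch only shows that the two cases can be told apart.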

> newly-created goroutines always start with a stack, so they shouldn't morestack when they start executing

Right, but preemption requests change the stack guard, to trigger a morestack call. I'm not sure whether we'd be able to mark all of the gowrap functions as nosplit / non-preemptible.

Maybe there are also bugs with the skip argument in some places, I'm not sure. "Bug 3", which might not exist.

Here's a preemption event at the start of a goroutine's life, seen in a self-inflicted crash:

runtime stack:
runtime.throw({0x10288d7f7?, 0x0?})
	/usr/local/go/src/runtime/panic.go:1089 +0x34 fp=0x16db427f0 sp=0x16db427c0 pc=0x1024cf964
runtime.traceStack(0x1400005f008?, 0x1024b9a44?, 0x1)
	/usr/local/go/src/runtime/tracestack.go:171 +0x398 fp=0x16db42cd0 sp=0x16db427f0 pc=0x1024c5a48
runtime.traceLocker.stack(...)
	/usr/local/go/src/runtime/traceevent.go:59
runtime.traceLocker.GoStop({0x1400005f008?, 0x1024a0a80?}, 0x2)
	/usr/local/go/src/runtime/traceruntime.go:464 +0x68 fp=0x16db42d50 sp=0x16db42cd0 pc=0x1024c4b98
runtime.traceLocker.GoPreempt(...)
	/usr/local/go/src/runtime/traceruntime.go:459
runtime.goschedImpl(0x14000e68e00, 0x1?)
	/usr/local/go/src/runtime/proc.go:4197 +0x88 fp=0x16db42da0 sp=0x16db42d50 pc=0x1024a36c8
runtime.gopreempt_m(...)
	/usr/local/go/src/runtime/proc.go:4233
runtime.newstack()
	/usr/local/go/src/runtime/stack.go:1075 +0x2bc fp=0x16db42ed0 sp=0x16db42da0 pc=0x1024b4abc
runtime.morestack()
	/usr/local/go/src/runtime/asm_arm64.s:392 +0x70 fp=0x16db42ed0 sp=0x16db42ed0 pc=0x1024d5d40

goroutine 9166 gp=0x14000e68e00 m=3 mp=0x1400005f008 [running]:
net/http.(*Transport).dialConn.gowrap2()
	/usr/local/go/src/net/http/transport.go:1944 +0x3c fp=0x1400013a7d0 sp=0x1400013a7d0 pc=0x10272474c
runtime.goexit({})
	/usr/local/go/src/runtime/asm_arm64.s:1268 +0x4 fp=0x1400013a7d0 sp=0x1400013a7d0 pc=0x1024d7fa4
created by net/http.(*Transport).dialConn in goroutine 8562
	/usr/local/go/src/net/http/transport.go:1944 +0x111c

And from go tool objdump,

TEXT net/http.(*Transport).dialConn.gowrap2(SB) /usr/local/go/src/net/http/transport.go
  transport.go:1944     0x1002c8710             f9400b90                MOVD 16(R28), R16                               
  transport.go:1944     0x1002c8714             eb3063ff                CMP R16, RSP                                    
  transport.go:1944     0x1002c8718             54000169                BLS 11(PC)                                      
  transport.go:1944     0x1002c871c             f81e0ffe                MOVD.W R30, -32(RSP)                            
  transport.go:1944     0x1002c8720             f81f83fd                MOVD R29, -8(RSP)                               
  transport.go:1944     0x1002c8724             d10023fd                SUB $8, RSP, R29                                
  transport.go:1944     0x1002c8728             f9401390                MOVD 32(R28), R16                               
  transport.go:1944     0x1002c872c             b5000130                CBNZ R16, 9(PC)                                 
  transport.go:1944     0x1002c8730             f9400740                MOVD 8(R26), R0                                 
  transport.go:1944     0x1002c8734             9400052b                CALL net/http.(*persistConn).readLoop(SB)       
  transport.go:1944     0x1002c8738             f85f83fd                MOVD -8(RSP), R29                               
  transport.go:1944     0x1002c873c             f84207fe                MOVD.P 32(RSP), R30                             
  transport.go:1944     0x1002c8740             d65f03c0                RET                                             
  transport.go:1944     0x1002c8744             aa1e03e3                MOVD R30, R3                                    
  transport.go:1944     0x1002c8748             97f6c562                CALL runtime.morestack.abi0(SB)                 
  transport.go:1944     0x1002c874c             17fffff1                JMP net/http.(*Transport).dialConn.gowrap2(SB)  
  transport.go:1944     0x1002c8750             f9400211                MOVD (R16), R17                                 
  transport.go:1944     0x1002c8754             9100a3f4                ADD $40, RSP, R20                               
  transport.go:1944     0x1002c8758             eb11029f                CMP R17, R20                                    
  transport.go:1944     0x1002c875c             54fffea1                BNE -11(PC)                                     
  transport.go:1944     0x1002c8760             910023f4                ADD $8, RSP, R20                                
  transport.go:1944     0x1002c8764             f9000214                MOVD R20, (R16)                                 
  transport.go:1944     0x1002c8768             17fffff2                JMP -14(PC)                                     
  transport.go:1944     0x1002c876c             00000000                ?                                               


6 participants