Skip to content

cmd/trace/v2: goroutine analysis page doesn't identify goroutines consistently #65574

Closed
@nsrip-dd

Description

@nsrip-dd

Go version

go version go1.22rc2 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/nick.ripley/Library/Caches/go-build'
GOENV='/Users/nick.ripley/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/nick.ripley/go/pkg/mod'
GONOPROXY='redacted
GONOSUMDB='redacted'
GOOS='darwin'
GOPATH='/Users/nick.ripley/go'
GOPRIVATE='redacted'
GOPROXY='redacted'
GOROOT='/Users/nick.ripley/sdk/go1.22rc2'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/nick.ripley/sdk/go1.22rc2/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22rc2'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/f3/g91d13pd6kd3vdxts_gsgd1r0000gn/T/go-build1407906451=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Collected an execution trace with the new v2 tracer and viewed it with go tool trace. I then navigated to the "Goroutine analysis" page.

What did you see happen?

I didn't see the goroutines I was looking for on the analysis page, like main.main. Many of the names in the "Start location" column look like where a goroutine would be waiting, not where it started. Here's an example of what I see, looking an an execution trace from running unit tests:

Start location Count Total execution time
testing.tRunner 2450 10.246053307s
runtime.gopark 9 6.733193086s
runtime.gcMarkDone 1 806.957881ms
runtime.goparkunlock 1 301.65869ms
runtime.(*scavengerState).park 1 153.088717ms
runtime.(*traceAdvancerState).start.func1 1 114.573568ms
time.Sleep 1 7.663809ms
runtime.chanrecv1 1 5.670979ms
runtime.traceStartReadCPU.func1 1 592.705µs
runtime/trace.Start.func1 1 255.957µs
(Inactive, no stack trace sampled) 2 0s

Here, for example, the goroutines called runtime.gopark are actually the runtime.gcBgMarkWorker goroutines, and runtime.chanrecv1 is actually main.main waiting on a channel.

What did you expect to see?

The "start location" of a goroutine should be the name of the entry function. For example, the goroutine created by the statement go myFunction() should be called myFunction.

The immediate issue is that the current goroutine summary code uses the wrong end of the call stack to identify a goroutine here. There's a secondary issue, which is that the UI uses a PC as a key for grouping together goroutines, but the tracer no longer records the "starting" PC for goroutines which existed at the start of the trace. So grouping by PC is no longer consistent. We could remove the dependency on the starting PC from the goroutine pages and just use the name for grouping. I can send a CL if we're okay with that.

As a followup, should we include starting PCs in traces again? The old tracer recorded the starting PC of every goroutine at the beginning of the trace. The starting PC would mainly be helpful for identifying goroutines where we don't see any other call stacks. We could also consider recording the full call stack of a goroutine the first time we see it, e.g. where it's blocked at the start of the trace, instead of/in addition to the starting PC. Happy to open a separate issue or leave a comment somewhere else if this point is worth discussing more?

cc @mknyszek

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsFixThe path to resolution is known, but the work has not been done.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions