Skip to content

runtime: main_init_done can race with cgocall, yielding partially initialized (user) packages #68479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
srosenberg opened this issue Jul 17, 2024 · 8 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@srosenberg
Copy link

Go version

go version go1.22.3 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/srosenberg/.cache/go-build'
GOENV='/home/srosenberg/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/srosenberg/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/srosenberg/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.3'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build567775636=/tmp/go-build -gno-record-gcc-switches'

What did you do?

A fuzz target (fuzzEngineKeys), written in Go, and integrated with OSS-Fuzz as per the instructions in [1], resulted in a null-dereference owing to the fact that the enclosing package had not finished initializing at the time of the C->Go call.

[1] https://google.github.io/oss-fuzz/getting-started/new-project-guide/go-lang/

What did you see happen?

Below is the stacktrace, easily reproducible from [1].

Thread 1 (Thread 0x7ffff7a33000 (LWP 82) "fuzzEngineKeysI"):
#0  0x0000555556fed000 in github.com/cockroachdb/cockroach/pkg/keys.timeseriesKeyPrint (buf=0x10c000a5e1e0, ~p1=..., key=...) at github.com/cockroachdb/cockroach/pkg/keys/printer.go:599
#1  0x0000555556fef188 in github.com/cockroachdb/cockroach/pkg/keys.safeFormatInternal.func1 (b=0x10c000a5e1b0, key=...) at github.com/cockroachdb/cockroach/pkg/keys/printer.go:738
#2  0x0000555556fee418 in github.com/cockroachdb/cockroach/pkg/keys.safeFormatInternal (valDirs=..., key=..., quoteRawKeys=true, ~r0=...) at github.com/cockroachdb/cockroach/pkg/keys/printer.go:781
#3  0x0000555556fed345 in github.com/cockroachdb/cockroach/pkg/keys.PrettyPrint (valDirs=..., key=..., ~r0=...) at github.com/cockroachdb/cockroach/pkg/keys/printer.go:631
#4  0x000055555683b685 in github.com/cockroachdb/cockroach/pkg/roachpb.Key.Format (k=..., f=..., verb=115) at github.com/cockroachdb/cockroach/pkg/roachpb/data.go:260
#5  0x0000555556952b4c in github.com/cockroachdb/cockroach/pkg/roachpb.(*Key).Format (k=<optimized out>, f=..., verb=<optimized out>) at <autogenerated>:1
#6  0x0000555555bed9f7 in fmt.(*pp).handleMethods (p=0x10c00075af70, verb=115, handled=<optimized out>, handled=<optimized out>) at fmt/print.go:640
#7  0x0000555555beec45 in fmt.(*pp).printArg (p=0x10c00075af70, arg=..., verb=115) at fmt/print.go:756
#8  0x0000555555bf4893 in fmt.(*pp).doPrintf (p=0x10c00075af70, format=..., a=...) at fmt/print.go:1075
#9  0x0000555555be8e71 in fmt.Fprintf (w=..., format=..., a=..., n=<optimized out>, err=...) at fmt/print.go:224
#10 0x000055555792334f in github.com/cockroachdb/cockroach/pkg/storage.EngineKey.Format (k=..., f=..., c=<optimized out>) at github.com/cockroachdb/cockroach/pkg/storage/engine_key.go:58
#11 github.com/cockroachdb/cockroach/pkg/storage.(*EngineKey).Format (k=<optimized out>, f=..., c=<optimized out>) at <autogenerated>:1
#12 0x0000555555bed9f7 in fmt.(*pp).handleMethods (p=0x10c00075a1a0, verb=118, handled=<optimized out>, handled=<optimized out>) at fmt/print.go:640
#13 0x0000555555beec45 in fmt.(*pp).printArg (p=0x10c00075a1a0, arg=..., verb=118) at fmt/print.go:756
#14 0x0000555555bf6bdc in fmt.(*pp).doPrintln (p=0x10c000a5e1e0, a=...) at fmt/print.go:1221
#15 0x0000555555be9588 in fmt.Fprintln (w=..., a=..., n=<optimized out>, err=...) at fmt/print.go:304
#16 0x0000555557622e68 in fmt.Println (a=..., n=<optimized out>, err=...) at fmt/print.go:314
#17 github.com/AdamKorcz/go-118-fuzz-build/testing.(*T).Logf (t=<optimized out>, format=..., args=...) at github.com/AdamKorcz/[email protected]/testing/t.go:79
#18 0x0000555557916bed in github.com/cockroachdb/cockroach/pkg/storage.FuzzEngineKeysInvariants.func4 (t=0x10c0008d2948, a=..., b=...) at github.com/cockroachdb/cockroach/pkg/storage/engine_key_fuzz.go_fuzz.go:113
#19 0x0000555555ab4885 in runtime.call64 () at runtime/asm_amd64.s:772
#20 0x0000555555ab9076 in runtime.reflectcall (stackArgsType=0x10c000a5e1e0, fn=0x0, stackArgs=0x0, stackArgsSize=0, stackRetOffset=4280, frameSize=0, regArgs=0x7) at <autogenerated>:1
#21 0x0000555555b836b1 in reflect.Value.call (v=..., op=..., in=..., ~r0=...) at reflect/value.go:596
#22 0x0000555555b8140e in reflect.Value.Call (v=..., in=..., ~r0=...) at reflect/value.go:380
#23 0x0000555557622310 in github.com/AdamKorcz/go-118-fuzz-build/testing.(*F).Fuzz (f=0x10c0000e2db0, ff=...) at github.com/AdamKorcz/[email protected]/testing/f.go:149
--Type <RET> for more, q to quit, c to continue without paging--
#24 0x0000555557915e7a in github.com/cockroachdb/cockroach/pkg/storage.FuzzEngineKeysInvariants (f=0x10c0000e2db0) at github.com/cockroachdb/cockroach/pkg/storage/engine_key_fuzz.go_fuzz.go:78
#25 0x0000555557925798 in main.LibFuzzerFuzzEngineKeysInvariants (data=..., ~r0=<optimized out>, ~r0=<optimized out>) at main.72723935.go:31
#26 0x000055555792564c in main.LLVMFuzzerTestOneInput (data=0x10c000a5e1e0 "", size=0, ~r0=<optimized out>, ~r0=<optimized out>) at main.72723935.go:24
#27 0x0000555557925d5f in _cgoexp_7b76f75e854c_LLVMFuzzerTestOneInput (a=0x7fffffffdc58) at _cgo_gotypes.go:52
#28 0x0000555555a4b965 in runtime.cgocallbackg1 (fn=0x555557925d20 <_cgoexp_7b76f75e854c_LLVMFuzzerTestOneInput>, frame=0x7fffffffdc58, ctxt=0) at runtime/cgocall.go:420
#29 0x0000555555a4b618 in runtime.cgocallbackg (fn=0x555557925d20 <_cgoexp_7b76f75e854c_LLVMFuzzerTestOneInput>, frame=0x7fffffffdc58, ctxt=0) at runtime/cgocall.go:339
#30 0x0000555555ab8beb in runtime.cgocallbackg (fn=0x555557925d20 <_cgoexp_7b76f75e854c_LLVMFuzzerTestOneInput>, frame=0x7fffffffdc58, ctxt=0) at <autogenerated>:1
#31 0x0000555555ab5fcd in runtime.cgocallback () at runtime/asm_amd64.s:1079
#32 0x0000555555ab6221 in runtime.goexit () at runtime/asm_amd64.s:1695
#33 0x0000000000000000 in ?? ()

[1] https://oss-fuzz.com/testcase-detail/5128848228810752

What did you expect to see?

We expect all packages to be fully initialized before any user function is invoked. It's an invariant according to the language spec. [1].

[1] https://go.dev/ref/spec#Package_initialization

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 17, 2024
@srosenberg
Copy link
Author

srosenberg commented Jul 17, 2024

Null-dereference is here [1] because its initialization [2] didn't execute. Based on my vague understanding of what's happening, it appears we have a race between a previous cgocall, which possibly happened here,

Thread 2 (Thread 0x7ffff57ff700 (LWP 85) "fuzzEngineKeysI"):
#0  runtime.futex () at runtime/sys_linux_amd64.s:558
#1  0x0000555555a7ced0 in runtime.futexsleep (addr=0xfffffffffffffe00, val=0, ns=93824997884771) at runtime/os_linux.go:69
#2  0x0000555555a53867 in runtime.notesleep (n=0x55555997cea0 <runtime.m0+320>) at runtime/lock_futex.go:170
#3  0x0000555555a87d8c in runtime.mPark () at runtime/proc.go:1761
#4  runtime.stopm () at runtime/proc.go:2782
#5  0x0000555555a8889e in runtime.startlockedm (gp=<optimized out>) at runtime/proc.go:3054
#6  0x0000555555a8a98a in runtime.schedule () at runtime/proc.go:3914
#7  0x0000555555a8b878 in runtime.goexit0 (gp=<optimized out>) at runtime/proc.go:4181
#8  0x0000555555ab41d0 in runtime.mcall () at runtime/asm_amd64.s:458
#9  0x00007ffff57fed88 in ?? ()
#10 0x0000555555ab8f25 in runtime.newproc (fn=0x555555ab40d3 <runtime.rt0_go+307>) at <autogenerated>:1
#11 0x0000555555ab4145 in runtime.mstart () at runtime/asm_amd64.s:394
#12 0x0000555555ab40d3 in runtime.rt0_go () at runtime/asm_amd64.s:358
#13 0x0000000000000002 in ?? ()
#14 0x00007fffffffe178 in ?? ()
#15 0x0000000000000000 in ?? ()

and the above cgocallback. Effectively, waiting on main_init_done is bypassed owing to gp.m.ncgo > 0, thereby exposing a race between doInit(m.inittasks) and cb(frame) (in cgocallbackg1).

NOTE: the fuzz target is generated using -buildmode=c-archive, hence rt0_go is our entry point.

[1] https://github.com/cockroachdb/cockroach/blob/bcd4e5e37f2a95e73347f5ed83db5a24bebfd8d1/pkg/keys/printer.go#L599
[2] https://github.com/cockroachdb/cockroach/blob/bcd4e5e37f2a95e73347f5ed83db5a24bebfd8d1/pkg/ts/keys.go#L130

@cuonglm
Copy link
Member

cuonglm commented Jul 17, 2024

This seems to be a variant of #15943

@srosenberg
Copy link
Author

This seems to be a variant of #15943

Not exactly. I did look at the issue before creating this one. They're definitely related, but the race described here seems to be unique, afaik.

@cuonglm
Copy link
Member

cuonglm commented Jul 17, 2024

This seems to be a variant of #15943

Not exactly. I did look at the issue before creating this one. They're definitely related, but the race described here seems to be unique, afaik.

Hmm, isn't it the same problem: c thread calling to Go without knowing that all init functions are done?

@cherrymui cherrymui added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 17, 2024
@cherrymui cherrymui added this to the Backlog milestone Jul 17, 2024
@cherrymui
Copy link
Member

The receiving on main_init_done in cgocallbackg1 is conditioned on gp.m.ncgo == 0, that is, whether there is a cgo call on the same M, i.e. on the same OS thread. From the stack trace in the original post there doesn't seem to have any cgo call on that thread. If the cgo call occurs on a different thread, it shouldn't affect gp.m.ncgo on the callback thread.

@ianlancetaylor
Copy link
Contributor

@cherrymui That is a good point. I don't see how this scenario could happen.

@srosenberg How can we recreate the problem? Note that the link to oss-fuzz.com in the initial post doesn't work--it just says "You (email=[email protected]) are not authorized to access this page!".

@srosenberg
Copy link
Author

Apologies, false alarm! After further debugging, main_init_done is working as expected. I stepped through to convince myself that doInit1 does indeed finish _before _ the receive on main_init_done completes; i.e., the callback [1] is invoked only after doInit1.

The actual problem turned out to be a failed dependency injection in our code. The simple reason init in [2] didn't execute is because the entire pkg/ts is excluded from the fuzz target. This was far from obvious at the time since none of our local fuzzing campaigns triggered this null-dereference, and of course we had (wrongly) assumed all of the enclosing packages were included in the build. OSS-Fuzz generated a key prefix (namely /tsd), which requires a pretty-printer [3] in the time-series package (pkg/ts); the pretty-printer is expected to be injected during init, but as we already know, the corresponding init wasn't compiled in.

[1]

cb(frame)

[2] https://github.com/cockroachdb/cockroach/blob/bcd4e5e37f2a95e73347f5ed83db5a24bebfd8d1/pkg/ts/keys.go#L130
[3] https://github.com/cockroachdb/cockroach/blob/8def76b7b83df90d7e2bdf07309d61da8bfcf1d3/pkg/keys/printer.go#L848

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Development

No branches or pull requests

6 participants