Skip to content

runtime: concurrent map iter+write fatal doesn't respect GOTRACEBACK #68019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
bradfitz opened this issue Jun 15, 2024 · 5 comments
Open

runtime: concurrent map iter+write fatal doesn't respect GOTRACEBACK #68019

bradfitz opened this issue Jun 15, 2024 · 5 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@bradfitz
Copy link
Contributor

bradfitz commented Jun 15, 2024

Go version

go version go1.22.4 linux/amd64

What did you do?

  • Introduce a data race on a map (repro: https://go.dev/play/p/rQ1QQ7PMZ86)
  • Don't have sufficient test coverage that runs concurrently to catch it in CI
  • Ship it to prod
  • Have over a million goroutines
  • Wait an hour....

What did you see happen?

... we hit the race on the map in prod after an hour and 💣

The Go runtime then did its fatal:

fatal error: concurrent map iteration and map write

But then we got 3 GB of GOTRACEBACK stacks from the Go runtime for 1M+ stacks, about 3 GB more than we needed to find the bug, overwhelming our logging system in the process, causing a secondary outage.

GOTRACEBACK docs say:

The failure prints stack traces for all goroutines if there is no current goroutine or the failure is internal to the run-time.

And arguably a map race is internal to the run-time insofar as maps are implemented in the runtime.

But the Go runtime know it's the user's fault; note the "but is used when user code is expected to be at fault for the failure" bit here:

// fatal triggers a fatal error that dumps a stack trace and exits.
//
// fatal is equivalent to throw, but is used when user code is expected to be
// at fault for the failure, such as racing map writes.
//
// fatal does not include runtime frames, system goroutines, or frame metadata
// (fp, sp, pc) in the stack trace unless GOTRACEBACK=system or higher.
//
//go:nosplit
func fatal(s string) {
        // Everything fatal does should be recursively nosplit so it
        // can be called even when it's unsafe to grow the stack.
        systemstack(func() {
                print("fatal error: ")
                printindented(s) // logically printpanicval(s), but avoids convTstring write barrier
                print("\n")
        })

        fatalthrow(throwTypeUser)
}

What did you expect to see?

I'd expect fatalthrow(throwTypeUser) to be treated as a normal panic with GOTRACEBACK=single respected.

Currently it's ignored:

bradfitz@book1pro ~ % GOTRACEBACK=single go run maprace.go 2>&1 | wc -l
     618
bradfitz@book1pro ~ % GOTRACEBACK=none go run maprace.go 2>&1 | wc -l
       2
bradfitz@book1pro ~ % GOTRACEBACK=system go run maprace.go 2>&1 | wc -l
    1684

(At least none works, but that's too quiet, hiding the actual problem).

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jun 15, 2024
@MikeMitchellWebDev
Copy link
Contributor

MikeMitchellWebDev commented Jun 15, 2024

It says the following in runtime1.go

// gotraceback returns the current traceback settings.
//
// If level is 0, suppress all tracebacks.
// If level is 1, show tracebacks, but exclude runtime frames.
// If level is 2, show tracebacks including runtime frames.
// If all is set, print all goroutine stacks. Otherwise, print just the current goroutine.
// If crash is set, crash (core dump, etc) after tracebacking.

@joedian joedian added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 25, 2024
@griesemer griesemer added this to the Go1.24 milestone Jun 26, 2024
@thanm thanm moved this to Todo in Go Compiler / Runtime Jun 26, 2024
@mknyszek mknyszek modified the milestones: Go1.24, Backlog Nov 13, 2024
@mknyszek
Copy link
Contributor

CC @prattmic I know some time ago you reorganized the fatal/panic/throw code in the runtime, I'm not sure how long ago it was, but possibly relevant? Also, would this still be a problem with the new map implementation?

Putting this into backlog in any case.

@prattmic
Copy link
Member

Yes, the reorganization was in CL 390421, though I don't think it affected map fatal error behavior. The new maps will behave the same, they use the same fatal path.

All is forced for fatal calls (== throwTypeUser) here. I think it could be changed to not force all easily if we choose to do so.

FWIW, I think getting all goroutines for races is useful since you might want to try to find the other side of the race. But we should probably respect GOTRACEBACK.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/649535 mentions this issue: runtime: respect GOTRACEBACK for user-triggered runtime panics

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Development

No branches or pull requests

8 participants