runtime: segment violation in mexit/vgetrandomPutState/mallocgcSmallNoscan #73577


Closed
smira opened this issue May 2, 2025 · 11 comments
Labels
  • BugReport: Issues describing a possible bug in the Go implementation.
  • compiler/runtime: Issues related to the Go compiler and/or runtime.
  • NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@smira

smira commented May 2, 2025

Go version

go version go1.24.2 linux/amd64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build571871163=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/dev/null'
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/root/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.24.2'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

This is a hard-to-reproduce bug; we don't have a minimal reproducer for it.

The overall setup is the following:

  • Talos Linux
  • containerd 2.0.5 (latest version) built with Go 1.24.2
  • Kubernetes

What did you see happen?

Running a Kubernetes conformance test that concurrently creates and destroys many containers results in a SIGSEGV in containerd.

We went back through the CI history of Talos Linux: the containerd crashes started on the same day we updated Go from 1.23 to 1.24 in our toolchain. Updating from Go 1.24 to Go 1.24.2 afterwards did not change the crash behavior.

We don't have an easy way to reproduce the issue, but we can test patches.

In each crash, we observe the following backtrace in the coredump:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055583d8960c0 in runtime.mallocgcSmallNoscan (size=352, typ=<optimized out>, needzero=false, ~r0=<optimized out>,
    ~r1=<optimized out>) at /go/src/runtime/malloc.go:1280

warning: 1280   /go/src/runtime/malloc.go: No such file or directory
[Current thread is 1 (LWP 151466)]
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /home/smira/Documents/talos/cores/containerd.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) bt
#0  0x000055583d8960c0 in runtime.mallocgcSmallNoscan (size=352, typ=<optimized out>, needzero=false, ~r0=<optimized out>,
    ~r1=<optimized out>) at /go/src/runtime/malloc.go:1280
#1  0x000055583d8f9fc5 in runtime.mallocgc (size=352, typ=0x0, needzero=false, ~r0=<optimized out>)
    at /go/src/runtime/malloc.go:1055
#2  0x000055583d8fef05 in runtime.growslice (oldPtr=0xc0003a8000, newLen=22, oldCap=<optimized out>, num=<optimized out>,
    et=0x55583f86b840, ~r0=...) at /go/src/runtime/slice.go:264
#3  0x000055583d8f6896 in runtime.vgetrandomPutState (state=139638786264064) at /go/src/runtime/vgetrandom_linux.go:78
#4  0x000055583d8bfc67 in runtime.mdestroy (mp=0xc00177d008) at /go/src/runtime/os_linux.go:419
#5  0x000055583d8c92fb in runtime.mexit (osStack=true) at /go/src/runtime/proc.go:1992
#6  0x000055583d8c8f76 in runtime.mstart0 () at /go/src/runtime/proc.go:1817
#7  0x000055583d902425 in runtime.mstart () at /go/src/runtime/asm_amd64.s:395
#8  0x000055583f56f238 in crosscall1 () at gcc_amd64.S:42
#9  0x00007effe8a9bb38 in ?? ()
#10 0x00007f003042bb38 in ?? ()
#11 0x00007f003042b580 in ?? ()
#12 0x00007effe8a9bb10 in ?? ()
#13 0x000000c000f948c0 in ?? ()
#14 0x000055583d902420 in ?? ()
#15 0x000055583f56ec44 in threadentry (v=<optimized out>) at gcc_linux_amd64.c:90
#16 0x00007f00307227f6 in ?? ()
#17 0x0000000000000000 in ?? ()

The containerd binary and coredump(s) are attached to the issue.

In the exact same setup, replacing just the containerd binary with one built with go1.23.8 resolves the issue.

What did you expect to see?

No SIGSEGV.

gopherbot added the compiler/runtime label May 2, 2025
@smira
Author

smira commented May 2, 2025

I couldn't attach the coredumps and the binary to GitHub, so here they are: https://drive.google.com/file/d/13AEWG95UIJ07YxvSUpfdbNPTeEcDAYr_/view?usp=sharing

gabyhelp added the BugReport label May 2, 2025
@dsseng

dsseng commented May 2, 2025

After investigating the problem a bit deeper, it seems eb6f2c2 is the culprit. I reverted it using the attached patch in our Go build, and containerd built with it has not crashed in 5 runs of the aforementioned Kubernetes test.

For now, all the tests are on amd64, so I'm not sure whether it's the assembly part that causes this intermittent fault. I'm currently debugging this further (e.g. creating a minimal reproducer outside of Talos/containerd and tracking down the bug itself).

Reverting patch (with some conflicts fixed): https://gist.github.com/dsseng/5b3db47436a4c360986def9302f97548
(Failed to upload "0001-Revert-runtime-use-vDSO-for-getrandom-on-linux.patch")

@thepudds
Contributor

thepudds commented May 2, 2025

Hi @smira, have you had a chance to look over #73141?

@dsseng

dsseng commented May 2, 2025

> Hi @smira, have you had a chance to look over #73141?

I'll apply the mentioned change now and try it, thanks.

@smira
Author

smira commented May 2, 2025

> Hi @smira, have you had a chance to look over #73141?

Yes, it looks like the same issue. We're happy to re-test once Go 1.24.3 is released, thank you!

@dsseng

dsseng commented May 2, 2025

> Hi @smira, have you had a chance to look over #73141?

Cherry-picking 0ab64e2 on top of 1.24.2 also fixes the bug, so yes, that's the same issue.

@thepudds
Contributor

thepudds commented May 2, 2025

@prattmic, FYI.

cagedmantis added the NeedsInvestigation label May 2, 2025
cagedmantis added this to the Go1.25 milestone May 2, 2025
@cagedmantis
Contributor

Duplicate of issue #73141. Please feel free to re-open if the release doesn't fix this issue.

@cagedmantis
Contributor

I changed my mind. Leaving this open until after the release and confirmation.

@smira
Author

smira commented May 8, 2025

Thank you, I can confirm it's fixed with Go 1.24.3.
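To check which toolchain built a given binary (for example, a containerd binary), a small Go helper can read the embedded build info with `debug/buildinfo.ReadFile`. The version range it flags is an assumption drawn from this thread: crashes appeared after moving to Go 1.24, and the fix was confirmed in 1.24.3. Treating Go 1.24 pre-releases as affected is a further assumption. `affected` is a hypothetical helper name, not part of any Go API.

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// affected (hypothetical helper) reports whether goVersion, as stored in a
// binary's build info (e.g. "go1.24.2" or "go1.24.2 X:boringcrypto"), falls
// in the range this thread suggests is affected: go1.24 through go1.24.2.
// Assumption: 1.24 pre-releases (rc/beta) are treated as affected too.
func affected(goVersion string) bool {
	fields := strings.Fields(goVersion)
	if len(fields) == 0 {
		return false
	}
	v, ok := strings.CutPrefix(fields[0], "go")
	if !ok {
		return false // e.g. "devel ..." toolchains: unknown, not flagged here
	}
	parts := strings.Split(v, ".")
	if len(parts) < 2 || parts[0] != "1" {
		return false
	}
	if parts[1] != "24" {
		// "go1.24rc1" splits into ["1", "24rc1"].
		return strings.HasPrefix(parts[1], "24rc") || strings.HasPrefix(parts[1], "24beta")
	}
	if len(parts) == 2 {
		return true // "go1.24" is the initial 1.24 release
	}
	patch, err := strconv.Atoi(parts[2])
	if err != nil {
		return false
	}
	return patch < 3 // fix confirmed in go1.24.3
}

func main() {
	// Each command-line argument is a path to a Go binary to inspect.
	for _, path := range os.Args[1:] {
		info, err := buildinfo.ReadFile(path)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", path, err)
			continue
		}
		fmt.Printf("%s: %s affected=%v\n", path, info.GoVersion, affected(info.GoVersion))
	}
}
```

The `go version <binary>` command reports the same toolchain version from the command line; the helper only adds the range check on top.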

6 participants