runtime: SIGSEGV in runtime.deltimer on linux-mips-rtrk during ReadMemStats #43712
Both of those failures are during the current-cycle code freeze, as is the one in #43625, but I notice that #43625 was unmarked as release-blocker without further comment. @golang/release, should this be milestoned to 1.16, given that it may be a regression from (or exposed by) the timer changes in 1.16? (Marking as release-blocker until that is answered, but I'm not particularly invested in the answer.)
I don't think this issue is related to #43625. It might be related to #35541, since tpp shouldn't be nil (line 317 in b78b427).
CC @cherrymui
I don't think this is related to #35541, at least not directly. There, the problem is likely a malformed pointer, but not 0. (Of course, a malformed pointer could lead to memory corruption, which could lead to anything.) I'm not familiar enough with the timer code to tell for sure that tpp cannot be 0.
This comment, and MIPS being a machine with a weak memory model, make me worried.
I've been looking at the same thing.
We only have two failing cases to look at, but they are very similar. In both cases it is running the test The test running is In both failures
Change https://golang.org/cl/284775 mentions this issue:
@ianlancetaylor I think you've figured it out: I think your fix is exactly right, since then the More broadly, maybe the scavenger really shouldn't use timers, or
It turns out that this is also a problem in Go 1.15. @gopherbot Please open a backport issue for Go 1.15.
Backport issue(s) opened: #43833 (for 1.15). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
@mknyszek Is it known whether it's also a problem for 1.14?
I don't believe so.
Change https://golang.org/cl/287092 mentions this issue:
…Waiting status

Before this CL, the following sequence was possible:

* GC scavenger starts and sets up scavenge.timer
* GC calls readyForScavenger, but sysmon is sleeping
* program calls runtime.GOMAXPROCS to shrink number of processors
* procresize destroys a P, the one that scavenge.timer is on
* (*pp).destroy calls moveTimers, which gets to the scavenger timer
* scavenger timer is timerWaiting, and moveTimers clears t.pp
* sysmon wakes up and calls wakeScavenger
* wakeScavenger calls stopTimer on scavenge.timer, still timerWaiting
* stopTimer calls deltimer which loads t.pp, which is still nil
* stopTimer tries to increment deletedTimers on nil t.pp, and crashes

The point of vulnerability is the window between the time that t.pp is set to nil by moveTimers and the time that t.pp is set to non-nil by moveTimers, which is a few instructions at most. So it's not likely and in particular is quite unlikely on x86. But with a more relaxed memory model the area of vulnerability can be somewhat larger. This appears to be the cause of two builder failures in a few months on linux-mips.

This CL fixes the problem by making moveTimers change the status from timerWaiting to timerMoving while t.pp is clear. That will cause deltimer to wait until the status is back to timerWaiting, at which point t.pp has been set again.

For #43712
Fixes #43833

Change-Id: I66838319ecfbf15be66c1fac88d9bd40e2295852
Reviewed-on: https://go-review.googlesource.com/c/go/+/284775
Trust: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
(cherry picked from commit d2d155d)
Reviewed-on: https://go-review.googlesource.com/c/go/+/287092
Run-TryBot: Carlos Amedee <carlos@golang.org>
2021-01-14T21:55:29-eb33002/linux-mips-rtrk
2020-12-17T20:25:45-8fcf318/linux-mips-rtrk
CC @aclements @mknyszek @prattmic @mengzhuo; compare #43625.