runtime: scavenger's frequent wake-ups interfere with runnext #35271
I'm guessing this probably turned flaky due to the goroutine-preemption changes?
This is preventing me from getting meaningful TryBot results on Windows for CL 204521. (https://storage.googleapis.com/go-build-log/66ca91a7/windows-amd64-longtest_cfa17417.log)
This is now failing much more consistently on the Linux longtests, as of 21445b0. The scavenger is likely going to sleep and waking up much more often than before, though I'm not sure that it's enough to still matter. The time spent "measuring" is 100 ms, though, so I suppose I could see that making this worse.
Change https://golang.org/cl/206098 mentions this issue:
This test is failing consistently in the longtest builders, potentially masking regressions in other packages.
Updates #35271
Change-Id: Idc03171c0109b5c8d4913e0af2078c1115666897
Reviewed-on: https://go-review.googlesource.com/c/go/+/206098
Reviewed-by: Carlos Amedee <[email protected]>
I'll take a look into this since Austin is currently focused on the memory corruption issues.
OK! I got somewhere.
This implies that the frequent wake-ups from the scavenger are having a detrimental effect, at least on Linux (and perhaps a beneficial one on Windows?). Keep in mind that I think there are actually two issues at play here: one introduced around Oct 31st, when Windows started failing, and another caused by the scavenger sleeping and waking more often. I'll tackle the latter first and come back to the former if I can get that fixed. Note also that my fixes to #35788 make the test fail around 1 in 500 times when run directly (as opposed to not being reproducible at all).
@aclements informed me that

One solution could be to just have the scavenger do more work. But perhaps a better solution is to just call "ready" with "next" as false. This way the scavenger never ends up in this LIFO, and I can confirm that this fixes the test flake (1000 consecutive runs and no failure, with the 1 GiB allocation in front of the test).

This fix stays in line with the reasoning that the scavenger should mostly stay out of the way of the rest of the application. If the scavenger takes longer to get scheduled (i.e. sleeps longer than it wanted to), it'll simply request to sleep for less to account for this overhead, effectively doing more work anyway.

This also gave me an opportunity to get some numbers for the new self-paced scavenger: each cycle is on the order of 300-1000µs, which means our order-of-magnitude assumption in the old scavenger was mostly on-target. :)
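For anyone following along who hasn't looked at the per-P run queue, here is a toy model of the effect of the "next" argument. It is only a sketch under simplifying assumptions, not the runtime's code: the type `p`, the helper `order`, and the goroutine names are all made up for illustration, goroutines are plain strings, and there is no work stealing or preemption. It only shows that readying with next=true takes over the single "run next" slot and pushes whatever was there to the tail of the queue, while next=false queues the woken goroutine behind everything else.

```go
package main

import "fmt"

// p is a toy stand-in for a per-processor run queue (hypothetical,
// heavily simplified): one "run next" slot plus a FIFO queue.
type p struct {
	runnext string   // "" means the slot is empty
	runq    []string // FIFO of runnable goroutines
}

// ready marks g runnable. With next=true, g takes the runnext slot and
// any previous occupant is kicked to the tail of the FIFO. With
// next=false, g simply goes to the tail of the FIFO.
func (pp *p) ready(g string, next bool) {
	if next {
		if old := pp.runnext; old != "" {
			pp.runq = append(pp.runq, old)
		}
		pp.runnext = g
		return
	}
	pp.runq = append(pp.runq, g)
}

// pick returns the next goroutine to run: the runnext slot first,
// then the head of the FIFO.
func (pp *p) pick() (string, bool) {
	if g := pp.runnext; g != "" {
		pp.runnext = ""
		return g, true
	}
	if len(pp.runq) == 0 {
		return "", false
	}
	g := pp.runq[0]
	pp.runq = pp.runq[1:]
	return g, true
}

// order queues a background goroutine, readies a worker into runnext
// (as a direct handoff would), then wakes the scavenger with the given
// next flag. It returns the order in which the goroutines would run.
func order(scavengerNext bool) []string {
	pp := &p{}
	pp.ready("background", false)
	pp.ready("worker", true)
	pp.ready("scavenger", scavengerNext)

	var out []string
	for {
		g, ok := pp.pick()
		if !ok {
			return out
		}
		out = append(out, g)
	}
}

func main() {
	fmt.Println(order(true))  // scavenger displaces the worker
	fmt.Println(order(false)) // worker keeps its runnext slot
}
```

In this model, waking the scavenger with next=true makes the worker run last, behind everything already queued, whereas next=false leaves the worker's slot untouched. That is the kind of displacement the proposed fix is meant to avoid.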
Change https://golang.org/cl/208380 mentions this issue:
FWIW I don't think this should block the beta. These frequent wake-ups definitely don't seem to show up in benchmarks.
From windows-amd64-longtest (https://build.golang.org/log/3fbe37b41ed88143f4b6d6f6e277e99891537be7):

CC @aclements @mknyszek