runtime: GC pause time 20% increase in 1.8beta1 compared to 1.7.3 #18161
On my 64-bit Windows 10 machine, the following benchmark https://gitlab.com/gasche/gc-latency-experiment/blob/master/main.go (with msgCount increased to 100M) shows a worst-case time of 15.50ms in 1.8beta1 and 13.00ms in 1.7.3.
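For context, a minimal sketch of this style of worst-case-latency benchmark (the constants below are illustrative, not the values from the linked program): retain a sliding window of allocated messages and record the slowest single insertion.

```go
package main

import (
	"fmt"
	"time"
)

const (
	msgCount   = 10000000 // the report above used 100M
	msgSize    = 1024
	windowSize = 200000
)

func main() {
	window := make([][]byte, windowSize)
	worst := time.Duration(0)
	for i := 0; i < msgCount; i++ {
		start := time.Now()
		// Overwrite the oldest slot, keeping windowSize messages live.
		window[i%windowSize] = make([]byte, msgSize)
		if d := time.Since(start); d > worst {
			worst = d
		}
	}
	fmt.Printf("worst single-message time: %v\n", worst)
}
```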
/cc @aclements @RLH
I suspect this is down to timer resolution on Windows. What is the output with GODEBUG=gctrace=1?
There might be a performance regression here, but nothing in this benchmark indicates that it's specifically a GC pause time regression (in fact, it's highly unlikely that it is). In addition to Brad's suggestion, looking at the execution trace may be enlightening.
@aclements I looked into this test program yesterday (from a blog post via golang-nuts), and it looks like there's GC involvement. After a GC completes, allocation needs to assist with sweeping, and while the heap is growing, as it does in the first 1/5th of that program, sweep assists won't find any garbage. Single allocations are delayed until the entire heap is swept. I didn't do any comparison with go1.7.3, so I can't speak to a regression. I filed #18155 with details on what I found.
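A toy model of that effect (illustrative only; nothing below is actual runtime code): when no span contains garbage, an allocation that must sweep for free space walks every span before giving up.

```go
package main

import "fmt"

// spansSweptForOneAlloc models an allocation that sweeps spans one by
// one until it finds a span with a free object. In a growing heap,
// where every object is still live, it walks all of them.
func spansSweptForOneAlloc(freePerSpan []int) int {
	swept := 0
	for _, free := range freePerSpan {
		swept++
		if free > 0 {
			return swept // found a span with a free object
		}
	}
	return swept // swept the whole heap without finding free space
}

func main() {
	growing := make([]int, 1000000) // every span fully live
	fmt.Println("spans swept for one allocation:", spansSweptForOneAlloc(growing))
}
```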
[gctrace output for 1.7.3 and 1.8beta1 attached]
Thanks for the gctrace. It's showing that your STW sweep termination and STW mark termination times in 1.8beta1 are less than a microsecond in both wall-clock and CPU time (except in gc 527). The "9.0" in "0+9.0" belongs to concurrent mark, not sweep term. I think @rhysh is on to something, though. If you grab an execution trace with 1.8 and post it here, I can probably confirm his findings.
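For anyone reproducing this, one way to capture such an execution trace is the runtime/trace package (runBenchmark below is a stand-in for the benchmark's main loop); the result can then be inspected with go tool trace trace.out.

```go
package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Record an execution trace of everything between Start and Stop.
	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	runBenchmark()
}

func runBenchmark() {
	// benchmark body goes here
}
```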
I've updated the previous post with the full traces as attachments (both 1.7.3 and 1.8beta1). I've also looked at what the resolution of time.Now might be on Windows. It's complicated, but I'm reasonably confident it's around 1 microsecond on my machine, so measurement differences of 2.5 milliseconds cannot be rounding error. (I assume ms stands for milliseconds, not microseconds.)
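A quick, rough way to estimate time.Now resolution on a given machine (a sketch, not a rigorous methodology) is to spin until the reported time changes and keep the smallest step seen:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	smallest := time.Hour
	for i := 0; i < 1000; i++ {
		t0 := time.Now()
		t1 := time.Now()
		// Spin until the clock advances.
		for t1.Sub(t0) == 0 {
			t1 = time.Now()
		}
		if d := t1.Sub(t0); d < smallest {
			smallest = d
		}
	}
	fmt.Printf("smallest observed time.Now step: %v\n", smallest)
}
```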
After reading Rhys's post I was able to convince myself that if half the spans contain no free objects, a single allocation may end up scanning half the heap's allocation bits. This is complicated by the fact that the allocation fast path has been optimized for exactly the opposite situation, where there are lots of free objects in each span.
One possible way to resolve this is to scan the spans in a pseudo-random order, so that if half the objects are free, finding a span with one should not take too many tries, and certainly not half the spans. Perhaps the random choice should only be made by assists that are trying to allocate.
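A back-of-the-envelope sketch of that randomized-probe idea (a hypothetical illustration, not a proposed runtime patch; all names below are invented): if a fraction p of spans have free objects, random probing needs about 1/p tries on average, regardless of where the full spans sit, whereas a linear scan starting in a fully-live region can visit half the heap.

```go
package main

import (
	"fmt"
	"math/rand"
)

// probesUntilFree picks spans at random until it hits one with a free
// object. It assumes at least one such span exists, otherwise it would
// loop forever.
func probesUntilFree(hasFree []bool, rng *rand.Rand) int {
	probes := 0
	for {
		probes++
		if hasFree[rng.Intn(len(hasFree))] {
			return probes
		}
	}
}

func main() {
	const n = 1000000
	hasFree := make([]bool, n)
	for i := n / 2; i < n; i++ {
		hasFree[i] = true // half the spans have free objects
	}
	rng := rand.New(rand.NewSource(1))
	const trials = 1000
	total := 0
	for i := 0; i < trials; i++ {
		total += probesUntilFree(hasFree, rng)
	}
	// Expect roughly 2 probes on average; a linear scan from index 0
	// would visit n/2+1 spans in this layout.
	fmt.Printf("average probes to find a free span: %.1f\n", float64(total)/trials)
}
```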
@frje, can you try applying https://golang.org/cl/34291 and see if that helps?
@aclements It looks like my original conclusion was probably wrong: 1.8beta1 does not get longer pauses than 1.7.3; the benchmark is simply imprecise and its results are quite volatile. I've now run the benchmark multiple times with billions of calls, comparing 1.7.3, 1.8beta1, and CL 34291. As far as I'm concerned, this issue should be closed.
Closing, thanks for the update.