strings: 10-30% speed regression in Contains from 1.13 to tip #35686
Perhaps bisect to find which commit introduced this performance regression? It might have been a change in |
CC @randall77 |
I tried to bisect this and got CL 206938, but I don't feel confident that my bisect was high quality. I'm going to try and retest later on a more consistent machine to make sure I'm not measuring this wrong. Newer test was:

```go
package strtest

import (
	"strconv"
	"strings"
	"testing"
)

var (
	sink1 bool
	sink2 bool
)

func BenchmarkContains(b *testing.B) {
	tests := []string{
		"This is just a cool test.",
		"This is just a cool test. (Or is it?)",
		"RIP count reset to (_VARS_RIP-(_GAME_CLEAN_)_SET_0_), neat.",
	}
	for i, test := range tests {
		b.Run(strconv.Itoa(i), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				sink1 = strings.Contains(test, "(_")
				sink2 = strings.Contains(test, "_)")
			}
		})
	}
}
```
|
I'm reasonably confident that CL 206938 is not the cause of a slowdown in |
I don't see any regression. If anything, time/op on tip is better (compared to 1.13.4):
|
I too can't reproduce this on another machine, but it's very consistent on my main dev machine. I'm working on bisecting this and trying to eliminate any outside variables that could be affecting things, and I'll reply with results (and close this if it turns out to be nothing after all). |
Another benchmark, made longer in runtime to try and make it more consistent:

```go
package strtest

import (
	"strings"
	"testing"
)

var sink bool

func BenchmarkContains(b *testing.B) {
	vs := []string{"(_", "_)", " ", ".", "?"}
	const str = "This is just a cool test. (Or is it?)"
	for i := 0; i < b.N; i++ {
		for _, v := range vs {
			sink = strings.Contains(str, v)
		}
	}
}
```

Between Go 1.13.4 and 15bff20 (Oct 31st):
But between Go 1.13.4 and the current tip (ce7829f):
|
@ianlancetaylor Unbelievably, I bisected again with the above code, and again got the same CL:
I too doubt that this is related directly to

What is also interesting is that within the same test run, I get similar results, but immediately running it again will give me another set of different but still consistent (according to benchstat) times. At 2d8c199:
At 2d8c199~1:
Bisect log:
Feel free to retitle this issue appropriately, if needed. |
Hm, my mistake; that's not the same CL, but another timers-related CL... I'm not sure what this means, then. |
What CPU does your dev machine have, and at what CL are you building tip? Any CL can potentially cause the benchmark code to align differently, at which point you end up benchmarking the effects of branch alignment. I have seen in the past that where the benchmark loop is placed in the file can matter a lot, since it changes alignment. With all the side-channel attacks on caching/branch prediction, there can also be benchmark differences between different microcode versions of the same CPU: https://www.phoronix.com/scan.php?page=article&item=intel-jcc-microcode&num=1 |
i7-6600U on Linux 5.3.11, all software mitigations disabled. But, I'm on Arch where the ucode version is dated 2019-11-15, so I can roll that back and see what happens. I'll have another crack at benchmarking and trying things this afternoon when I have time. |
Apologies for getting around to this so late. I ran some tests with my previous benchmark, and while the microcode I was running did have a negative performance impact, the effect appears to be even between Go 1.13 and gotip. Here are my results. There are two benchstats, one from running 1.13 first, then one running 1.13 after gotip. With no microcode loaded:
With a microcode from ~6 months ago:
With the latest microcode:
My gotip:
I've also found that code that "looks" like |
This is probably less helpful, but I'm finding this while benchmarking a parser I've written (which does call
https://github.com/hortbot/hortbot/blob/master/internal/cbp/cbp.go |
@zikaeroh It's possible that the performance effects you are seeing with the newer microcode are due to issue #35881. You could confirm this by:
|
While you're at it, can you please test whether it's just a matter of general jump alignment? Set funcAlign to 32 instead of 16, compile tip, and rebenchmark: go/src/cmd/link/internal/amd64/l.go Line 36 in 50bd1c4
|
First, plain
There's some variance between runs. @markdryan I applied that CL to tip. Running the benchmarks from my most recent post gives:
Pretty consistently between runs. In fact, many of my benchmarks see similar improvements (full run of my project here: https://gist.github.com/zikaeroh/351939ff3b3657f34092caa974733a96). @martisch Changing only that alignment:
Some variance, seemingly better than tip. I was actually considering closing this in reference to #35881 (specifically, #35881 (comment); or at least mentioning it), as it seemed somewhat similar in nature, but I'm definitely not an expert in this area. |
@zikaeroh Thanks so much for taking the time to re-run your benchmarks. I have some questions
|
Leaving aside the potential performance effects of the microcode update for a moment, it's possible that the remaining performance effects you see from one build/run to another on your Contains-4 benchmark are due to the alignment of your test string, "str". If I follow the chain of calls correctly through the standard library code, some iterations of your test loop will end up in bytealg.IndexByte. This method uses AVX2 instructions when the string you're searching is > 32 bytes, which your string happens to be. It doesn't, however, seem to contain any peeling code, so the performance of IndexByte may vary on machines that support AVX2, depending on whether or not the string being searched is 32 byte aligned. One way to determine whether data alignment is responsible for some of the inconsistencies you see with your benchmark would be to make the test string shorter, e.g., 31 bytes. |
Sorry, no idea why I didn't reply to this.
The code for this benchmark is in my original post. |
For what it's worth, |
At this point, years later, I'm not seeing much difference between 1.13 and the subsequent versions; running the same benchmark sometimes gives slightly different results, so maybe this is all alignment, but they are all roughly in the same range. |
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

`go env` Output

What did you do?

Benchmark this against 1.13 and tip.

What did you expect to see?

No difference in performance (or hopefully better).

What did you see instead?
`Contains` is consistently slower. benchstat-ing the above with `go test -run=- -bench . -cpu=1,4 -count=10 .`:

This has come out of some performance testing between 1.13 and tip in a project I'm working on, which shows a consistent 6% regression for a hand-written parser (https://github.com/hortbot/hortbot/blob/master/internal/cbp/cbp_test.go), and this was the first thing I noticed while comparing the profiles. The first thing the parser does is check for one of two tokens before doing any work, with the assumption that if neither is present, there's no work to be done.
In the grand scheme of things, maybe a few extra nanoseconds isn't such a big deal, but I haven't tested other functions in `strings` quite yet.