Don't rely on hashes for collecting chunks together #3192
Conversation
LGTM
I agree the extra time spent here is marginal with respect to the overall query path, and the added complexity of a collision-handling approach is probably not worth the time spent implementing it.
suggestion: Since labels.Labels.Bytes() resets the length of the provided []byte slice, I updated the code to do the following:
query.go
// LabelsToKeyString is used to form a string to be used as
// the hashKey. Don't print, use l.String() for printing.
func LabelsToKeyString(l labels.Labels, b []byte) string {
return string(l.Bytes(b))
}
query_test.go
func benchmarkSeriesMap(numSeries int, b *testing.B) {
	series := makeSeries(numSeries)
	sm := map[string]int{}
	// instantiate a reusable slice
	buf := make([]byte, 0, 1024)
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		for i, s := range series {
			sm[LabelsToKeyString(s, buf)] = i
		}
	}
}
By reusing the byte slice you can shave off about 10%, although this may not be worth it.
$ /usr/local/bin/go test -benchmem -run=^$ github.com/cortexproject/cortex/pkg/distributor -bench ^BenchmarkSeriesMap
goos: darwin
goarch: amd64
pkg: github.com/cortexproject/cortex/pkg/distributor
BenchmarkSeriesMap-8 13 82358056 ns/op 29421232 B/op 200303 allocs/op
PASS
ok github.com/cortexproject/cortex/pkg/distributor 3.026s
I'm fine with this approach. Please include memory allocations in your benchmarks, and see my comment about lowering the length when allocating the byte slice.
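For reference, allocation stats can come either from the -benchmem flag used in the run above or from the benchmark itself; a minimal sketch, reusing the benchmarkSeriesMap helper shown earlier (the 100000 series count is an assumption):

	func BenchmarkSeriesMap(b *testing.B) {
		// report B/op and allocs/op even when -benchmem is not passed
		b.ReportAllocs()
		benchmarkSeriesMap(100000, b)
	}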
pkg/querier/chunk_store_queryable.go
Outdated
for _, c := range chunks {
	fp := client.Fingerprint(c.Metric)
	chunksBySeries[fp] = append(chunksBySeries[fp], c)
	key := distributor.LabelsToKeyString(c.Metric)
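For context, after this change the grouping loop reads roughly like the following (a sketch, not the exact PR code; the []chunk.Chunk element type and the single-argument LabelsToKeyString signature are taken from the diff above and may differ from the final version):

	chunksBySeries := map[string][]chunk.Chunk{}
	for _, c := range chunks {
		// key on the serialized label set, so two distinct series can
		// no longer be merged by a fingerprint collision
		key := distributor.LabelsToKeyString(c.Metric)
		chunksBySeries[key] = append(chunksBySeries[key], c)
	}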
Can we move this function to some utils or "client" package where the other fingerprinting functions are?
LGTM (modulo Peter's comments)
Signed-off-by: Goutham Veeramachaneni <[email protected]>
I tried the 3 options and am getting some very confusing answers:
With b := make([]byte, 0, 512)
With b := make([]byte, 0, 1024)
With
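One way to make the options comparable in a single run would be sub-benchmarks over the buffer capacity, roughly like this (a sketch; makeSeries and LabelsToKeyString are the helpers from the snippets above, the capacities are just the ones being discussed, and the fmt and testing imports are assumed to be in the test file):

	func BenchmarkSeriesMapBufSize(b *testing.B) {
		series := makeSeries(100000)
		for _, size := range []int{0, 512, 1024} {
			b.Run(fmt.Sprintf("cap=%d", size), func(b *testing.B) {
				sm := map[string]int{}
				buf := make([]byte, 0, size)
				b.ReportAllocs()
				b.ResetTimer()
				for n := 0; n < b.N; n++ {
					for i, s := range series {
						sm[LabelsToKeyString(s, buf)] = i
					}
				}
			})
		}
	}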
Signed-off-by: Goutham Veeramachaneni <[email protected]>
You're right. This
(running
hashToChunkseries := map[string]ingester_client.TimeSeriesChunk{}
hashToTimeSeries := map[string]ingester_client.TimeSeries{}
These are now misnamed: the map key is not a hash.
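A rename along those lines (just a suggestion, the names are not from the PR) would be:

	keyToChunkseries := map[string]ingester_client.TimeSeriesChunk{}
	keyToTimeSeries := map[string]ingester_client.TimeSeries{}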
This fixes #717
We hit this issue in production when a customer issued a simple query that touched 20K series, many of which had hash collisions on client.FastFingerprint. This made our querier code merge multiple series together, which caused counters to go up and down and caused rate() to return weird artifacts.
It is quite tricky to debug, and I now have a test up which proves that we will suffer if we don't handle collisions explicitly. After looking at several solutions for fixing that, I've finally settled on something super simple: just use a string as the map key.
Here is a benchmark where I insert and look up 100K series:

goos: linux
goarch: amd64
pkg: github.com/cortexproject/cortex/pkg/distributor
BenchmarkSeriesMap-8   842   91633894 ns/op
PASS
ok      github.com/cortexproject/cortex/pkg/distributor 86.307s
Now I could have just used labels.String(), but the performance there is quite bad:

goos: linux
goarch: amd64
pkg: github.com/cortexproject/cortex/pkg/distributor
BenchmarkSeriesMap-8   195   373104859 ns/op
PASS
ok      github.com/cortexproject/cortex/pkg/distributor 110.456s
Compare this to using client.Fingerprint for the hashing:

goos: linux
goarch: amd64
pkg: github.com/cortexproject/cortex/pkg/distributor
BenchmarkSeriesMap-8   1273   54778130 ns/op
This means we've become roughly 70% slower in this section, but as explained below this is very small compared to the overall query.
Now, the reason I've stuck with the simplified approach is that it takes <100ms to do this over 100K series, and in most cases the network time to load all the chunks and iterate through them is several orders of magnitude higher.
And tbh, I've looked at manually handling collisions: we need to call labels.Equal(l1, l2) to check whether the series we're looking at is the actual series in the map or a collision, and the perf there is actually worse.
I'm open to ideas here. Also note that the values of each map are different, so any generic collision-handling solution would require interface{}, which is arguably worse. sha256 has also been tried and is worse in comparison.
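For completeness, the explicit collision handling that was considered and rejected would look roughly like this (a sketch only; the seriesBucket type and addChunk helper are hypothetical, and the model.Fingerprint / chunk.Chunk types are assumptions about the surrounding code):

	// hypothetical collision-aware bucket: one fingerprint can map to
	// several distinct label sets that happen to share it
	type seriesBucket struct {
		lbls   labels.Labels
		chunks []chunk.Chunk
	}

	func addChunk(buckets map[model.Fingerprint][]*seriesBucket, c chunk.Chunk) {
		fp := client.Fingerprint(c.Metric)
		for _, b := range buckets[fp] {
			// labels.Equal distinguishes a real match from a collision;
			// this extra comparison is what makes the approach slower
			if labels.Equal(b.lbls, c.Metric) {
				b.chunks = append(b.chunks, c)
				return
			}
		}
		buckets[fp] = append(buckets[fp], &seriesBucket{lbls: c.Metric, chunks: []chunk.Chunk{c}})
	}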
Signed-off-by: Goutham Veeramachaneni [email protected]
What this PR does:
Which issue(s) this PR fixes:
Fixes #717
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]