Skip to content

Don't convert chunks to matrixes and then merges them... #713

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 13, 2018

Conversation

tomwilkie
Copy link
Contributor

@tomwilkie tomwilkie commented Feb 16, 2018

...use iterators instead.

Also:

  • Update remote read endpoint for SeriesSet interface.
  • Copy the upstream heap-based merging code, and update it for chunk boundaries.
  • Dedupe samples coming out of the merge iterator.

Big drawback of this approach is the optimisation necessary about Seek - as we don't know the end time for any particular-seek'd iterator, we aren't technically able to filter out chunks in the future. This has a large performance impact (basically O(n^2)), as seeks at the very beginning of the range have to include all chunks. To avoid this I've assumed an end time of 48hrs, meaning vector queries with duration larger than 24hrs won't work correctly (ie sum(rate(foo[48h]))). But OTOH this allows code closer to O(n log n).

Before vs after:

benchmark                     old ns/op     new ns/op     delta
BenchmarkChunkQueryable-4     364386071     44484806      -87.79%

benchmark                     old allocs     new allocs     delta
BenchmarkChunkQueryable-4     1019223        453391         -55.52%

benchmark                     old bytes     new bytes     delta
BenchmarkChunkQueryable-4     903011968     14560358      -98.39%

Thats about 8.2x speed up in ns/op and 62x less memory usage.

Part of #209, closes #81 as redundant.

@tomwilkie tomwilkie changed the title [QIPDon't convert chunks to matrixes and then merges them; [WIP] Don't convert chunks to matrixes and then merges them... Feb 16, 2018
@tomwilkie tomwilkie force-pushed the queriers branch 2 times, most recently from 723cfc7 to 71707d3 Compare February 17, 2018 09:49
@tomwilkie tomwilkie changed the title [WIP] Don't convert chunks to matrixes and then merges them... Don't convert chunks to matrixes and then merges them... Feb 17, 2018
@tomwilkie tomwilkie mentioned this pull request Feb 17, 2018
26 tasks
@tomwilkie
Copy link
Contributor Author

Having deployed this to our dev env I didn't see the kind of speed up I was expecting. On investigation the code is very sensitive to "chunk density" - how many chunks there are per 48 hours. I originally modelling it at 48 (one per hour, each 3 hours long), but modelling as one every 6 mins more closely matches the performance I see in dev.

The good news is its still an improvement, but just not as much as before:

benchmark                                old ns/op     new ns/op     delta
BenchmarkChunkQueryableDoubleDelta-4     56575939      61026739      +7.87%
BenchmarkChunkQueryableVarbit-4          54786710      48974757      -10.61%

benchmark                                old allocs     new allocs     delta
BenchmarkChunkQueryableDoubleDelta-4     132677         119285         -10.09%
BenchmarkChunkQueryableVarbit-4          110284         82466          -25.22%

benchmark                                old bytes     new bytes     delta
BenchmarkChunkQueryableDoubleDelta-4     126389364     5278391       -95.82%
BenchmarkChunkQueryableVarbit-4          125695590     4220598       -96.64%

@tomwilkie
Copy link
Contributor Author

We're seeing some more issues with this in testing. Worth reviewing, but don't merge yet.

@tomwilkie tomwilkie changed the title Don't convert chunks to matrixes and then merges them... [WIP] Don't convert chunks to matrixes and then merges them... Feb 27, 2018
@tomwilkie tomwilkie force-pushed the queriers branch 6 times, most recently from c00829b to 1b4861c Compare July 9, 2018 17:59
@tomwilkie
Copy link
Contributor Author

I've resurrected this and fixed the correctness issues (by expanding the tests). Current performance:

$ go test ./pkg/querier -benchmem -bench BenchmarkChunkQueryable
goos: darwin
goarch: amd64
pkg: github.com/weaveworks/cortex/pkg/querier
BenchmarkChunkQueryable/matrixes/DoubleDelta-4         	      20	  57487641 ns/op	144035209 B/op	  101260 allocs/op
BenchmarkChunkQueryable/matrixes/Varbit-4              	      20	  54214617 ns/op	143203628 B/op	   74548 allocs/op
BenchmarkChunkQueryable/iterators/DoubleDelta-4        	      20	  84387322 ns/op	 3114931 B/op	   94822 allocs/op
BenchmarkChunkQueryable/iterators/Varbit-4             	      20	  81035552 ns/op	 2283036 B/op	   68109 allocs/op
PASS
ok  	github.com/weaveworks/cortex/pkg/querier	10.560s

Old path (matrixes) is faster, but 60x more memory usage. Going to see if I can improve this a bit.

@tomwilkie tomwilkie changed the title [WIP] Don't convert chunks to matrixes and then merges them... Don't convert chunks to matrixes and then merges them... Jul 9, 2018
@tomwilkie tomwilkie force-pushed the queriers branch 3 times, most recently from 3bcb1d9 to 008a714 Compare July 9, 2018 21:13
@tomwilkie
Copy link
Contributor Author

Found an optimisation in the ChunksToMatrix function too, so results are now:

$ go test ./pkg/querier -benchmem -bench BenchmarkChunkQueryable 
goos: darwin
goarch: amd64
pkg: github.com/weaveworks/cortex/pkg/querier
BenchmarkChunkQueryable/matrixes/DoubleDelta-4                  100      20379796 ns/op    42817634 B/op       23292 allocs/op
BenchmarkChunkQueryable/iterators/DoubleDelta-4                  20      76533564 ns/op      984637 B/op       16846 allocs/op
BenchmarkChunkQueryable/matrixes/Varbit-4                       100      19467781 ns/op    42817585 B/op       22573 allocs/op
BenchmarkChunkQueryable/iterators/Varbit-4                       20      81691003 ns/op      984516 B/op       16125 allocs/op
PASS
ok      github.com/weaveworks/cortex/pkg/querier    8.490s

I suspect it might make sense to pick the strategy (iterate or merge) based on the length of the query (or the number of chunks in the series).

@tomwilkie
Copy link
Contributor Author

Needed to update the vendored Prometheus for prometheus/prometheus@78efdc6d

- Refactor chunk store to return chunks.
- Refactor querier package to use new Prometheus 2.0 interfaces.
- Have separate querier for ingesters, chunk store and metadata.
- Make remote read handler take a Queryable.

Signed-off-by: Tom Wilkie <[email protected]>
…and the upstream heap-based merging code.

Signed-off-by: Tom Wilkie <[email protected]>
}
sp.LogFields(otlog.Int("sample streams", len(sampleStreams)))
sp.LogFields(otlog.Int("sample streams", len(samples)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log name should probably be samples as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable name is misleading I think - its a map from fingerprint to list of list of samples (so we can do a proper tree-based merge later). Have updated variable name to samplesBySeries and change log key to series.

Copy link
Contributor

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment, otherwise LGTM.

Are you thinking at some point use matrix for queries with a small number of chunks for speed, and iterator for a large number of chunks to not OOM?

@tomwilkie
Copy link
Contributor Author

Are you thinking at some point use matrix for queries with a small number of chunks for speed, and iterator for a large number of chunks to not OOM?

Right now I'm just trying to stop the querier OOMing for high cardinality / long queries / lots of simultaneous queries. But yeah, once I've got that working it probably makes a lot of sense to switch based on number of chunks per series - good idea!

@tomwilkie tomwilkie merged commit 3402152 into cortexproject:master Jul 13, 2018
@tomwilkie tomwilkie deleted the queriers branch July 13, 2018 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize merging of many small lists of samples
2 participants