Prevent OOMs in the chunk store. #873

tomwilkie · 2018-07-10T09:36:55Z

This change moves the parsing and instantiation of the Chunk structs to after the intersection of their IDs. On very high cardinality timeseries (~8m), we were finding that queriers would OOM just building the array of chunks from the index, even if the intersection of label matchers was relatively small.
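
To illustrate the idea above, here is a minimal sketch that intersects plain chunk ID strings first and only builds structs for the survivors. The names `parsedChunk`, `intersectSorted`, `nWayIntersect` and `parseChunkIDs` are hypothetical stand-ins, not the functions in this PR, and the sketch assumes each ID set is sorted and deduplicated.

```go
package main

import (
	"fmt"
	"sort"
)

// parsedChunk is a hypothetical stand-in for the real Chunk struct.
type parsedChunk struct {
	ID string
}

// intersectSorted returns the IDs present in both inputs.
// Both inputs are assumed to be sorted and free of duplicates.
func intersectSorted(a, b []string) []string {
	out := []string{}
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

// nWayIntersect folds intersectSorted over all the ID sets.
func nWayIntersect(sets [][]string) []string {
	if len(sets) == 0 {
		return nil
	}
	result := sets[0]
	for _, s := range sets[1:] {
		result = intersectSorted(result, s)
	}
	return result
}

// parseChunkIDs builds structs only for the IDs that survived the intersection.
func parseChunkIDs(ids []string) []parsedChunk {
	chunks := make([]parsedChunk, 0, len(ids))
	for _, id := range ids {
		chunks = append(chunks, parsedChunk{ID: id})
	}
	return chunks
}

func main() {
	// One set of chunk IDs per label matcher (made-up values).
	sets := [][]string{
		{"c1", "c2", "c3", "c4"},
		{"c2", "c4", "c5"},
		{"c4", "c2"},
	}
	for _, s := range sets {
		sort.Strings(s) // the merge above relies on sorted input
	}
	chunks := parseChunkIDs(nWayIntersect(sets))
	fmt.Println(len(chunks), chunks)
}
```

The point of the ordering is that the large per-matcher sets only ever exist as strings; the number of structs allocated is bounded by the size of the intersection.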

We also add a limit to the number of chunks fetched in a single query.
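
As a rough sketch of what such a limit could look like (the function name `checkChunkLimit`, the error text, and the convention that a non-positive limit means "unlimited" are all assumptions, not taken from this PR):

```go
package main

import "fmt"

// checkChunkLimit returns an error when a query would fetch more chunks
// than allowed. A limit <= 0 is treated as "no limit" (an assumption made
// for this sketch only).
func checkChunkLimit(numChunks, limit int) error {
	if limit > 0 && numChunks > limit {
		return fmt.Errorf("query would fetch %d chunks, limit is %d", numChunks, limit)
	}
	return nil
}

func main() {
	// Check the count of chunk IDs before fetching any chunk data.
	if err := checkChunkLimit(12, 10); err != nil {
		fmt.Println("rejected:", err)
	}
	if err := checkChunkLimit(5, 10); err == nil {
		fmt.Println("accepted")
	}
}
```

Checking the count against the ID list, before any chunk is fetched or decoded, keeps the cost of rejecting an oversized query low.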

The big casualty here is the removal of support for metadataInIndex chunks. A comment says these were last used in Nov 2016, so this is probably safe.

Signed-off-by: Tom Wilkie [email protected]

- Merge and dedupe sets of strings, not sets of parsed chunks.
- Limit number of chunks fetched in a single query.
- Add lots more debug logging to the querier to help track this all down.

Signed-off-by: Tom Wilkie <[email protected]>

tomwilkie · 2018-07-10T11:38:31Z

After deploying this we're seeing about a 2x reduction in steady-state memory usage, and almost the complete elimination of the spikes that were causing us problems. NB this was deployed alongside #713, but when #713 was deployed on its own we didn't see the same reductions.

bboreham

Looking good - some small points

bboreham · 2018-07-10T13:07:51Z

pkg/chunk/chunk_store.go

-	}
-
-	var chunkSet ByKey
+func (c *Store) parseIndexEntries(ctx context.Context, entries []IndexEntry, matcher *labels.Matcher) ([]string, error) {


I think it's worth retaining the comment that entries are returned in order, and here seems to be the right place for it.

bboreham · 2018-07-10T13:10:54Z

pkg/chunk/storage/by_key_test.go

+func (cs ByKey) Less(i, j int) bool { return lessByKey(cs[i], cs[j]) }
+
+// This comparison uses all the same information as Chunk.ExternalKey()
+func lessByKey(a, b chunk.Chunk) bool {


This function was a good idea when it saved the creation of millions of temp strings in queries; as part of a test it looks over-complicated.

bboreham · 2018-07-10T13:17:26Z

pkg/chunk/chunk_store_test.go

@@ -21,6 +21,12 @@ import (
 	"github.com/weaveworks/common/user"
 )

+func init() {
+	var al util.AllowedLevel
+	al.Set("debug")


bboreham · 2018-07-10T13:23:20Z

pkg/chunk/chunk_store.go

@@ -168,6 +170,10 @@ func (c *Store) calculateDynamoWrites(userID string, chunks []Chunk) (WriteBatch

 // Get implements ChunkStore
 func (c *Store) Get(ctx context.Context, from, through model.Time, allMatchers ...*labels.Matcher) (model.Matrix, error) {
+
+	logger := util.WithContext(ctx, util.Logger)
+	level.Debug(logger).Log("msg", "ChunkStore.Get", "from", from, "through", through, "matchers", len(allMatchers))


This log line is very similar to the tracing log line below; perhaps we should have a wrapper so we don't repeat very similar code?
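
One possible shape for such a wrapper, sketched with the standard library only: a single `Log` call fans out to both destinations. The `sink`, `bufferSink` and `dualLogger` types are hypothetical and stand in for a structured logger and a tracing span; this is not the `spanLogger` that was eventually added in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// sink is a hypothetical stand-in for anything that accepts key/value
// pairs, such as a structured logger or a tracing span.
type sink interface {
	Log(kvs ...interface{})
}

// bufferSink records each call as a "k=v k=v" line, for demonstration.
type bufferSink struct {
	lines []string
}

func (b *bufferSink) Log(kvs ...interface{}) {
	parts := make([]string, 0, len(kvs)/2)
	for i := 0; i+1 < len(kvs); i += 2 {
		parts = append(parts, fmt.Sprintf("%v=%v", kvs[i], kvs[i+1]))
	}
	b.lines = append(b.lines, strings.Join(parts, " "))
}

// dualLogger forwards every call to both the logger and the span, so call
// sites do not need two near-identical lines.
type dualLogger struct {
	logger sink
	span   sink
}

func (d dualLogger) Log(kvs ...interface{}) {
	d.logger.Log(kvs...)
	d.span.Log(kvs...)
}

func main() {
	logs, trace := &bufferSink{}, &bufferSink{}
	l := dualLogger{logger: logs, span: trace}
	l.Log("msg", "ChunkStore.Get", "matchers", 3)
	fmt.Println(logs.lines, trace.lines)
}
```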

bboreham · 2018-07-10T13:27:25Z

pkg/chunk/chunk_store.go

 	filters, matchers := util.SplitFiltersAndMatchers(allMatchers)
 	chunks, err := c.lookupChunksByMetricName(ctx, from, through, matchers, metricName)
 	if err != nil {
 		return nil, err
 	}

+	level.Debug(logger).Log("func", "ChunkStore.getMetricNameChunks", "msg", "Chunks in index", "n", len(chunks))


"msg" and "n" seem like an unnecessary level of indirection

bboreham · 2018-07-10T14:14:55Z

pkg/chunk/chunk_store.go

+	}
+
+	// Merge entries in order because we wish to keep label series together consecutively
+	chunkIDs := nWayIntersectStrings(chunkIDSets)


Rather than assembling a list then breaking it down to intersect, I'd think it is cheaper to do the intersecting as the results come in. But that could be a subsequent change.
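
A minimal sketch of intersecting as results arrive, assuming each lookup delivers its chunk IDs on a channel. The function name `intersectAsReceived` and the channel-based delivery are assumptions for illustration, and the final sort is only there to give a stable order in the example.

```go
package main

import (
	"fmt"
	"sort"
)

// intersectAsReceived keeps a running intersection of the ID sets read from
// results, so only the current intersection is held in memory rather than
// every set at once.
func intersectAsReceived(results <-chan []string) []string {
	var current map[string]struct{}
	first := true
	for ids := range results {
		next := make(map[string]struct{}, len(ids))
		for _, id := range ids {
			if first {
				// The first set seeds the running intersection.
				next[id] = struct{}{}
				continue
			}
			if _, ok := current[id]; ok {
				next[id] = struct{}{}
			}
		}
		current = next
		first = false
	}

	out := make([]string, 0, len(current))
	for id := range current {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Made-up ID sets, one per matcher lookup.
	results := make(chan []string, 3)
	results <- []string{"c1", "c2", "c3", "c4"}
	results <- []string{"c2", "c4", "c5"}
	results <- []string{"c4", "c2"}
	close(results)

	fmt.Println(intersectAsReceived(results))
}
```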

Signed-off-by: Tom Wilkie <[email protected]>

tomwilkie · 2018-07-12T09:30:51Z

@bboreham PTAL?

bboreham

LGTM!

bboreham · 2018-07-12T09:47:25Z

pkg/chunk/chunk_store.go

@@ -166,20 +169,44 @@ func (c *Store) calculateDynamoWrites(userID string, chunks []Chunk) (WriteBatch
 	return writeReqs, nil
 }

+type spanLogger struct {


comment describing the intent of this struct ?

Signed-off-by: Tom Wilkie <[email protected]>

tomwilkie · 2018-07-12T10:08:25Z

Thanks!

Prevent OOM in the chunk store.

e42d765

- Merge and dedupe sets of strings, not sets of parsed chunks.
- Limit number of chunks fetched in a single query.
- Add lots more debug logging to the querier to help track this all down.

Signed-off-by: Tom Wilkie <[email protected]>

tomwilkie force-pushed the chunk-store-oom branch from d483fd0 to e42d765 Compare July 10, 2018 09:40

bboreham reviewed Jul 10, 2018

tomwilkie added 4 commits July 10, 2018 17:57

Review feedback: Use simple Less function for chunks in test.

aef2e91

Signed-off-by: Tom Wilkie <[email protected]>

Review feedback: Intersect chunkIDs as we receive them.

445d006

Signed-off-by: Tom Wilkie <[email protected]>

Review feedback: unify logging and tracing in the chunk store.

1557226

Signed-off-by: Tom Wilkie <[email protected]>

Review feedback: remove debug logging from tests.

16e1167

Signed-off-by: Tom Wilkie <[email protected]>

tomwilkie force-pushed the chunk-store-oom branch from 323d857 to 16e1167 Compare July 10, 2018 17:02

bboreham approved these changes Jul 12, 2018

Review feedback: explain spanLogger.

b207ccc

Signed-off-by: Tom Wilkie <[email protected]>

tomwilkie merged commit 45dc92f into cortexproject:master Jul 12, 2018

tomwilkie deleted the chunk-store-oom branch July 12, 2018 10:01

tomwilkie mentioned this pull request Aug 29, 2018

Lazily decode chunks for fuzzy metric names #453

Closed

bboreham mentioned this pull request Nov 8, 2019

metadataInIndex flag is broken #1795

Closed

pracucci mentioned this pull request Nov 18, 2019

Removed chunk.metadataInIndex because unused #1836

Merged
