Index series, not chunks #875
Got this deployed in our dev env and it seems to work. Hard to draw any conclusions as there aren't any taxing queries, but there seems to be a drop in bigtable latency in line with a reduction in the amount of data we're reading.
Running in prod for a few days now and seen the worst queries go from minutes to seconds, in line with a 6-10x latency improvement. Once we rotate out the index tables, will report on their size.
There was a bug in the way we handled chunks spanning the transition; fixed now.
A few thoughts.
cmd/ingester/main.go
Outdated
@@ -70,7 +70,7 @@ func main() {
 		os.Exit(1)
 	}

-	chunkStore, err := chunk.NewStore(chunkStoreConfig, schemaConfig, storageClient)
+	chunkStore, err := chunk.NewCompositeStore(chunkStoreConfig, schemaConfig, storageClient)
ISTM that you could have left the exported name as NewStore() to minimise impact elsewhere.
Good idea, done.
pkg/chunk/chunk_store.go
Outdated
@@ -93,22 +93,34 @@ func (c *store) Stop() {
 }

 // Put implements ChunkStore
 func (c *store) Put(ctx context.Context, chunks []Chunk) error {
 	for _, chunk := range chunks {
 		if err := c.PutOne(ctx, chunk.From, chunk.Through, chunk); err != nil {
This seems like rather a large change in performance characteristics, if you have a number of chunks to write.
pkg/chunk/chunk_store_test.go
Outdated
 	"github.com/weaveworks/cortex/pkg/util/extract"
 	"golang.org/x/net/context"

 	"github.com/weaveworks/common/test"
 	"github.com/weaveworks/common/user"
 )

 var schemas = []struct {
 	name string
 	fn   func(cfg SchemaConfig) Schema
I was expecting storeFn to come in this commit, even though they would all be the same at this point.
Seems to me that this schema should also allow (or lay the groundwork for) successful queries without invariant metric names... am I thinking on the right path there?
It could, but it still doesn't avoid the problem of hotspotting the "name" row. Although it would reduce the load on that row by 12x, so might be doable at this point.
Yes, it potentially is. I don't think it will affect normal operation, as I think we only flush one chunk at once normally - we've been running this for a week or so, haven't noticed anything. But it may affect shutdown flushes. I think the correct solution is to potentially split out the
How many do we have? I see the chunk caches (which are fundamentally different things), a vendored prometheus treecache (something to do with zookeeper) and some caches in k8s/client-go. I guess the grpc connection pool is a cache, but that's a different beast too. We also cache a single result in the chunk iterator, but that's just a single result. I'm not actually aware of any other caches (like this) in use in the codebase. We could use github.com/bluele/gcache, but I was never a fan of that library. OTOH, it's not clear using a heap for evictions is the best idea; we could thread a linked list through the entries and reorder it to get LRU.
I've tidied up this PR so it should be more reviewable. I see two things left todo: figure out what to do with chunks on store boundaries (try and bring back the batching) and accurately record cardinality of rows for multi-day queries. Let me know if you get a chance to take a look @bboreham @cboggs @gouthamve.
By "we" I meant the Go community. I've used
If the flush queue gets very large (and we've had it in the millions many times) this assumption breaks down.
I wrote some thoughts at #684 (comment)
 	c.cache.Stop()
 }

 // Put implements ChunkStore
-func (c *Store) Put(ctx context.Context, chunks []Chunk) error {
+func (c *store) Put(ctx context.Context, chunks []Chunk) error {
 	for _, chunk := range chunks {
Would it make more sense to PutChunks, then calculate all the index entries and write them at once? That might alleviate the performance-characteristic changes Bryan commented about.
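A hedged sketch of that batching idea (the types, `entriesForChunk`, and `PutChunks` here are hypothetical stand-ins, not Cortex's actual API): compute the index entries for every chunk first, then issue a single batched write instead of one round-trip per chunk.

```go
package main

import "fmt"

// Hypothetical stand-ins for the real chunk and index-entry types.
type Chunk struct{ ID string }
type IndexEntry struct{ HashKey, RangeKey string }

// entriesForChunk stands in for the per-schema index-entry calculation;
// this sketch emits one entry per chunk.
func entriesForChunk(c Chunk) []IndexEntry {
	return []IndexEntry{{HashKey: "series-hash", RangeKey: c.ID}}
}

// PutChunks collects every chunk's index entries and returns them as one
// batch, instead of issuing one write per chunk as a per-chunk PutOne
// loop would.
func PutChunks(chunks []Chunk) []IndexEntry {
	batch := make([]IndexEntry, 0, len(chunks))
	for _, c := range chunks {
		batch = append(batch, entriesForChunk(c)...)
	}
	// A real store would now issue a single batched write of `batch`.
	return batch
}

func main() {
	batch := PutChunks([]Chunk{{ID: "c1"}, {ID: "c2"}, {ID: "c3"}})
	fmt.Println(len(batch)) // one entry per chunk in this sketch
}
```

The point is purely about round-trips: the number of index entries is unchanged, but they travel in one request.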
pkg/chunk/schema_config.go
Outdated
@@ -60,6 +60,7 @@ func (cfg *SchemaConfig) RegisterFlags(f *flag.FlagSet) {
 	f.Var(&cfg.V6SchemaFrom, "dynamodb.v6-schema-from", "The date (in the format YYYY-MM-DD) after which we enable v6 schema.")
 	f.Var(&cfg.V7SchemaFrom, "dynamodb.v7-schema-from", "The date (in the format YYYY-MM-DD) after which we enable v7 schema (Deprecated).")
 	f.Var(&cfg.V8SchemaFrom, "dynamodb.v8-schema-from", "The date (in the format YYYY-MM-DD) after which we enable v8 schema (Deprecated).")
+	f.Var(&cfg.V9SchemaFrom, "dynamodb.v9-schema-from", "The data (in the format YYYY-MM-DD) after which we enable v9 schema (Series indexing).")
Typo: "data" should be "date".
pkg/chunk/series_store.go
Outdated
 	level.Debug(log).Log("Chunk IDs", len(chunkIDs))

 	// Protect ourselves against OOMing.
 	if len(chunkIDs) > c.cfg.QueryChunkLimit {
Would it make more sense to do this after filtering out the chunks by time?
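A minimal sketch of that suggestion (type and field names are hypothetical, not the PR's actual code): narrow the candidate chunks to the query's time range first, and only then apply the OOM guard, so a query isn't rejected on chunks that would be discarded anyway.

```go
package main

import "fmt"

// ChunkRef is a hypothetical chunk reference with millisecond timestamps.
type ChunkRef struct {
	ID            string
	From, Through int64
}

// filterByTime keeps only chunks overlapping [from, through].
func filterByTime(chunks []ChunkRef, from, through int64) []ChunkRef {
	var out []ChunkRef
	for _, c := range chunks {
		if c.Through >= from && c.From <= through {
			out = append(out, c)
		}
	}
	return out
}

// checkLimit applies the limit only after time filtering, per the
// reviewer's suggestion.
func checkLimit(chunks []ChunkRef, from, through int64, limit int) ([]ChunkRef, error) {
	filtered := filterByTime(chunks, from, through)
	if len(filtered) > limit {
		return nil, fmt.Errorf("query would fetch %d chunks, limit is %d", len(filtered), limit)
	}
	return filtered, nil
}
```

With the limit applied before filtering, a long-lived series could trip the guard even for a narrow query window; this ordering avoids that.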
I also have some concerns about putting one chunk vs many chunks, and added a comment with a possible idea. I like the idea of splitting
Promote the composite schema abstraction to "composite chunk store" - a chunk store which delegates to different chunk stores based on time. This allows us to vary the store implementation over time, and not just the schema. This will unblock the new bigtable storage adapter (using columns instead of rows), and allow us to more easily implement the iterative intersections and indexing of series instead of chunks.

Corner case when writing chunks which span multiple stores: they are written to both stores, and instead of using the chunk start/end we use the schema start/end. This will lead to duplication of the index entries on schema migrations, but is actually already the case for day boundaries anyway. It will lead to duplicate writes of the chunk on schema migrations - they should be deduped by the underlying store.

Signed-off-by: Tom Wilkie <[email protected]>
We should index the series, not chunks; this will reduce the number of entries in the index by `replication factor * (bucket size / chunk size)`, or 3 * (24hrs / 6hrs) - ie 12x. This will however mean we need another index from series to chunks, introducing 1 extra write and N extra reads per query. Expectation is a reduction in query latency (and bigtable query usage, and memory cost) by 12x, and then an increase by 2x as we have to do a bunch of queries.

This change introduces the seriesStore, a new chunk store implementation that, combined with the v9 schema, indexes series not chunks. I tried to adapt the original chunk store to support this style of indexing - easy on the write path, but the read path became even more of a rat's nest. So I factored out the common bits as best I could and made a new chunk store.

Signed-off-by: Tom Wilkie <[email protected]>

Tidy up some of the logging.

Signed-off-by: Tom Wilkie <[email protected]>
I've looked at go-cache, it still uses a background goroutine to periodically expunge entries from the cache. AFAICT this means the cache can grow without bounds between these periods, and it also locks the entire cache to do this. The heap cache isn't ideal, I'm going to update it to use a simple FIFO list for evictions, but I think it's better than go-cache.
fifoCache is a simple string -> interface{} cache which uses a fifo to manage evictions. O(1) inserts, updates and gets. Signed-off-by: Tom Wilkie <[email protected]>
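A minimal sketch of such a FIFO-evicting cache using Go's container/list (names are hypothetical; the real fifoCache lives in the PR): a map gives O(1) lookup, a list ordered by insertion gives O(1) eviction from the front once the cache is full.

```go
package main

import "container/list"

// cacheEntry pairs the key with the value so eviction can delete from
// the map.
type cacheEntry struct {
	key   string
	value interface{}
}

// fifoCache is a string -> interface{} cache that evicts the oldest
// entry once it exceeds maxEntries. All operations are O(1).
type fifoCache struct {
	maxEntries int
	entries    map[string]*list.Element
	order      *list.List // front = oldest inserted
}

func newFifoCache(maxEntries int) *fifoCache {
	return &fifoCache{
		maxEntries: maxEntries,
		entries:    map[string]*list.Element{},
		order:      list.New(),
	}
}

// Put inserts or updates a key; on insert it evicts the oldest entry
// if the cache is full. Updates keep the entry's FIFO position.
func (c *fifoCache) Put(key string, value interface{}) {
	if elem, ok := c.entries[key]; ok {
		elem.Value.(*cacheEntry).value = value
		return
	}
	if c.order.Len() >= c.maxEntries {
		oldest := c.order.Front()
		c.order.Remove(oldest)
		delete(c.entries, oldest.Value.(*cacheEntry).key)
	}
	c.entries[key] = c.order.PushBack(&cacheEntry{key, value})
}

// Get returns the cached value and whether it was present.
func (c *fifoCache) Get(key string) (interface{}, bool) {
	elem, ok := c.entries[key]
	if !ok {
		return nil, false
	}
	return elem.Value.(*cacheEntry).value, true
}
```

Unlike the LRU idea floated above, FIFO never reorders on Get, which keeps reads lock-light at the cost of occasionally evicting a hot entry.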
Firstly, cache the length of index rows we query (by the hash and range key). Secondly, fail for rows with > 100k, either because the cache told us so, or because we read them. Finally, allow matchers to fail on cardinality errors but proceed with the query (as long as at least one matcher succeeds), and then filter results. Notably, after this change, queries on two high-cardinality labels that would result in a small number of series will fail. Signed-off-by: Tom Wilkie <[email protected]>
Signed-off-by: Tom Wilkie <[email protected]>
Signed-off-by: Goutham Veeramachaneni <[email protected]>
I am approving this since it has some nice changes. If anyone is worried about the performance characteristic change they can leave it in v6 until those are updated.
Fixes #433, fixes #884
Consists of 4 changes:
Move compositeSchema abstraction up to compositeStore.
Promote the composite schema abstraction to "composite chunk store" - a chunk store which delegates to different chunk stores based on time. This allows us to vary the store implementation over time, and not just the schema. This will unblock the new bigtable storage adapter (using columns instead of rows), and allow us to more easily implement the iterative intersections and indexing of series instead of chunks.
Corner case when writing chunks which span multiple stores: they are written to both stores, and instead of using the chunk start/end we use the schema start/end. This will lead to duplication of the index entries on schema migrations, but is actually already the case for day boundaries anyway. It will lead to duplicate writes of the chunk on schema migrations - they should be deduped by the underlying store.
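A hedged sketch of that corner case (all names hypothetical, not the PR's code): stores are keyed by the time they take effect, and a chunk spanning a boundary is assigned to both sides, with each side's time range clamped to the store's boundaries rather than the chunk's own start/end.

```go
package main

import "fmt"

// storeEntry is a hypothetical store with the time it comes into effect.
type storeEntry struct {
	start int64 // millisecond timestamp this store takes effect
	name  string
}

// storesForChunk returns, for each store whose active period overlaps
// [from, through], the clamped range the chunk is indexed under there.
// Assumes stores are sorted by start time.
func storesForChunk(stores []storeEntry, from, through int64) map[string][2]int64 {
	out := map[string][2]int64{}
	for i, s := range stores {
		end := int64(1) << 62 // last store runs forever
		if i+1 < len(stores) {
			end = stores[i+1].start - 1
		}
		if through < s.start || from > end {
			continue // chunk doesn't overlap this store's period
		}
		lo, hi := from, through
		if lo < s.start {
			lo = s.start
		}
		if hi > end {
			hi = end
		}
		out[s.name] = [2]int64{lo, hi}
	}
	return out
}

func main() {
	stores := []storeEntry{{0, "v6"}, {100, "v9"}}
	// A chunk spanning the v6 -> v9 boundary lands in both stores.
	fmt.Println(storesForChunk(stores, 90, 110))
}
```

A chunk wholly inside one store's period maps to exactly one store; only boundary-spanning chunks get the duplicated entries described above.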
Index series, not chunks
We should index the series, not chunks; this will reduce the number of entries in the index by
replication factor * (bucket size / chunk size)
, or 3 * (24hrs / 6hrs) - ie 12x. This will however mean we need another index from series to chunks, introducing 1 extra write and N extra reads per query. Expectation is a reduction in query latency (and bigtable query usage, and memory cost) by 12x, and then an increase by 2x as we have to do a bunch of queries.

This change introduces the seriesStore, a new chunk store implementation that, combined with the v9 schema, indexes series not chunks.
I tried to adapt the original chunk store to support this style of indexing - easy on the write path, but the read path became even more of a rat's nest. So I factored out the common bits as best I could and made a new chunk store.
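The read path described above becomes a two-stage lookup; here is a hedged in-memory sketch (the index structure, `chunksFor`, and the string-keyed matchers are illustrative stand-ins, not the seriesStore's API): resolve each matcher to series IDs and intersect, then fetch chunk IDs per surviving series.

```go
package main

import (
	"fmt"
	"sort"
)

// index models the two v9 mappings: label entries -> series IDs, and
// series IDs -> chunk IDs.
type index struct {
	labelToSeries map[string][]string // "name=value" -> series IDs
	seriesToChunk map[string][]string // series ID -> chunk IDs
}

// intersect returns the elements of b that also appear in a.
func intersect(a, b []string) []string {
	set := map[string]bool{}
	for _, s := range a {
		set[s] = true
	}
	var out []string
	for _, s := range b {
		if set[s] {
			out = append(out, s)
		}
	}
	return out
}

// chunksFor runs the two-stage lookup: matchers -> series -> chunks.
// The extra stage is the "1 extra write and N extra reads" mentioned
// in the description.
func (ix index) chunksFor(matchers []string) []string {
	var series []string
	for i, m := range matchers {
		if i == 0 {
			series = ix.labelToSeries[m]
			continue
		}
		series = intersect(series, ix.labelToSeries[m])
	}
	var chunks []string
	for _, s := range series {
		chunks = append(chunks, ix.seriesToChunk[s]...)
	}
	sort.Strings(chunks)
	return chunks
}

func main() {
	ix := index{
		labelToSeries: map[string][]string{
			"job=api":  {"s1", "s2"},
			"env=prod": {"s2", "s3"},
		},
		seriesToChunk: map[string][]string{"s2": {"c1", "c2"}},
	}
	fmt.Println(ix.chunksFor([]string{"job=api", "env=prod"})) // [c1 c2]
}
```

The intersection happens on the (much smaller) series set, which is where the 12x read reduction comes from; the per-series chunk lookups are the 2x give-back.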
Add heapCache, a cache that uses a heap for evictions.
heapCache is a simple string -> interface{} cache which uses a heap to manage evictions. O(log N) inserts and updates, O(1) gets.
Skip index queries for high cardinality labels.
Firstly, cache the length of index rows we query (by the hash and range key). Secondly, fail for rows with > 100k, either because the cache told us so, or because we read them. Finally, allow matchers to fail on cardinality errors but proceed with the query (as long as at least one matcher succeeds), and then filter results.
Notably, after this change, queries on two high-cardinality labels that would result in a small number of series will fail.
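The matcher-tolerance step above can be sketched as follows (a hedged illustration with hypothetical names, not the PR's code): each matcher lookup may fail with a cardinality error; the query proceeds so long as at least one matcher resolved, and the skipped matchers must be re-applied as filters on the fetched results.

```go
package main

import (
	"errors"
	"fmt"
)

// errCardinality stands in for the "row too big" error described above.
var errCardinality = errors.New("row exceeds cardinality limit")

// lookupFn stands in for one matcher's index lookup.
type lookupFn func() ([]string, error)

// intersect returns the elements of b that also appear in a.
func intersect(a, b []string) []string {
	set := map[string]bool{}
	for _, s := range a {
		set[s] = true
	}
	var out []string
	for _, s := range b {
		if set[s] {
			out = append(out, s)
		}
	}
	return out
}

// resolve runs all matcher lookups, tolerating cardinality errors as
// long as at least one lookup succeeds. It returns the intersected
// series and how many matchers were skipped (and so must be applied
// as post-fetch filters). If every matcher is too big, the query fails.
func resolve(lookups []lookupFn) (series []string, skipped int, err error) {
	ok := 0
	for _, l := range lookups {
		ids, lerr := l()
		if errors.Is(lerr, errCardinality) {
			skipped++
			continue
		}
		if lerr != nil {
			return nil, 0, lerr
		}
		if ok == 0 {
			series = ids
		} else {
			series = intersect(series, ids)
		}
		ok++
	}
	if ok == 0 {
		return nil, 0, errCardinality
	}
	return series, skipped, nil
}

func main() {
	lookups := []lookupFn{
		func() ([]string, error) { return nil, errCardinality }, // too big
		func() ([]string, error) { return []string{"s1", "s2"}, nil },
	}
	series, skipped, err := resolve(lookups)
	fmt.Println(series, skipped, err)
}
```

This also makes the stated limitation concrete: if every matcher in a query is high-cardinality, `ok` stays zero and the whole query fails, even when their intersection would have been small.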