Cache index writes (and change some flag names) #1024

Merged

merged 4 commits into cortexproject:master from cache-index-writes on Oct 17, 2018

Conversation

@gouthamve
Contributor

gouthamve commented Sep 21, 2018

Depends on #1011 (The cache interface is changing there)

Fixes #957


We've seen that we write an average of 11 index entries per chunk. In the v9 schema, 10 of those entries are series-dependent, while one is the series-id -> chunkID mapping. Essentially we're doing 10x repeated writes!

This PR lets you cache and dedupe those entries, reducing the write load on the database by up to 10x. But you need to make sure the cache size is at least 11 x numSeries (depending on your setup), else you'll end up evicting the entries before the series can hit them again.

Still needs to be tested.
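
Roughly, the write path this adds looks like the sketch below (stand-in types and illustrative names, not the exact code in the diffs):

package main

import "fmt"

// IndexEntry is a minimal stand-in for the chunk store's index entry type.
type IndexEntry struct {
    TableName  string
    HashValue  string
    RangeValue []byte
    Value      []byte
}

// dedupeWrites drops entries whose keys were already written for an earlier
// chunk of the same series, leaving only the genuinely new ones.
func dedupeWrites(seen map[string]struct{}, entries []IndexEntry) []IndexEntry {
    out := make([]IndexEntry, 0, len(entries))
    for _, e := range entries {
        key := fmt.Sprintf("%s:%s:%x", e.TableName, e.HashValue, e.RangeValue)
        if _, ok := seen[key]; ok {
            continue // duplicate of an entry already in the index
        }
        out = append(out, e)
    }
    return out
}

func main() {
    seen := map[string]struct{}{}
    entries := []IndexEntry{{TableName: "index_123", HashValue: "user:day:metric", RangeValue: []byte("r1")}}
    fmt.Println(len(dedupeWrites(seen, entries))) // 1: nothing cached yet
    // After a successful write to the store, the keys are added to the cache,
    // so the next chunk's identical label entries get dropped here.
}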

@@ -67,6 +79,8 @@ func (cfg *StoreConfig) RegisterFlags(f *flag.FlagSet) {
f.IntVar(&cfg.CardinalityCacheSize, "store.cardinality-cache-size", 0, "Size of in-memory cardinality cache, 0 to disable.")
f.DurationVar(&cfg.CardinalityCacheValidity, "store.cardinality-cache-validity", 1*time.Hour, "Period for which entries in the cardinality cache are valid.")
f.IntVar(&cfg.CardinalityLimit, "store.cardinality-limit", 1e5, "Cardinality limit for index queries.")

f.IntVar(&cfg.IndexEntryCacheSize, "store.index-entry-cache", 0, "The number of index entries to cache so we don't write duplicates.")
Contributor

No need for the extra line here & above.

Contributor

s/The number of index entries to cache so we don't write duplicates./Size of index entry cache used to deduplicate writes./

        key := fmt.Sprintf("%s:%s:%x", entry.TableName, entry.HashValue, entry.RangeValue)
        if _, ok := seenIndexEntries[key]; !ok {
            seenIndexEntries[key] = struct{}{}
            rowWrites.Observe(entry.HashValue, 1)
            result.Add(entry.TableName, entry.HashValue, entry.RangeValue, entry.Value)
        }
    }
    c.entryCache.Store(context.Background(), cacheKeys, make([][]byte, len(cacheKeys)))
Contributor

I'm a little worried that if the write fails we won't try again, as you've added it to the cache too early.

Contributor

This will lose data, at least on DynamoDB where the whole write can fail and be retried.

}

return keys, keyMap
}
Contributor

I'd be tempted to inline this in dedupeEntriesFromCache as it's only used there.

@tomwilkie
Contributor

First round of review; we should also make this use the tiered cache so dedupes work across ingester restarts.

And some tests please.
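
(For reference, the tiered-cache idea here, as a rough sketch rather than the actual Cortex cache package: try each cache level in order and backfill the faster levels on a hit, so dedupe state survives an ingester restart via the shared memcached level.)

package cache

// Sketch of a tiered cache; illustrative only, not the real implementation.
type level interface {
    Get(key string) ([]byte, bool)
    Set(key string, value []byte)
}

type tiered []level

// Get tries the fastest level first and backfills it on a hit lower down.
func (t tiered) Get(key string) ([]byte, bool) {
    for i, l := range t {
        if v, ok := l.Get(key); ok {
            for j := 0; j < i; j++ {
                t[j].Set(key, v) // repopulate the faster levels
            }
            return v, true
        }
    }
    return nil, false
}

// Set writes through to every level.
func (t tiered) Set(key string, value []byte) {
    for _, l := range t {
        l.Set(key, value)
    }
}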

keys := make([]string, 0, len(entries))
keyMap := make(map[string]IndexEntry, len(entries))
for _, entry := range entries {
key := strings.Join([]string{
Contributor

This is basically repeating the Sprintf("%s:%s:%x", ...) from earlier in a different way. Be more DRY.

@bboreham
Contributor

What sort of memory growth do you see from this?
It would be way more memory-efficient to put a flag on the existing series structure saying "series metadata saved".

@bboreham
Contributor

a flag

per bucket 😞

@bboreham
Contributor

There are heuristics which can improve matters. E.g.

  • If a chunk has idled out, you're not expecting to write any more in that series, so don't cache it.
  • If a chunk aged out at 12 hours, you don't expect to write more than one more of those in the same bucket, so don't cache it.

That has to cover a significant percentage.

@gouthamve
Contributor Author

Hmm, how much improvement can these heuristics bring? We'll need to store about 15 x numSeries in memcache which should be quite small.

@gouthamve
Contributor Author

So I pushed 2 commits which move the flags for building a tiered cache to a single place instead of having custom ones everywhere, but I'm not a fan of the change. This breaks some flags and adds additional flags where they're not needed.

/cc @tomwilkie

@bboreham
Contributor

15 x numSeries in memcache which should be quite small.

Seriously? What's your calculation for size of each item?
We run ingesters with 2 million series each.

@bboreham
Contributor

Here's my calculation:
Code for the key is strings.Join([]string{entry.TableName, entry.HashValue, string(entry.RangeValue), string(entry.Value)}, string('\xff'))
Consider just label->series entries, which are the most numerous:
TableName say 12 bytes
HashValue is user:day:metricName:labelName, say 5+7+30+20 = 62
RangeValue is 32+32+separators = 66
Value is the label value, say 30 bytes on average?

So that's 173 bytes including separators, plus cache overhead:
cacheEntry is 72 bytes.
Go map overhead is at least 24 bytes for the string and int, call it 30.
Grand total = 275 bytes per entry.

Times 30 million entries is ~8 GB; with Go heap expansion that's ~16 GB extra RAM I need per ingester.
Please check my calculation; these things are notoriously difficult to get right.

Having thought about it some more, it would be better to have the cache know that all entries for a (v9) series go together, so we remove the label name and value from the key and reduce the entries 15x.
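
Restating that estimate as a quick back-of-the-envelope sketch (the figures are the rough numbers from this thread, not measurements):

package main

import "fmt"

func main() {
    // Per-entry size: key bytes (table + hash + range + value + 3 separators),
    // plus the in-memory cacheEntry struct, plus Go map bookkeeping.
    const (
        keyBytes    = 12 + 62 + 66 + 30 + 3 // = 173
        cacheEntry  = 72
        mapOverhead = 30
        perEntry    = keyBytes + cacheEntry + mapOverhead // = 275
    )
    // ~2M series per ingester x ~15 index entries per series = ~30M entries.
    total := float64(perEntry) * 2e6 * 15
    fmt.Printf("%d bytes/entry, %.1f GB before heap expansion\n", perEntry, total/1e9)
    // Prints: 275 bytes/entry, 8.2 GB before heap expansion (roughly 2x that in practice).
}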

@gouthamve
Contributor Author

Having thought about it some more, it would be better to have the cache know that all entries for a (v9) series go together, so we remove the label name and value from the key and reduce the entries 15x.

Oh Yes! This is indeed much, much better!

@gouthamve
Contributor Author

This is now ready for review. I agree that in-mem might be too much, but throwing everything into memcache will help.

Further, my calculations above are wrong: we cut into the next row every day, hence the max reduction in writes is at most 50%, assuming there are 2 chunks per row.

Finally, I've made it so that we now have a tiered cache (in-mem -> disk -> memcache) for everything. This makes it extremely confusing because the descriptions for all of them are the same. I don't have good ideas on how to fix it.

@tomwilkie
Contributor

I'm seeing some flakiness in the test:

$ go test ./pkg/chunk/
--- FAIL: TestIndexCachingWorks (0.00s)
        Error Trace:    chunk_store_test.go:559
        Error:		Not equal: 4 (expected)
        	        != 5 (actual)
        
FAIL
FAIL	github.com/weaveworks/cortex/pkg/chunk	0.569s

@tomwilkie
Contributor

@bboreham
Contributor

bboreham commented Oct 1, 2018

RangeValue is 32+32+separators = 66

Those 32s are base64-encoded, so actually ~43

I don’t want this merged without sorting the question of memory usage. What are you seeing in trials?

return entries, nil
}

found, missing = c.entryCache.Fetch(context.Background(), entries)
Contributor

Please propagate the context so the traces work properly.

return
}

c.entryCache.Store(context.Background(), entries)
Contributor

Propagate the context.


for _, entry := range entries {
key := dedupeKey(entry)
out, err := proto.Marshal(&entry)
Contributor

Why do we write the index entry to the cache? Isn't the key enough to check equality?

Contributor Author

We moved to memcache, so we now need to hash the keys as they might contain non-UTF-8 chars, which don't work with memcache.

Now that we have hashes, we need to store the actual entry to dedupe.
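
A minimal sketch of that idea (hypothetical helper names; note the final version of this PR ends up base64-encoding the keys rather than hashing them, as discussed further down):

package cachingclient

import (
    "bytes"
    "crypto/sha256"
    "encoding/hex"
)

// memcacheSafeKey is a hypothetical helper: memcached keys must be printable
// ASCII with no spaces, so the raw table/hash/range key is hashed first.
func memcacheSafeKey(rawKey []byte) string {
    sum := sha256.Sum256(rawKey)
    return hex.EncodeToString(sum[:])
}

// Because the memcached key is only a digest, the value stored under it is
// the full raw key (or the serialised entry); a hit only counts as "already
// written" if the stored value matches, which guards against collisions.
func alreadyWritten(cachedValue, rawKey []byte) bool {
    return bytes.Equal(cachedValue, rawKey)
}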

@@ -58,17 +58,23 @@ type seriesStore struct {
func newSeriesStore(cfg StoreConfig, schema Schema, storage StorageClient) (Store, error) {
fetcher, err := NewChunkFetcher(cfg.CacheConfig, storage)
if err != nil {
return nil, err
return nil, errors.Wrap(err, "create chunk fetcher")
Contributor

I generally put errors.Wrap at the very leaf of where the error is returned, in our code. Adding it halfway down the stack isn't that useful.

Contributor Author

Hmm, I disagree. Usually adding it here lets us track whether it's from the chunk fetcher or from cache creation (below). Why do you think it isn't that useful?

Contributor

Clarified what I meant f2f.

f.IntVar(&cfg.IndexCacheSize, "store.index-cache-size", 0, "Size of in-memory index cache, 0 to disable.")
f.DurationVar(&cfg.IndexCacheValidity, "store.index-cache-validity", 5*time.Minute, "Period for which entries in the index cache are valid. Should be no higher than -ingester.max-chunk-idle.")
cfg.memcacheClient.RegisterFlagsWithPrefix("index", f)
cfg.indexCache.RegisterFlagsWithPrefix("store.index-cache-read", f)
Contributor

Doesn't this change the flags? Won't this break backwards compatibility?

Contributor Author

Yes, it does. But I couldn't come up with a descriptive name that doesn't confuse users of the two caches now.

Contributor

We can't break backward compat like this. At least leave the old flags in with a deprecation notice.

Contributor Author

Done

@tomwilkie
Contributor

@bboreham your memory consumption calculation looks about right to me. What's more, there is very little point in storing this in-process, as another 2x of the writes come from other ingesters. Therefore we shouldn't use the FIFO cache, and only use memcached for this.

Considering that, I make it:

  • each series has 3 chunks (across 3 ingesters) = 3k
  • an index entry for the write cache is 200-300 bytes
  • we need to store at least 10 label index entries per series in the (external) cache = 3k

Which would double the memory usage of cortex on the write path. I'm inclined to believe this is too much too.

I chatted with @gouthamve, and believe that moving this caching to the write path in the series store would allow us to only cache (userid, day, series ID) and use that to avoid writing out the label index, without too much of a layering violation. This would result in some code duplication between the chunk store and series store, as their write paths are currently shared, but I think that will be fine.

WDYT?
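
A rough sketch of that proposal (illustrative names only, not the final code): keep one dedupe marker per (userID, day, seriesID) and skip the label index entries when the marker is already present.

package main

import "fmt"

// writeDedupe remembers which (userID, day, seriesID) combinations have
// already had their label index entries written. Sketch only; in the real
// proposal this would be backed by memcached rather than a local map.
type writeDedupe struct {
    seen map[string]struct{}
}

func seriesKey(userID string, day int64, seriesID string) string {
    return fmt.Sprintf("%s:%d:%s", userID, day, seriesID)
}

// shouldWriteLabelIndex reports whether the ~10 label index entries still
// need to be written for this series in this daily bucket. The series-id ->
// chunk-id entry is always written regardless.
func (w *writeDedupe) shouldWriteLabelIndex(userID string, day int64, seriesID string) bool {
    _, ok := w.seen[seriesKey(userID, day, seriesID)]
    return !ok
}

// markWritten is called only after the batch write to the store succeeded
// (see the earlier review comments about caching too early).
func (w *writeDedupe) markWritten(userID string, day int64, seriesID string) {
    w.seen[seriesKey(userID, day, seriesID)] = struct{}{}
}

func main() {
    w := &writeDedupe{seen: map[string]struct{}{}}
    fmt.Println(w.shouldWriteLabelIndex("user1", 17800, "series-abc")) // true: first chunk today
    w.markWritten("user1", 17800, "series-abc")
    fmt.Println(w.shouldWriteLabelIndex("user1", 17800, "series-abc")) // false: already written
}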

@bboreham
Contributor

bboreham commented Oct 1, 2018

Sure; get something that caches just once per series then we can see if the layering/duplication can be improved.

@gouthamve
Contributor Author

I've got something working here. Will deploy to dev and let you know.

}

bufs := make([][]byte, len(keysToCache))
c.entryCache.Store(context.Background(), keysToCache, bufs)
Contributor

This call is too early again, no?

@gouthamve
Contributor Author

How the prefixed flags look now:

  -store.index-cache-read.cache.default-validity duration
    	Cache config for index entry reading. The default validity of entries for caches unless overridden.
  -store.index-cache-read.cache.enable-diskcache
    	Cache config for index entry reading. Enable on-disk cache.
  -store.index-cache-read.cache.enable-fifocache
    	Cache config for index entry reading. Enable in-memory cache.
  -store.index-cache-read.diskcache.path string
    	Cache config for index entry reading. Path to file used to cache chunks. (default "/var/run/chunks")
  -store.index-cache-read.diskcache.size int
    	Cache config for index entry reading. Size of file (bytes) (default 1073741824)
  -store.index-cache-read.fifocache.duration duration
    	Cache config for index entry reading. The expiry duration for the cache.
  -store.index-cache-read.fifocache.size int
    	Cache config for index entry reading. The number of entries to cache.
  -store.index-cache-read.memcache.write-back-buffer int
    	Cache config for index entry reading. How many chunks to buffer for background write back. (default 10000)
  -store.index-cache-read.memcache.write-back-goroutines int
    	Cache config for index entry reading. How many goroutines to use to write back to memcache. (default 10)
  -store.index-cache-read.memcached.batchsize int
    	Cache config for index entry reading. How many keys to fetch in each batch.
  -store.index-cache-read.memcached.expiration duration
    	Cache config for index entry reading. How long keys stay in the memcache.
  -store.index-cache-read.memcached.hostname string
    	Cache config for index entry reading. Hostname for memcached service to use when caching chunks. If empty, no memcached will be used.
  -store.index-cache-read.memcached.parallelism int
    	Cache config for index entry reading. Maximum active requests to memcache. (default 100)
  -store.index-cache-read.memcached.service string
    	Cache config for index entry reading. SRV service used to discover memcache servers. (default "memcached")
  -store.index-cache-read.memcached.timeout duration
    	Cache config for index entry reading. Maximum time to wait before giving up on memcached requests. (default 100ms)
  -store.index-cache-read.memcached.update-interval duration
    	Cache config for index entry reading. Period with which to poll DNS for memcache servers. (default 1m0s)
  -store.index-cache-size int
    	Deprecated: Use -store.index-cache-read.*; Size of in-memory index cache, 0 to disable.
  -store.index-cache-validity duration
    	Deprecated: Use -store.index-cache-read.*; Period for which entries in the index cache are valid. Should be no higher than -ingester.max-chunk-idle. (default 5m0s)
  -store.index-cache-write.cache.default-validity duration
    	Cache config for index entry writing. The default validity of entries for caches unless overridden.
  -store.index-cache-write.cache.enable-diskcache
    	Cache config for index entry writing. Enable on-disk cache.
  -store.index-cache-write.cache.enable-fifocache
    	Cache config for index entry writing. Enable in-memory cache.
  -store.index-cache-write.diskcache.path string
    	Cache config for index entry writing. Path to file used to cache chunks. (default "/var/run/chunks")
  -store.index-cache-write.diskcache.size int
    	Cache config for index entry writing. Size of file (bytes) (default 1073741824)
  -store.index-cache-write.fifocache.duration duration
    	Cache config for index entry writing. The expiry duration for the cache.
  -store.index-cache-write.fifocache.size int
    	Cache config for index entry writing. The number of entries to cache.
  -store.index-cache-write.memcache.write-back-buffer int
    	Cache config for index entry writing. How many chunks to buffer for background write back. (default 10000)
  -store.index-cache-write.memcache.write-back-goroutines int
    	Cache config for index entry writing. How many goroutines to use to write back to memcache. (default 10)
  -store.index-cache-write.memcached.batchsize int
    	Cache config for index entry writing. How many keys to fetch in each batch.
  -store.index-cache-write.memcached.expiration duration
    	Cache config for index entry writing. How long keys stay in the memcache.
  -store.index-cache-write.memcached.hostname string
    	Cache config for index entry writing. Hostname for memcached service to use when caching chunks. If empty, no memcached will be used.
  -store.index-cache-write.memcached.parallelism int
    	Cache config for index entry writing. Maximum active requests to memcache. (default 100)
  -store.index-cache-write.memcached.service string
    	Cache config for index entry writing. SRV service used to discover memcache servers. (default "memcached")
  -store.index-cache-write.memcached.timeout duration
    	Cache config for index entry writing. Maximum time to wait before giving up on memcached requests. (default 100ms)
  -store.index-cache-write.memcached.update-interval duration
    	Cache config for index entry writing. Period with which to poll DNS for memcache servers. (default 1m0s)

@tomwilkie
Contributor

@gouthamve give this a rebase into 2 changes: one that updates the flags and caches, and one that adds the write caching. Then I'll give it what I hope is a final review.

We're writing the series label index to bigtable for every chunk
We now cache the series-id and write only if we didn't write it before

Signed-off-by: Goutham Veeramachaneni <[email protected]>

--------

This is a squashed commit but only including Tom's commits'
description for attribution.

Review feedback.

Signed-off-by: Tom Wilkie <[email protected]>

Write back cache keys after they have been written to the store.

Signed-off-by: Tom Wilkie <[email protected]>
@gouthamve force-pushed the cache-index-writes branch 2 times, most recently from bf23019 to 6c37c87, on October 3, 2018 02:44
@gouthamve
Contributor Author

Split it into 2 commits @tomwilkie

Contributor

@tomwilkie left a comment

Been super detailed and nit picky - sorry! Generally looking really good.


// For tests to inject specific implementations.
Cache Cache
}

// RegisterFlags adds the flags required to config this to the given FlagSet.
func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
f.BoolVar(&cfg.EnableDiskcache, "cache.enable-diskcache", false, "Enable on-disk cache")
cfg.RegisterFlagsWithPrefix("", "", f)
}
Contributor

RegisterFlags is only used in two places now: frontend and chunk store. Can we remove the function and call RegisterFlagsWithPrefix from there?

f.IntVar(&cfg.WriteBackGoroutines, "memcache.write-back-goroutines", 10, "How many goroutines to use to write back to memcache.")
f.IntVar(&cfg.WriteBackBuffer, "memcache.write-back-buffer", 10000, "How many chunks to buffer for background write back.")
cfg.RegisterFlagsWithPrefix("", "", f)
}
Contributor

I don't see this function being used anymore - can we remove it?


if prefix != "" {
prefix += "."
}
Contributor

This pattern is repeated quite a lot - perhaps we could do it here once?

prefix := ""
if cfg.prefix != "" {
prefix = cfg.prefix
}
Contributor

Why is this needed? Can we use cfg.prefix below?

@@ -17,7 +17,7 @@ type fixture struct {
func (f fixture) Name() string { return "caching-store" }
func (f fixture) Clients() (chunk.StorageClient, chunk.TableClient, chunk.SchemaConfig, error) {
storageClient, tableClient, schemaConfig, err := f.fixture.Clients()
client := newCachingStorageClient(storageClient, cache.NewFifoCache("index-fifo", 500, 5*time.Minute), 5*time.Minute)
client := newCachingStorageClient(storageClient, cache.NewFifoCache("index-fifo", cache.FifoCacheConfig{500, 5 * time.Minute}), 5*time.Minute)
Contributor

Brittle use of FifoCacheConfig, use util.DefaultValues.

@@ -34,7 +34,7 @@ func TestCachingStorageClientBasic(t *testing.T) {
}},
},
}
cache := cache.NewFifoCache("test", 10, 10*time.Second)
cache := cache.NewFifoCache("test", cache.FifoCacheConfig{10, 10 * time.Second})
Contributor

Brittle use of FifoCacheConfig, use util.DefaultValues.

@@ -63,7 +63,7 @@ func TestCachingStorageClient(t *testing.T) {
}},
},
}
cache := cache.NewFifoCache("test", 10, 10*time.Second)
cache := cache.NewFifoCache("test", cache.FifoCacheConfig{10, 10 * time.Second})
Contributor

Brittle use of FifoCacheConfig, use util.DefaultValues.

@@ -113,7 +113,7 @@ func TestCachingStorageClientCollision(t *testing.T) {
},
},
}
cache := cache.NewFifoCache("test", 10, 10*time.Second)
cache := cache.NewFifoCache("test", cache.FifoCacheConfig{10, 10 * time.Second})
Contributor

Brittle use of FifoCacheConfig, use util.DefaultValues.

var tieredCache cache.Cache
var err error

// Building up from deprecated flags.
var caches []cache.Cache
if cfg.IndexCacheSize > 0 {
Contributor

This seems to replicate the logic in cache.NewCache - shouldn't we be using that?

Contributor Author

It complicates things by making it harder to detect whether the deprecated flags were used or not. cache.New always returns a non-nil cache, even with an empty config.

Though I should probably fix that.

@tomwilkie
Contributor

Oh, last thing - as we hash the write keys, we need to write the full key to the cache and check for collisions.

opts[i].Client = newCachingStorageClient(opts[i].Client, tieredCache, cfg.IndexCacheValidity)
}
for i := range opts {
opts[i].Client = newCachingStorageClient(opts[i].Client, tieredCache, cfg.indexCache.DefaultValidity)
Contributor Author

Note to self: cfg.indexCache.DefaultValidity should be set to cfg.IndexCacheValidity for backwards compat.
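
Something like the sketch below (stand-in types; the real StoreConfig lives in pkg/chunk):

package chunk

import "time"

// Stand-in config types for the sketch.
type cacheConfig struct{ DefaultValidity time.Duration }
type storeConfig struct {
    IndexCacheValidity time.Duration // deprecated flag
    indexCache         cacheConfig   // new prefixed cache config
}

// applyDeprecatedFlags seeds the new config from the deprecated flag, so old
// command lines keep the same validity behaviour.
func applyDeprecatedFlags(cfg *storeConfig) {
    if cfg.indexCache.DefaultValidity == 0 {
        cfg.indexCache.DefaultValidity = cfg.IndexCacheValidity
    }
}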

Signed-off-by: Goutham Veeramachaneni <[email protected]>
@gouthamve
Contributor Author

So, I've fixed everything except using util.DefaultValues for FifoCacheConfig. We use FifoCacheConfig inside cache.Config and the default values for it disable it. To reduce brittleness, I've keyed the struct fields everywhere.

Also, we're not hashing the keys, but rather encoding them. I've noticed that the key length is consistently <50% of the max and I don't see a way for it to blow up. I've added a comment regarding this.

@tomwilkie merged commit 393fae6 into cortexproject:master Oct 17, 2018
@tomwilkie deleted the cache-index-writes branch October 17, 2018 16:47
@bboreham
Contributor

This breaks some flags

Could you say that more loudly next time? E.g. in the PR title, in the Slack channel, in the PR description at the top.

@bboreham changed the title from "Cache index writes" to "Cache index writes (and change some flag names)" on Oct 18, 2018