Cache index writes (and change some flag names) #1024

Merged

merged 4 commits into cortexproject:master from cache-index-writes on Oct 17, 2018

Conversation

@gouthamve
Contributor

gouthamve commented Sep 21, 2018

Depends on #1011 (The cache interface is changing there)

Fixes #957


We've seen that we write an average of 11 index entries per chunk. In the v9 schema, 10 of those entries are series-dependent, while one is the series-id -> chunkID mapping. Essentially we're doing 10x repeated writes!

This PR lets you cache and dedupe those entries, reducing the write load on the database by up to 10x. But you need to make sure the cache size is at least 11 x numSeries (depending on your setup), else you'll end up evicting the entries before the series can hit them again.

Still needs to be tested.
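
Roughly, the write path this adds looks like the sketch below (stand-in types and illustrative names, not the exact code in the diffs):

package main

import "fmt"

// IndexEntry is a minimal stand-in for the chunk store's index entry type.
type IndexEntry struct {
    TableName  string
    HashValue  string
    RangeValue []byte
    Value      []byte
}

// dedupeWrites drops entries whose keys were already written for an earlier
// chunk of the same series, leaving only the genuinely new ones.
func dedupeWrites(seen map[string]struct{}, entries []IndexEntry) []IndexEntry {
    out := make([]IndexEntry, 0, len(entries))
    for _, e := range entries {
        key := fmt.Sprintf("%s:%s:%x", e.TableName, e.HashValue, e.RangeValue)
        if _, ok := seen[key]; ok {
            continue // duplicate of an entry already in the index
        }
        out = append(out, e)
    }
    return out
}

func main() {
    seen := map[string]struct{}{}
    entries := []IndexEntry{{TableName: "index_123", HashValue: "user:day:metric", RangeValue: []byte("r1")}}
    fmt.Println(len(dedupeWrites(seen, entries))) // 1: nothing cached yet
    // After a successful write to the store, the keys are added to the cache,
    // so the next chunk's identical label entries get dropped here.
}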

@@ -67,6 +79,8 @@ func (cfg *StoreConfig) RegisterFlags(f *flag.FlagSet) {
f.IntVar(&cfg.CardinalityCacheSize, "store.cardinality-cache-size", 0, "Size of in-memory cardinality cache, 0 to disable.")
f.DurationVar(&cfg.CardinalityCacheValidity, "store.cardinality-cache-validity", 1*time.Hour, "Period for which entries in the cardinality cache are valid.")
f.IntVar(&cfg.CardinalityLimit, "store.cardinality-limit", 1e5, "Cardinality limit for index queries.")

f.IntVar(&cfg.IndexEntryCacheSize, "store.index-entry-cache", 0, "The number of index entries to cache so we don't write duplicates.")
Contributor

No need for the extra line here & above.

Contributor

s/The number of index entries to cache so we don't write duplicates./Size of index entry cache used to deduplicate writes./

        key := fmt.Sprintf("%s:%s:%x", entry.TableName, entry.HashValue, entry.RangeValue)
        if _, ok := seenIndexEntries[key]; !ok {
            seenIndexEntries[key] = struct{}{}
            rowWrites.Observe(entry.HashValue, 1)
            result.Add(entry.TableName, entry.HashValue, entry.RangeValue, entry.Value)
        }
    }
    c.entryCache.Store(context.Background(), cacheKeys, make([][]byte, len(cacheKeys)))
Contributor

I'm a little worried that if the write fails we won't try again, as you've added it to the cache too early.

Contributor

This will lose data, at least on DynamoDB where the whole write can fail and be retried.

}

return keys, keyMap
}
Contributor

I'd be tempted to inline this in dedupeEntriesFromCache as it's only used there.

@tomwilkie
Contributor

First round of review; we should also make this use the tiered cache so dedupes work across ingester restarts.

And some tests please.
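
(For reference, the tiered-cache idea here, as a rough sketch rather than the actual Cortex cache package: try each cache level in order and backfill the faster levels on a hit, so dedupe state survives an ingester restart via the shared memcached level.)

package cache

// Sketch of a tiered cache; illustrative only, not the real implementation.
type level interface {
    Get(key string) ([]byte, bool)
    Set(key string, value []byte)
}

type tiered []level

// Get tries the fastest level first and backfills it on a hit lower down.
func (t tiered) Get(key string) ([]byte, bool) {
    for i, l := range t {
        if v, ok := l.Get(key); ok {
            for j := 0; j < i; j++ {
                t[j].Set(key, v) // repopulate the faster levels
            }
            return v, true
        }
    }
    return nil, false
}

// Set writes through to every level.
func (t tiered) Set(key string, value []byte) {
    for _, l := range t {
        l.Set(key, value)
    }
}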

keys := make([]string, 0, len(entries))
keyMap := make(map[string]IndexEntry, len(entries))
for _, entry := range entries {
key := strings.Join([]string{
Contributor

This is basically repeating the Sprintf("%s:%s:%x", ...) from earlier in a different way. Be more DRY.

@bboreham
Contributor

What sort of memory growth do you see from this?
It would be way more memory-efficient to put a flag on the existing series structure saying "series metadata saved".

@bboreham
Contributor

a flag

per bucket 😞

@bboreham
Contributor

There are heuristics which can improve matters. E.g.

  • If a chunk has idled out, you're not expecting to write any more in that series, so don't cache it.
  • If a chunk aged out at 12 hours, you don't expect to write more than one more of those in the same bucket, so don't cache it.

That has to cover a significant percentage.

@gouthamve
Contributor Author

Hmm, how much improvement can these heuristics bring? We'll need to store about 15 x numSeries in memcache which should be quite small.

@gouthamve
Contributor Author

So I pushed 2 commits which move the flags for building a tiered cache to a single place instead of having custom ones everywhere, but I'm not a fan of the change. This breaks some flags and adds additional flags where they're not needed.

/cc @tomwilkie

@bboreham
Contributor

15 x numSeries in memcache which should be quite small.

Seriously? What's your calculation for size of each item?
We run ingesters with 2 million series each.

@bboreham
Contributor

Here's my calculation:
Code for the key is strings.Join([]string{entry.TableName, entry.HashValue, string(entry.RangeValue), string(entry.Value)}, string('\xff'))
Consider just label->series entries, which are the most numerous:
TableName say 12 bytes
HashValue is user:day:metricName:labelName, say 5+7+30+20 = 62
RangeValue is 32+32+separators = 66
Value is the label value, say 30 bytes on average?

So that's 173 bytes including separators, plus cache overhead:
cacheEntry is 72 bytes.
Go map overhead is at least 24 bytes for the string and int, call it 30.
Grand total = 275 bytes per entry.

Times 30 million entries is ~8 GB; with Go heap expansion that's ~16 GB extra RAM I need per ingester.
Please check my calculation; these things are notoriously difficult to get right.

Having thought about it some more, it would be better to have the cache know that all entries for a (v9) series go together, so we remove the label name and value from the key and reduce the entries 15x.
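
Restating that estimate as a quick back-of-the-envelope sketch (the figures are the rough numbers from this thread, not measurements):

package main

import "fmt"

func main() {
    // Per-entry size: key bytes (table + hash + range + value + 3 separators),
    // plus the in-memory cacheEntry struct, plus Go map bookkeeping.
    const (
        keyBytes    = 12 + 62 + 66 + 30 + 3 // = 173
        cacheEntry  = 72
        mapOverhead = 30
        perEntry    = keyBytes + cacheEntry + mapOverhead // = 275
    )
    // ~2M series per ingester x ~15 index entries per series = ~30M entries.
    total := float64(perEntry) * 2e6 * 15
    fmt.Printf("%d bytes/entry, %.1f GB before heap expansion\n", perEntry, total/1e9)
    // Prints: 275 bytes/entry, 8.2 GB before heap expansion (roughly 2x that in practice).
}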

@gouthamve
Contributor Author

Having thought about it some more, it would be better to have the cache know that all entries for a (v9) series go together, so we remove the label name and value from the key and reduce the entries 15x.

Oh Yes! This is indeed much, much better!

@gouthamve
Contributor Author

This is now ready for review. I agree that in-mem might be too much, but throwing everything into memcache will help.

Further, my calculations above are wrong: we cut into the next row every day, hence the max reduction in writes is at most 50%, assuming there are 2 chunks per row.

Finally, I've made it so that we now have a tiered cache (in-mem -> disk -> memcache) for everything. This makes it extremely confusing because the descriptions for all of them are the same. I don't have good ideas on how to fix it.

@tomwilkie
Contributor

I'm seeing some flakiness in the test:

$ go test ./pkg/chunk/
--- FAIL: TestIndexCachingWorks (0.00s)
        Error Trace:    chunk_store_test.go:559
        Error:		Not equal: 4 (expected)
        	        != 5 (actual)
        
FAIL
FAIL	github.com/weaveworks/cortex/pkg/chunk	0.569s

@tomwilkie
Contributor

@bboreham
Contributor

bboreham commented Oct 1, 2018

RangeValue is 32+32+separators = 66

Those 32s are base64-encoded, so actually ~43

I don’t want this merged without sorting the question of memory usage. What are you seeing in trials?

return entries, nil
}

found, missing = c.entryCache.Fetch(context.Background(), entries)
Contributor

Please propagate the context so the traces work properly.

return
}

c.entryCache.Store(context.Background(), entries)
Contributor

Propagate the context.


for _, entry := range entries {
key := dedupeKey(entry)
out, err := proto.Marshal(&entry)
Contributor

Why do we write the index entry to the cache? Isn't the key enough to check equality?

Contributor Author

We moved to memcache, so we now need to hash the keys as they might contain non-UTF-8 chars, which don't work with memcache.

Now that we have hashes, we need to store the actual entry to dedupe.
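
A minimal sketch of that idea (hypothetical helper names; note the final version of this PR ends up base64-encoding the keys rather than hashing them, as discussed further down):

package cachingclient

import (
    "bytes"
    "crypto/sha256"
    "encoding/hex"
)

// memcacheSafeKey is a hypothetical helper: memcached keys must be printable
// ASCII with no spaces, so the raw table/hash/range key is hashed first.
func memcacheSafeKey(rawKey []byte) string {
    sum := sha256.Sum256(rawKey)
    return hex.EncodeToString(sum[:])
}

// Because the memcached key is only a digest, the value stored under it is
// the full raw key (or the serialised entry); a hit only counts as "already
// written" if the stored value matches, which guards against collisions.
func alreadyWritten(cachedValue, rawKey []byte) bool {
    return bytes.Equal(cachedValue, rawKey)
}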

@@ -58,17 +58,23 @@ type seriesStore struct {
func newSeriesStore(cfg StoreConfig, schema Schema, storage StorageClient) (Store, error) {
fetcher, err := NewChunkFetcher(cfg.CacheConfig, storage)
if err != nil {
return nil, err
return nil, errors.Wrap(err, "create chunk fetcher")
Contributor

I generally put errors.Wrap at the very leaf of where the error is returned, in our code. Adding it halfway down the stack isn't that useful.

Contributor Author

Hmm, I disagree. Usually adding it here lets us track whether it's from the chunk fetcher or from cache creation (below). Why do you think it isn't that useful?

Contributor

Clarified what I meant f2f.

f.IntVar(&cfg.IndexCacheSize, "store.index-cache-size", 0, "Size of in-memory index cache, 0 to disable.")
f.DurationVar(&cfg.IndexCacheValidity, "store.index-cache-validity", 5*time.Minute, "Period for which entries in the index cache are valid. Should be no higher than -ingester.max-chunk-idle.")
cfg.memcacheClient.RegisterFlagsWithPrefix("index", f)
cfg.indexCache.RegisterFlagsWithPrefix("store.index-cache-read", f)
Contributor

Doesn't this change the flags? Won't this break backwards compatibility?

Contributor Author

Yes, it does. But I couldn't come up with a descriptive name that doesn't confuse users of the two caches now.

Contributor

We can't break backward compat like this. At least leave the old flags in with a deprecation notice.

Contributor Author

Done

@tomwilkie
Contributor

@bboreham your memory consumption calculation looks about right to me. What's more, there is very little point in storing this in-process, as another 2x of the writes come from other ingesters. Therefore we shouldn't use the FIFO cache, and only use memcached for this.

Considering that, I make it:

  • each series has 3 chunks (across 3 ingesters) = 3k
  • an index entry for the write cache is 200-300 bytes
  • we need to store at least 10 label index entries per series in the (external) cache = 3k

Which would double the memory usage of cortex on the write path. I'm inclined to believe this is too much too.

I chatted with @gouthamve, and believe that moving this caching to the write path in the series store would allow us to only cache (userid, day, series ID) and use that to avoid writing out the label index, without too much of a layering violation. This would result in some code duplication between the chunk store and series store, as their write paths are currently shared, but I think that will be fine.

WDYT?
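
A rough sketch of that proposal (illustrative names only, not the final code): keep one dedupe marker per (userID, day, seriesID) and skip the label index entries when the marker is already present.

package main

import "fmt"

// writeDedupe remembers which (userID, day, seriesID) combinations have
// already had their label index entries written. Sketch only; in the real
// proposal this would be backed by memcached rather than a local map.
type writeDedupe struct {
    seen map[string]struct{}
}

func seriesKey(userID string, day int64, seriesID string) string {
    return fmt.Sprintf("%s:%d:%s", userID, day, seriesID)
}

// shouldWriteLabelIndex reports whether the ~10 label index entries still
// need to be written for this series in this daily bucket. The series-id ->
// chunk-id entry is always written regardless.
func (w *writeDedupe) shouldWriteLabelIndex(userID string, day int64, seriesID string) bool {
    _, ok := w.seen[seriesKey(userID, day, seriesID)]
    return !ok
}

// markWritten is called only after the batch write to the store succeeded
// (see the earlier review comments about caching too early).
func (w *writeDedupe) markWritten(userID string, day int64, seriesID string) {
    w.seen[seriesKey(userID, day, seriesID)] = struct{}{}
}

func main() {
    w := &writeDedupe{seen: map[string]struct{}{}}
    fmt.Println(w.shouldWriteLabelIndex("user1", 17800, "series-abc")) // true: first chunk today
    w.markWritten("user1", 17800, "series-abc")
    fmt.Println(w.shouldWriteLabelIndex("user1", 17800, "series-abc")) // false: already written
}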

@bboreham
Contributor

bboreham commented Oct 1, 2018

Sure; get something that caches just once per series then we can see if the layering/duplication can be improved.

@gouthamve
Contributor Author

I've got something working here. Will deploy to dev and let you know.

}

bufs := make([][]byte, len(keysToCache))
c.entryCache.Store(context.Background(), keysToCache, bufs)
Contributor

This call is too early again, no?

@gouthamve
Contributor Author

How the prefixed flags look now:

  -store.index-cache-read.cache.default-validity duration
    	Cache config for index entry reading. The default validity of entries for caches unless overridden.
  -store.index-cache-read.cache.enable-diskcache
    	Cache config for index entry reading. Enable on-disk cache.
  -store.index-cache-read.cache.enable-fifocache
    	Cache config for index entry reading. Enable in-memory cache.
  -store.index-cache-read.diskcache.path string
    	Cache config for index entry reading. Path to file used to cache chunks. (default "/var/run/chunks")
  -store.index-cache-read.diskcache.size int
    	Cache config for index entry reading. Size of file (bytes) (default 1073741824)
  -store.index-cache-read.fifocache.duration duration
    	Cache config for index entry reading. The expiry duration for the cache.
  -store.index-cache-read.fifocache.size int
    	Cache config for index entry reading. The number of entries to cache.
  -store.index-cache-read.memcache.write-back-buffer int
    	Cache config for index entry reading. How many chunks to buffer for background write back. (default 10000)
  -store.index-cache-read.memcache.write-back-goroutines int
    	Cache config for index entry reading. How many goroutines to use to write back to memcache. (default 10)
  -store.index-cache-read.memcached.batchsize int
    	Cache config for index entry reading. How many keys to fetch in each batch.
  -store.index-cache-read.memcached.expiration duration
    	Cache config for index entry reading. How long keys stay in the memcache.
  -store.index-cache-read.memcached.hostname string
    	Cache config for index entry reading. Hostname for memcached service to use when caching chunks. If empty, no memcached will be used.
  -store.index-cache-read.memcached.parallelism int
    	Cache config for index entry reading. Maximum active requests to memcache. (default 100)
  -store.index-cache-read.memcached.service string
    	Cache config for index entry reading. SRV service used to discover memcache servers. (default "memcached")
  -store.index-cache-read.memcached.timeout duration
    	Cache config for index entry reading. Maximum time to wait before giving up on memcached requests. (default 100ms)
  -store.index-cache-read.memcached.update-interval duration
    	Cache config for index entry reading. Period with which to poll DNS for memcache servers. (default 1m0s)
  -store.index-cache-size int
    	Deprecated: Use -store.index-cache-read.*; Size of in-memory index cache, 0 to disable.
  -store.index-cache-validity duration
    	Deprecated: Use -store.index-cache-read.*; Period for which entries in the index cache are valid. Should be no higher than -ingester.max-chunk-idle. (default 5m0s)
  -store.index-cache-write.cache.default-validity duration
    	Cache config for index entry writing. The default validity of entries for caches unless overridden.
  -store.index-cache-write.cache.enable-diskcache
    	Cache config for index entry writing. Enable on-disk cache.
  -store.index-cache-write.cache.enable-fifocache
    	Cache config for index entry writing. Enable in-memory cache.
  -store.index-cache-write.diskcache.path string
    	Cache config for index entry writing. Path to file used to cache chunks. (default "/var/run/chunks")
  -store.index-cache-write.diskcache.size int
    	Cache config for index entry writing. Size of file (bytes) (default 1073741824)
  -store.index-cache-write.fifocache.duration duration
    	Cache config for index entry writing. The expiry duration for the cache.
  -store.index-cache-write.fifocache.size int
    	Cache config for index entry writing. The number of entries to cache.
  -store.index-cache-write.memcache.write-back-buffer int
    	Cache config for index entry writing. How many chunks to buffer for background write back. (default 10000)
  -store.index-cache-write.memcache.write-back-goroutines int
    	Cache config for index entry writing. How many goroutines to use to write back to memcache. (default 10)
  -store.index-cache-write.memcached.batchsize int
    	Cache config for index entry writing. How many keys to fetch in each batch.
  -store.index-cache-write.memcached.expiration duration
    	Cache config for index entry writing. How long keys stay in the memcache.
  -store.index-cache-write.memcached.hostname string
    	Cache config for index entry writing. Hostname for memcached service to use when caching chunks. If empty, no memcached will be used.
  -store.index-cache-write.memcached.parallelism int
    	Cache config for index entry writing. Maximum active requests to memcache. (default 100)
  -store.index-cache-write.memcached.service string
    	Cache config for index entry writing. SRV service used to discover memcache servers. (default "memcached")
  -store.index-cache-write.memcached.timeout duration
    	Cache config for index entry writing. Maximum time to wait before giving up on memcached requests. (default 100ms)
  -store.index-cache-write.memcached.update-interval duration
    	Cache config for index entry writing. Period with which to poll DNS for memcache servers. (default 1m0s)

@tomwilkie
Contributor

@gouthamve give this a rebase into 2 changes: one that updates the flags and caches, and one that adds the write caching. Then I'll give it what I hope is a final review.

We're writing the series label index to bigtable for every chunk
We now cache the series-id and write only if we didn't write it before

Signed-off-by: Goutham Veeramachaneni <[email protected]>

--------

This is a squashed commit but only including Tom's commits'
description for attribution.

Review feedback.

Signed-off-by: Tom Wilkie <[email protected]>

Write back cache keys after they have been written to the store.

Signed-off-by: Tom Wilkie <[email protected]>
@gouthamve force-pushed the cache-index-writes branch 2 times, most recently from bf23019 to 6c37c87, on October 3, 2018 02:44
@gouthamve
Contributor Author

Split it into 2 commits @tomwilkie

Contributor

@tomwilkie left a comment

Been super detailed and nit picky - sorry! Generally looking really good.


// For tests to inject specific implementations.
Cache Cache
}

// RegisterFlags adds the flags required to config this to the given FlagSet.
func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
f.BoolVar(&cfg.EnableDiskcache, "cache.enable-diskcache", false, "Enable on-disk cache")
cfg.RegisterFlagsWithPrefix("", "", f)
}
Contributor

RegisterFlags is only used in two places now: frontend and chunk store. Can we remove the function and call RegisterFlagsWithPrefix from there?

f.IntVar(&cfg.WriteBackGoroutines, "memcache.write-back-goroutines", 10, "How many goroutines to use to write back to memcache.")
f.IntVar(&cfg.WriteBackBuffer, "memcache.write-back-buffer", 10000, "How many chunks to buffer for background write back.")
cfg.RegisterFlagsWithPrefix("", "", f)
}
Contributor

I don't see this function being used anymore - can we remove it?


if prefix != "" {
prefix += "."
}
Contributor

This pattern is repeated quite a lot - perhaps we could do it here once?

prefix := ""
if cfg.prefix != "" {
prefix = cfg.prefix
}
Contributor

Why is this needed? Can we use cfg.prefix below?

@@ -17,7 +17,7 @@ type fixture struct {
func (f fixture) Name() string { return "caching-store" }
func (f fixture) Clients() (chunk.StorageClient, chunk.TableClient, chunk.SchemaConfig, error) {
storageClient, tableClient, schemaConfig, err := f.fixture.Clients()
client := newCachingStorageClient(storageClient, cache.NewFifoCache("index-fifo", 500, 5*time.Minute), 5*time.Minute)
client := newCachingStorageClient(storageClient, cache.NewFifoCache("index-fifo", cache.FifoCacheConfig{500, 5 * time.Minute}), 5*time.Minute)
Contributor

Brittle use of FifoCacheConfig, use util.DefaultValues.

@@ -34,7 +34,7 @@ func TestCachingStorageClientBasic(t *testing.T) {
}},
},
}
cache := cache.NewFifoCache("test", 10, 10*time.Second)
cache := cache.NewFifoCache("test", cache.FifoCacheConfig{10, 10 * time.Second})
Contributor

Brittle use of FifoCacheConfig, use util.DefaultValues.

@@ -63,7 +63,7 @@ func TestCachingStorageClient(t *testing.T) {
}},
},
}
cache := cache.NewFifoCache("test", 10, 10*time.Second)
cache := cache.NewFifoCache("test", cache.FifoCacheConfig{10, 10 * time.Second})
Contributor

Brittle use of FifoCacheConfig, use util.DefaultValues.

@@ -113,7 +113,7 @@ func TestCachingStorageClientCollision(t *testing.T) {
},
},
}
cache := cache.NewFifoCache("test", 10, 10*time.Second)
cache := cache.NewFifoCache("test", cache.FifoCacheConfig{10, 10 * time.Second})
Contributor

Brittle use of FifoCacheConfig, use util.DefaultValues.

var tieredCache cache.Cache
var err error

// Building up from deprecated flags.
var caches []cache.Cache
if cfg.IndexCacheSize > 0 {
Contributor

This seems to replicate the logic in cache.NewCache - shouldn't we be using that?

Contributor Author

It complicates things by making it harder to detect whether the deprecated flags were used or not. cache.New always returns a non-nil cache, even with an empty config.

Though I should probably fix that.

@tomwilkie
Contributor

Oh, last thing - as we hash the write keys, we need to write the full key to the cache and check for collisions.

opts[i].Client = newCachingStorageClient(opts[i].Client, tieredCache, cfg.IndexCacheValidity)
}
for i := range opts {
opts[i].Client = newCachingStorageClient(opts[i].Client, tieredCache, cfg.indexCache.DefaultValidity)
Contributor Author

Note to self: cfg.indexCache.DefaultValidity should be set to cfg.IndexCacheValidity for backwards compat.
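
Something like the sketch below (stand-in types; the real StoreConfig lives in pkg/chunk):

package chunk

import "time"

// Stand-in config types for the sketch.
type cacheConfig struct{ DefaultValidity time.Duration }
type storeConfig struct {
    IndexCacheValidity time.Duration // deprecated flag
    indexCache         cacheConfig   // new prefixed cache config
}

// applyDeprecatedFlags seeds the new config from the deprecated flag, so old
// command lines keep the same validity behaviour.
func applyDeprecatedFlags(cfg *storeConfig) {
    if cfg.indexCache.DefaultValidity == 0 {
        cfg.indexCache.DefaultValidity = cfg.IndexCacheValidity
    }
}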

Signed-off-by: Goutham Veeramachaneni <[email protected]>
@gouthamve
Contributor Author

So, I've fixed everything except using util.DefaultValues for FifoCacheConfig. We use FifoCacheConfig inside cache.Config and the default values for it disable it. To reduce brittleness, I've keyed the struct fields everywhere.

Also, we're not hashing the keys, but rather encoding them. I've noticed that the key length is consistently <50% of the max and I don't see a way for it to blow up. I've added a comment regarding this.

@tomwilkie merged commit 393fae6 into cortexproject:master Oct 17, 2018
@tomwilkie deleted the cache-index-writes branch October 17, 2018 16:47
@bboreham
Contributor

This breaks some flags

Could you say that more loudly next time? E.g. in the PR title, in the Slack channel, in the PR description at the top.

@bboreham changed the title from "Cache index writes" to "Cache index writes (and change some flag names)" on Oct 18, 2018