Skip to content

Cache index queries for up to 15mins #947

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Aug 28, 2018

Conversation

tomwilkie
Copy link
Contributor

An in-process cache for index queries to accelerate queries.

Cache entries are valid for 15mins as thats how long the ingester will hold flushed chunks for.

Signed-off-by: Goutham Veeramachaneni <[email protected]>
Signed-off-by: Goutham Veeramachaneni <[email protected]>
* Instrument the FifoCache
* Move expiry checks into the cache itself

Signed-off-by: Goutham Veeramachaneni <[email protected]>
Signed-off-by: Goutham Veeramachaneni <[email protected]>
@bboreham
Copy link
Contributor

Cache entries are valid for 15mins as thats how long the ingester will hold flushed chunks for.

Is that a recent change? How does that relate to https://github.com/weaveworks/cortex/blob/master/pkg/chunk/chunk_store.go#L208 ?

@tomwilkie
Copy link
Contributor Author

We've got min-chunk-age set to 15mins in our cluster.

The challenging with caching index entries is that that we need to ensure we don't miss a series, even when querying more that 15 mins in the past. So knowing that any entries in the cache are at max 15mins old, and knowing that the ingester is going to response with anything more recent than 15mins, allows us to make this optimisation. To be clear: this is a lower bound on the the cache freshness; we can probably cache most stuff for much longer (anything read from yesterdays rows can be cached for ever).

A counter example might be useful. Imagine:

  • t=0: we write a sample for foo{} to the ingesters
  • t=10: we do a query for foo{}[1h] and find no series in the index, but get the response from the ingester
  • t=15: foo{} is flushed
  • t=20: we do a query for foo{}[1h] and find (using the cache) no series in the index, but get the response from the ingester
  • t=30: foo{} is dropped from the ingester
  • t=35: we do a query for foo{}[1h] and find (using the cache) no series in the index, and get no response from the ingester <- wrong query response.

Hence, we much drop the cache entries after 15mins. Make sense?

@tomwilkie tomwilkie mentioned this pull request Aug 23, 2018
26 tasks
@bboreham
Copy link
Contributor

Does it become consistent if we add a line in the documentation stating that -store.index-cache-validity should be no higher than -ingester.max-chunk-idle ?

(and also one saying -store.min-chunk-age should be no higher than -ingester.max-chunk-idle )

@bboreham
Copy link
Contributor

Separately, I would be very keen to cache index lookups from previous days' buckets.

@tomwilkie
Copy link
Contributor Author

Does it become consistent if we add a line in the documentation stating that -store.index-cache-validity should be no higher than -ingester.max-chunk-idle ?

Yes we should add that line for sure.

Separately, I would be very keen to cache index lookups from previous days' buckets.

Agreed! Should be relatively easy to do after this change. We've had it as a TODO in #209 for 18 months!

Signed-off-by: Goutham Veeramachaneni <[email protected]>
@gouthamve gouthamve force-pushed the cache-index branch 2 times, most recently from 94d55ba to 2eb86b1 Compare August 23, 2018 12:54
@@ -28,26 +32,32 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
cfg.AWSStorageConfig.RegisterFlags(f)
cfg.GCPStorageConfig.RegisterFlags(f)
cfg.CassandraStorageConfig.RegisterFlags(f)

f.IntVar(&cfg.IndexCacheSize, "store.index-cache-size", 0, "Size of in-memory index cache, 0 to disable.")
f.DurationVar(&cfg.IndexCacheValidity, "store.index-cache-validity", 15*time.Minute, "Period for which entries in the index cache are valid. Should be no higher than -ingester.max-chunk-idle.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default here needs to match the default for -ingester.max-chunk-idle, which is 5 minutes right now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup sorry.

@tomwilkie
Copy link
Contributor Author

This is running and has made a big difference for us; LGTM from me.

default:
return nil, fmt.Errorf("Unrecognized storage client %v, choose one of: aws, gcp, cassandra, inmemory", cfg.StorageClient)
client, err = nil, fmt.Errorf("Unrecognized storage client %v, choose one of: aws, gcp, cassandra, inmemory", cfg.StorageClient)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should still return here rather than make a caching client from a nil client

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Signed-off-by: Goutham Veeramachaneni <[email protected]>
@@ -1,20 +1,79 @@
package chunk
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly a note/question, it doesn't look like FifoCache is related to the chunk. Could it be pulled out to a different package in the future?

return &cachingStorageClient{
StorageClient: client,
cache: chunk.NewFifoCache("index", size, validity),
validity: validity,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is validity needed on the cachingStorageClient? Or only to create the cache?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. It was done outside the cache before, now I've pushed it into the cache. Forgot to remove this.

@csmarchbanks
Copy link
Contributor

A couple final small questions/comments. Then this LGTM.

Signed-off-by: Goutham Veeramachaneni <[email protected]>
@gouthamve
Copy link
Contributor

@csmarchbanks fixed all comments, thanks!

@tomwilkie tomwilkie merged commit d124d33 into cortexproject:master Aug 28, 2018
@tomwilkie tomwilkie deleted the cache-index branch August 28, 2018 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants