Avoid writing duplicate chunks by checking the cache first #1475
In the case where two ingesters have chunks for the same series, with the same start and end times and same contents, this change will skip one of the writes, which saves effort, and money with DynamoDB.
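For illustration, here is a minimal Go sketch of the check-the-cache-first idea. This is not Cortex's actual code: `Cache`, `chunkKey`, and `writeChunk` are hypothetical names, and the key scheme is simplified.

```go
// Minimal sketch of deduplicating chunk writes via a cache check.
package main

import "fmt"

// Cache is a hypothetical chunk cache (standing in for e.g. memcached).
type Cache interface {
	Exists(key string) bool
	Put(key string)
}

// mapCache is an in-memory Cache for the demo.
type mapCache map[string]struct{}

func (m mapCache) Exists(key string) bool { _, ok := m[key]; return ok }
func (m mapCache) Put(key string)         { m[key] = struct{}{} }

// Chunk is a simplified chunk; the key is derived from the series identity
// plus the chunk's start and end times, so two ingesters holding identical
// chunks for the same series produce identical keys.
type Chunk struct {
	SeriesID string
	From, To int64
}

func chunkKey(c Chunk) string {
	return fmt.Sprintf("%s/%x/%x", c.SeriesID, c.From, c.To)
}

// writeChunk checks the cache first and skips the backend write (and the
// DynamoDB cost) when an identical chunk was already written.
func writeChunk(cache Cache, store func(Chunk) error, c Chunk) error {
	key := chunkKey(c)
	if cache.Exists(key) {
		return nil // another ingester already wrote this chunk
	}
	if err := store(c); err != nil {
		return err
	}
	cache.Put(key)
	return nil
}

func main() {
	cache := mapCache{}
	store := func(c Chunk) error { fmt.Println("writing", chunkKey(c)); return nil }
	c := Chunk{SeriesID: "series-1", From: 1000, To: 2000}
	writeChunk(cache, store, c) // first write goes to the store
	writeChunk(cache, store, c) // cache hit: duplicate write skipped
}
```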
How often does this happen? It depends on the type of time-series data Cortex is handling. It is most likely for short chunks, e.g. cAdvisor metrics from containers that run for just a few minutes. It is least likely for long-running series, as they will be flushed at a time relative to the start of the ingester process, so the end time is unlikely to match. (But I'm working on that via `-ingester.spread-flushes`.)

I thought about adding a metric to count the chunks saved, but you can see this via the hit rate of the cache inside ingesters.