Avoid writing duplicate chunks by checking the cache first #1475


Merged: 1 commit into master on Jun 26, 2019

Conversation

bboreham
Contributor

In the case where two ingesters have chunks for the same series, with the same start and end times and the same contents, this change skips one of the writes, which saves effort and, with DynamoDB, money.

How often does this happen? It depends on the type of timeseries data Cortex is handling. It is most likely for short chunks, e.g. cAdvisor metrics from containers that run just a few minutes. It is least likely for long-running series as they will be flushed at a time relative to the start of the ingester process, so the end-time is unlikely to match. (But I'm working on that via -ingester.spread-flushes).

I thought about adding a metric to count the chunks saved, but you can see this via the hit-rate of the cache inside ingesters.
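The idea above can be sketched as follows. This is a minimal illustration, not the actual Cortex implementation: the key scheme, the in-memory map standing in for the chunk cache, and the `dedupingStore` type are all hypothetical. The point is that two replicas producing an identical chunk (same series, same start/end, same contents) derive the same cache key, so the second write is skipped.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkKey identifies a chunk by series, time range, and a content hash.
// Identical chunks from different ingesters map to the same key.
func chunkKey(series string, start, end int64, data []byte) string {
	sum := sha256.Sum256(data)
	return fmt.Sprintf("%s/%d/%d/%s", series, start, end, hex.EncodeToString(sum[:8]))
}

// dedupingStore checks a cache of already-written keys before writing.
// The map stands in for the chunk cache inside the ingester.
type dedupingStore struct {
	written map[string]bool
	writes  int // actual backend (e.g. DynamoDB) writes performed
}

func newDedupingStore() *dedupingStore {
	return &dedupingStore{written: map[string]bool{}}
}

// Put writes the chunk only if its key has not been seen before.
func (s *dedupingStore) Put(series string, start, end int64, data []byte) {
	key := chunkKey(series, start, end, data)
	if s.written[key] {
		return // duplicate: another replica already wrote this chunk
	}
	s.writes++ // would write to the backend store here
	s.written[key] = true
}

func main() {
	s := newDedupingStore()
	chunk := []byte("encoded samples")
	s.Put(`up{job="cadvisor"}`, 100, 200, chunk) // first replica writes
	s.Put(`up{job="cadvisor"}`, 100, 200, chunk) // second replica skips
	fmt.Println("backend writes:", s.writes)     // prints: backend writes: 1
}
```

Note that a chunk with a different end time (the common case for long-running series, as described above) produces a different key and is still written.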

@bboreham bboreham merged commit b04f55d into master Jun 26, 2019
@tomwilkie tomwilkie deleted the dedupe-chunk-writes branch July 31, 2019 11:18