Avoid writing duplicate chunks by checking the cache first #1475
In the case where two ingesters have chunks for the same series, with the same start and end times and same contents, this change will skip one of the writes, which saves effort, and money with DynamoDB.
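For illustration, here is a minimal Go sketch of the check-the-cache-first idea. This is not Cortex's actual code: `Cache`, `chunkKey`, and `writeChunk` are hypothetical names, and the key scheme is simplified.

```go
// Minimal sketch of deduplicating chunk writes via a cache check.
package main

import "fmt"

// Cache is a hypothetical chunk cache (standing in for e.g. memcached).
type Cache interface {
	Exists(key string) bool
	Put(key string)
}

// mapCache is an in-memory Cache for the demo.
type mapCache map[string]struct{}

func (m mapCache) Exists(key string) bool { _, ok := m[key]; return ok }
func (m mapCache) Put(key string)         { m[key] = struct{}{} }

// Chunk is a simplified chunk; the key is derived from the series identity
// plus the chunk's start and end times, so two ingesters holding identical
// chunks for the same series produce identical keys.
type Chunk struct {
	SeriesID string
	From, To int64
}

func chunkKey(c Chunk) string {
	return fmt.Sprintf("%s/%x/%x", c.SeriesID, c.From, c.To)
}

// writeChunk checks the cache first and skips the backend write (and the
// DynamoDB cost) when an identical chunk was already written.
func writeChunk(cache Cache, store func(Chunk) error, c Chunk) error {
	key := chunkKey(c)
	if cache.Exists(key) {
		return nil // another ingester already wrote this chunk
	}
	if err := store(c); err != nil {
		return err
	}
	cache.Put(key)
	return nil
}

func main() {
	cache := mapCache{}
	store := func(c Chunk) error { fmt.Println("writing", chunkKey(c)); return nil }
	c := Chunk{SeriesID: "series-1", From: 1000, To: 2000}
	writeChunk(cache, store, c) // first write goes to the store
	writeChunk(cache, store, c) // cache hit: duplicate write skipped
}
```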
How often does this happen? It depends on the type of time-series data Cortex is handling. It is most likely for short chunks, e.g. cAdvisor metrics from containers that run for just a few minutes. It is least likely for long-running series, as they will be flushed at a time relative to the start of the ingester process, so the end time is unlikely to match. (But I'm working on that via `-ingester.spread-flushes`.)

I thought about adding a metric to count the chunks saved, but you can see this via the hit rate of the cache inside ingesters.