Proposal: Early Compaction of Stale Series from the Head Block #55
Conversation
Signed-off-by: Ganesh Vernekar <[email protected]>
Force-pushed from ebbfe83 to 11dd563
Thanks for this. Some questions/suggestions.
I think we can start with tracking those stale series via a metric #55 (comment).
For the rest of the changes, if it's easy to put together, having a PoC would be really helpful to see things more clearly and start gathering meaningful measurements.
> ### Alternative for tracking stale series
>
> Consider when was the last sample scraped *in addition to* the above proposal.
Because it's not really an alternative, maybe have it under `# Future Consideration` or somewhere else instead.
> Implementation detail: if the usual head compaction is about to happen very soon, we should skip the stale series compaction and simply wait for the usual head compaction. The buffer can be hardcoded.
>
> ## Alternatives
We could also mention (allowing users to) reduce/tweak `storage.tsdb.min-block-duration`, and why it cannot help here.
Funny enough, we already adjust `storage.tsdb.min-block-duration` to 1h as a mitigation. We still have issues where a team will roll out, roll back, and roll out again in a single hour, causing a huge bump in head series, typically leading to an OOM crashing Prometheus with tens of millions of stale series.
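
Returning to the implementation detail quoted above (skipping the early compaction when the usual head compaction is imminent), here is a minimal sketch of what that check could look like; the function name, its inputs, and the 10-minute buffer are all assumptions, not settled design:

```go
package tsdb

import "time"

// compactionBuffer is the hardcoded buffer the proposal mentions: if the
// regular head compaction is due within this window, skip the early
// stale-series compaction and let the usual cycle do the work.
const compactionBuffer = 10 * time.Minute // value is an assumption

// shouldCompactStaleSeriesNow reports whether an early stale-series
// compaction is worthwhile, given when the next regular head compaction
// is expected to run.
func shouldCompactStaleSeriesNow(now, nextHeadCompaction time.Time) bool {
	return nextHeadCompaction.Sub(now) > compactionBuffer
}
```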
> ### Compacting Stale Series
>
> We will have two thresholds to trigger stale series compaction, `p%` and `q%`, `q > p` (both indicating % of total series that are stale in the head). Both will be configurable and default to 0% (meaning stale series compaction is disabled).
I think it'd be more user-friendly if we just allow enabling the feature and have Prometheus choose the appropriate threshold (like the 3/2 we currently have, e.g.).
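
For illustration, a hedged sketch of the two-threshold trigger as described in the quoted text; the struct and the idea that `q` marks a more urgent condition are assumptions layered on `q > p`:

```go
package tsdb

// staleCompactionConfig holds the proposed thresholds as fractions of
// total head series that are stale; both default to 0, which disables
// the feature, and q must exceed p.
type staleCompactionConfig struct {
	P float64 // checked at the regular interval
	Q float64 // assumed here to mark a more urgent condition
}

func (c staleCompactionConfig) enabled() bool { return c.P > 0 && c.Q > c.P }

// trigger classifies the current stale ratio against both thresholds.
func (c staleCompactionConfig) trigger(stale, total int) (crossedP, crossedQ bool) {
	if !c.enabled() || total == 0 {
		return false, false
	}
	ratio := float64(stale) / float64(total)
	return ratio > c.P, ratio > c.Q
}
```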
> ## Goals
>
> * Have a simple and efficient mechanism in the TSDB to track and identify stale series.
I remember @SuperQ mentioning that somewhere, but it'd be great if we can start with a metric for that; it'll help us decide on the logic.
Yes, my idea was to start with a metric.
Note that there is also `scrape_series_added`.
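
To make the "start with a metric" suggestion concrete, a sketch using prometheus/client_golang; the metric name and the hook that updates it are hypothetical:

```go
package tsdb

import "github.com/prometheus/client_golang/prometheus"

// headStaleSeries would expose how many head series are currently
// considered stale. The metric name is a placeholder, not an agreed one.
var headStaleSeries = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "prometheus_tsdb_head_stale_series",
	Help: "Number of series in the head block currently considered stale.",
})

func init() {
	prometheus.MustRegister(headStaleSeries)
}

// A housekeeping loop on the head would then update it, e.g.:
//
//	headStaleSeries.Set(float64(staleCount))
```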
> **Part 1**
>
> At a regular interval (say 15 mins), we check if the stale series have crossed p% of the total series. If it has, we trigger a compaction that simply flushes these stale series into a block and removes it from the Head block (can be more than one block if the series crosses the block boundary). We skip WAL truncation and m-map files truncation at this stage and let the usual compaction cycle handle it. How we drop these compacted series during WAL replay is TBD during implementation (may need a new WAL record or use tombstone records).
IIUC, we'll be dropping the records during replay; otherwise, restarts on OOM or scale-up take too long. So the TBD should be removed from the "Why".
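
A rough sketch of the Part 1 loop described in the quoted text, using a small interface because the real accessors on the head (and the flush itself) don't exist yet; all method names are assumptions:

```go
package tsdb

import "time"

// headStats abstracts what this sketch needs from the head block;
// NumStaleSeries and CompactStaleSeries are hypothetical.
type headStats interface {
	NumSeries() uint64
	NumStaleSeries() uint64
	CompactStaleSeries() error // flush stale series into their own block(s)
}

// runStaleSeriesCompaction checks at a regular interval (say 15 mins)
// whether the stale ratio has crossed p and, if so, compacts only the
// stale series. WAL and m-map truncation are intentionally left to the
// regular compaction cycle, as the proposal says.
func runStaleSeriesCompaction(h headStats, interval time.Duration, p float64, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			total := h.NumSeries()
			if total == 0 || float64(h.NumStaleSeries())/float64(total) <= p {
				continue
			}
			_ = h.CompactStaleSeries() // error handling elided in this sketch
		}
	}
}
```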
> Consider when was the last sample scraped *in addition to* the above proposal.
>
> For edge cases where we did not put the staleness markers, we can look at the difference between the last sample timestamp of the series and the max time of the head block, and if it crosses a threshold, call it stale. For example a series did not get a sample for 5 mins (i.e. head’s max time is 5 mins more than series’ last sample timestamp).
We'll need to align with the scrape logic that deals with staleness when staleness markers couldn't be inserted.
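
The timestamp-based fallback in the quoted text could look like the following sketch; the 5-minute threshold mirrors the example above, and how it should align with the scrape-side staleness logic is exactly the open question raised here:

```go
package tsdb

// staleAfterMs mirrors the 5-minute example from the proposal; the real
// threshold (and its relation to scrape staleness handling) is undecided.
const staleAfterMs = int64(5 * 60 * 1000)

// isStaleByAge reports whether a series' last sample lags the head
// block's max time by more than the threshold, for edge cases where no
// staleness marker was written.
func isStaleByAge(headMaxTimeMs, seriesLastSampleMs int64) bool {
	return headMaxTimeMs-seriesLastSampleMs > staleAfterMs
}
```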
> **Part 1**
>
> *(same paragraph as quoted above)*
Would the blocks be overlapping and merged during a normal compaction? We'd also need to take the merging overhead into account. For prometheus/prometheus#13616.