
Proposal: Early Compaction of Stale Series from the Head Block #55


Open

codesome wants to merge 1 commit into main from codesome/stale-series-compaction

Conversation

@codesome (Member) commented Jul 4, 2025

@codesome force-pushed the codesome/stale-series-compaction branch from ebbfe83 to 11dd563 on July 8, 2025 at 19:21
@machine424 (Member) left a comment:

Thanks for this.
Some questions/suggestions.
I think we can start with tracking those stale series via a metric #55 (comment).

For the rest of the changes, if it's easy to put together, having a PoC would be really helpful to see things more clearly and to start gathering meaningful measurements.


### Alternative for tracking stale series

Consider when the last sample was scraped, *in addition to* the above proposal.
A member commented:

Because it's not really an alternative, maybe have it under "# Future Consideration" or somewhere else instead.


Implementation detail: if the usual head compaction is about to happen soon, we should skip the stale series compaction and simply wait for it. The buffer can be hardcoded.
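
A minimal sketch of how that guard could look; the function name, the way the next compaction time is obtained, and the 30-minute buffer value are all assumptions, not part of the proposal:

```go
package tsdb

import "time"

// staleCompactionBuffer is a hypothetical hardcoded buffer; the proposal only
// says the buffer can be hardcoded, not what its value should be.
const staleCompactionBuffer = 30 * time.Minute

// shouldRunStaleCompaction skips the early (stale series) compaction when the
// usual head compaction is expected to run within the buffer anyway.
func shouldRunStaleCompaction(now, nextHeadCompaction time.Time) bool {
	return nextHeadCompaction.Sub(now) > staleCompactionBuffer
}
```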

## Alternatives
A member commented:

We could also mention (allowing users to) reduce/tweak storage.tsdb.min-block-duration and why it cannot help here.

A member commented:

Funnily enough, we already adjust storage.tsdb.min-block-duration to 1h as a mitigation. We still have issues where a team will roll out, roll back, and roll out again within a single hour, causing a huge bump in head series. This typically leads to an OOM crashing Prometheus with tens of millions of stale series.


### Compacting Stale Series

We will have two thresholds to trigger stale series compaction, `p%` and `q%`, with `q > p` (both expressed as the percentage of total head series that are stale). Both will be configurable and default to 0% (meaning stale series compaction is disabled).
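
As an illustration only (the type and field names are hypothetical, not taken from the proposal), the two thresholds could surface in configuration roughly like this:

```go
package tsdb

// StaleSeriesCompactionConfig is a hypothetical shape for the two thresholds.
// Both default to 0, which means stale series compaction is disabled.
type StaleSeriesCompactionConfig struct {
	// TriggerPercent is p: the percentage of head series that are stale at
	// which a stale series compaction is triggered (Part 1).
	TriggerPercent float64
	// ForcePercent is q (q > p); its exact role is described elsewhere in
	// the proposal.
	ForcePercent float64
}
```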
A member commented:

I think it'd be more user-friendly if we just allow enabling the feature and have Prometheus choose the appropriate threshold (like the 3/2 ratio we currently have, for example).


## Goals

* Have a simple and efficient mechanism in the TSDB to track and identify stale series.
A member commented:

I remember @SuperQ mentioning that somewhere, but it'd be great if we could start with a metric for that; it'll help us decide on the logic.

A member commented:

Yes, my idea was to start with a metric.

A member commented:

Note that there is also `scrape_series_added`.
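
A minimal sketch of what such a starting-point metric could look like, using client_golang; the metric name is hypothetical and does not exist in Prometheus today, and the real TSDB would register it through its own registerer:

```go
package tsdb

import "github.com/prometheus/client_golang/prometheus"

// headStaleSeries would track how many head series are currently considered
// stale, giving the data needed to decide on the compaction logic.
var headStaleSeries = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "prometheus_tsdb_head_stale_series", // hypothetical name
	Help: "Number of series in the head block currently considered stale.",
})

func init() {
	prometheus.MustRegister(headStaleSeries)
}

// The head would call headStaleSeries.Inc() when a series is marked stale and
// headStaleSeries.Dec() when it receives a new sample again.
```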


**Part 1**

At a regular interval (say 15 minutes), we check whether the number of stale series has crossed `p%` of the total series. If it has, we trigger a compaction that simply flushes these stale series into a block and removes them from the Head block (this can produce more than one block if the series cross the block boundary). We skip WAL truncation and m-map file truncation at this stage and let the usual compaction cycle handle them. How we drop these compacted series during WAL replay is TBD during implementation (it may need a new WAL record or use tombstone records).
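
A rough sketch, under assumed names, of what this periodic check could look like; the `headStats` interface and its methods are invented for illustration, and the real logic would live inside the head/compactor:

```go
package tsdb

import "time"

// headStats is a hypothetical stand-in for the parts of the Head this loop needs.
type headStats interface {
	StaleSeries() uint64       // series currently considered stale
	TotalSeries() uint64       // total series in the head
	CompactStaleSeries() error // flush stale series into block(s) and drop them from the head
}

// runStaleSeriesCompactionLoop checks at a regular interval whether stale
// series have crossed p% of the total and, if so, triggers the early
// compaction. WAL and m-map file truncation are left to the usual cycle.
func runStaleSeriesCompactionLoop(h headStats, p float64, stop <-chan struct{}) error {
	ticker := time.NewTicker(15 * time.Minute) // "say 15 minutes" from the proposal
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return nil
		case <-ticker.C:
			total := h.TotalSeries()
			if total == 0 {
				continue
			}
			if float64(h.StaleSeries())/float64(total)*100 >= p {
				if err := h.CompactStaleSeries(); err != nil {
					return err
				}
			}
		}
	}
}
```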
A member commented:

IIUC, we'll be dropping the records during replay; otherwise, "the restarts on OOM or scale up take too long" from the Why section should be removed.


Consider when the last sample was scraped, *in addition to* the above proposal.

For edge cases where we did not put the staleness markers, we can look at the difference between the last sample timestamp of the series and the max time of the head block, and if it crosses a threshold, call the series stale. For example, if a series did not get a sample for 5 minutes (i.e. the head's max time is 5 minutes ahead of the series' last sample timestamp), we consider it stale.
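
A small illustration of that fallback check; the 5-minute threshold comes from the example above, while the constant and function names are assumptions:

```go
package tsdb

// staleAfterMs is the assumed threshold (5 minutes, in milliseconds, matching
// the example): a series with no staleness marker whose last sample is this
// far behind the head's max time is treated as stale.
const staleAfterMs = int64(5 * 60 * 1000)

// seriesLooksStale reports whether a series should be considered stale based
// purely on timestamps (used only when staleness markers were not written).
func seriesLooksStale(lastSampleT, headMaxT int64) bool {
	return headMaxT-lastSampleT >= staleAfterMs
}
```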
A member commented:

We'll need to align with the scrape logic that deals with staleness when staleness markers couldn't be inserted.


**Part 1**

At a regular interval (say 15 minutes), we check whether the number of stale series has crossed `p%` of the total series. If it has, we trigger a compaction that simply flushes these stale series into a block and removes them from the Head block (this can produce more than one block if the series cross the block boundary). We skip WAL truncation and m-map file truncation at this stage and let the usual compaction cycle handle them. How we drop these compacted series during WAL replay is TBD during implementation (it may need a new WAL record or use tombstone records).
A member commented:

Would the blocks be overlapping and merged during a normal compaction? We'd also need to take the merging overhead into account.
