Alerting on number of chunks per series is noisy #1386

Closed
tomwilkie opened this issue May 13, 2019 · 3 comments · Fixed by #1776

Comments

@tomwilkie
Contributor

We alert when there are more than 1.3 chunks per series per ingester. This works most of the time, but when a new ingester is added (without transfers), all the series in that ingester are created, and then flushed, at the same time. This means that for a period of at least 15 minutes, there are almost 2 chunks per series for this ingester.
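
To make the arithmetic concrete, here is a minimal Go sketch of the ratio check described above. The function names and the series and chunk counts are hypothetical, chosen only for illustration. Only the 1.3 threshold and the "almost 2 chunks per series" situation come from the description.

```go
package main

import "fmt"

// chunksPerSeries returns the number of in-memory chunks divided by the
// number of in-memory series. It returns 0 when there are no series.
func chunksPerSeries(chunks, series int) float64 {
	if series == 0 {
		return 0
	}
	return float64(chunks) / float64(series)
}

// shouldAlert reports whether the chunks-per-series ratio exceeds threshold.
func shouldAlert(chunks, series int, threshold float64) bool {
	return chunksPerSeries(chunks, series) > threshold
}

func main() {
	const threshold = 1.3 // threshold quoted in the issue

	// Hypothetical steady state: flushes are spread out, ratio is 1.1.
	fmt.Println(shouldAlert(110_000, 100_000, threshold))

	// Hypothetical new ingester: every series was created together, so
	// nearly every series holds two chunks at once, ratio is 1.95.
	fmt.Println(shouldAlert(195_000, 100_000, threshold))
}
```

With these made-up numbers the second case trips the alert even though nothing is wrong, which is the noise being reported.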

I propose that instead of alerting on this, we export the age of the oldest chunk and alert on that. This will also catch cases where chunks persistently fail to flush. The question is: how do we implement this metric without scanning all chunks every scrape interval?

@tomwilkie
Contributor Author

The ingester scans all series every few minutes for the flush loop - we could work out the oldest chunk then and export it as a gauge.

@bboreham
Contributor

In my world I expect to be flushing a few hours behind at times. Perhaps it's different between AWS and GCP. Anyway, is that what you mean: the alert won't go off until hours after you have a problem?

@tomwilkie
Contributor Author

@pstibrany mind taking a look at this?
