We alert when there are more than 1.3 chunks per series per ingester. This works most of the time, but when a new ingester is added (without transfers), all the series in that ingester are created, and flushed, at the same time. This means that for a period of at least 15 minutes, there are almost 2 chunks per series on this ingester.

I propose that instead of alerting on this, we export the age of the oldest chunk and alert on that. This will also catch cases where chunks persistently fail to flush. The question is: how do we implement this metric without scanning all chunks every scrape interval?
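One way to avoid the full scan (a sketch only, not an implementation from this codebase; all names here are hypothetical) is to keep a min-heap of chunk creation timestamps with lazy deletion: flushed chunks are recorded in a set and only popped from the heap when they reach the top, so reading the oldest age at scrape time is amortized O(log n) per flush rather than O(n) per scrape:

```python
import heapq
import itertools
import time


class OldestChunkTracker:
    """Tracks the creation time of the oldest unflushed chunk.

    Min-heap keyed by creation time, with lazy deletion: flushed
    chunks are remembered in a set and discarded from the heap only
    when they surface at the top, so computing the oldest age does
    not require scanning every chunk on each scrape.
    """

    def __init__(self):
        self._heap = []          # entries: (created_at, chunk_id)
        self._flushed = set()    # ids of chunks already flushed
        self._ids = itertools.count()

    def add_chunk(self, created_at):
        """Register a new chunk; returns its id for later flush_chunk()."""
        chunk_id = next(self._ids)
        heapq.heappush(self._heap, (created_at, chunk_id))
        return chunk_id

    def flush_chunk(self, chunk_id):
        """Mark a chunk as flushed (lazy: heap entry removed later)."""
        self._flushed.add(chunk_id)

    def oldest_chunk_age(self, now=None):
        """Age in seconds of the oldest unflushed chunk, or 0.0 if none."""
        now = time.time() if now is None else now
        # Pop flushed chunks sitting at the top of the heap.
        while self._heap and self._heap[0][1] in self._flushed:
            self._flushed.discard(heapq.heappop(self._heap)[1])
        if not self._heap:
            return 0.0
        return now - self._heap[0][0]
```

A gauge callback could then just call `oldest_chunk_age()` on scrape; memory overhead is one small heap entry per unflushed chunk plus the set of pending deletions.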
In my world I expect to be flushing a few hours behind at times; perhaps it's different between AWS and GCP. Anyway, is that what you mean: the alert won't go off until hours after you have a problem?