We alert when there are more than 1.3 chunks per series per ingester. This works most of the time, but when a new ingester is added (without transfers), all the series in that ingester are created, and flushed, at the same time. This means that for a period of at least 15 minutes, there are almost 2 chunks per series on this ingester.

I propose that instead of alerting on this, we export the age of the oldest chunk and alert on that. This will also catch cases where chunks persistently fail to flush. The question is: how do we implement this metric without scanning all chunks every scrape interval?
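One way to avoid the full scan (a sketch only, not an implementation from this codebase; all names here are hypothetical) is to keep a min-heap of chunk creation timestamps with lazy deletion: flushed chunks are recorded in a set and only popped from the heap when they reach the top, so reading the oldest age at scrape time is amortized O(log n) per flush rather than O(n) per scrape:

```python
import heapq
import itertools
import time


class OldestChunkTracker:
    """Tracks the creation time of the oldest unflushed chunk.

    Min-heap keyed by creation time, with lazy deletion: flushed
    chunks are remembered in a set and discarded from the heap only
    when they surface at the top, so computing the oldest age does
    not require scanning every chunk on each scrape.
    """

    def __init__(self):
        self._heap = []          # entries: (created_at, chunk_id)
        self._flushed = set()    # ids of chunks already flushed
        self._ids = itertools.count()

    def add_chunk(self, created_at):
        """Register a new chunk; returns its id for later flush_chunk()."""
        chunk_id = next(self._ids)
        heapq.heappush(self._heap, (created_at, chunk_id))
        return chunk_id

    def flush_chunk(self, chunk_id):
        """Mark a chunk as flushed (lazy: heap entry removed later)."""
        self._flushed.add(chunk_id)

    def oldest_chunk_age(self, now=None):
        """Age in seconds of the oldest unflushed chunk, or 0.0 if none."""
        now = time.time() if now is None else now
        # Pop flushed chunks sitting at the top of the heap.
        while self._heap and self._heap[0][1] in self._flushed:
            self._flushed.discard(heapq.heappop(self._heap)[1])
        if not self._heap:
            return 0.0
        return now - self._heap[0][0]
```

A gauge callback could then just call `oldest_chunk_age()` on scrape; memory overhead is one small heap entry per unflushed chunk plus the set of pending deletions.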
In my world I expect to be flushing a few hours behind at times; perhaps it's different between AWS and GCP. Anyway, is that what you mean: the alert won't go off until hours after you have a problem?