pass user id metrics to prom eval metrics in ruler #1548
How many metrics are we talking about, per user?
pkg/ruler/ruler.go
Outdated
@@ -36,7 +36,7 @@ var (
 	Namespace: "cortex",
 	Name:      "group_evaluation_duration_seconds",
 	Help:      "The duration for a rule group to execute.",
-	Buckets:   []float64{.1, .25, .5, 1, 2.5, 5, 10, 25},
+	Buckets:   []float64{.5, 1, 2.5, 5, 10, 25, 60, 120},
This change isn't mentioned in the commit message.
I'll take this out 👍
I have some rule groups that take a long time to evaluate, and having larger buckets would be nice.
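To make the bucket trade-off concrete, here is a minimal sketch in plain Go (no Prometheus dependency) comparing the two bucket layouts from the diff. The helper `smallestBucket` and the sample durations are hypothetical and only for illustration. It relies on the standard histogram convention that an observation counts toward every bucket whose upper bound is greater than or equal to it, with an implicit `+Inf` bucket at the end.

```go
package main

import "fmt"

// Bucket upper bounds in seconds, copied from the diff under review.
var (
	oldBuckets = []float64{.1, .25, .5, 1, 2.5, 5, 10, 25}
	newBuckets = []float64{.5, 1, 2.5, 5, 10, 25, 60, 120}
)

// smallestBucket returns the smallest upper bound that is >= seconds.
// The boolean is false when the observation only fits the implicit
// +Inf bucket, i.e. the histogram cannot resolve it any further.
// (Hypothetical helper, not part of the PR.)
func smallestBucket(buckets []float64, seconds float64) (float64, bool) {
	for _, le := range buckets {
		if seconds <= le {
			return le, true
		}
	}
	return 0, false
}

func main() {
	// Illustrative evaluation durations: fast, slow, very slow.
	for _, d := range []float64{0.2, 45, 300} {
		oldLe, oldOK := smallestBucket(oldBuckets, d)
		newLe, newOK := smallestBucket(newBuckets, d)
		fmt.Printf("%gs: old le=%g (finite=%t), new le=%g (finite=%t)\n",
			d, oldLe, oldOK, newLe, newOK)
	}
}
```

With the old layout, anything slower than 25s lands only in `+Inf`. The new layout resolves evaluations up to 120s but can no longer tell apart anything faster than 0.5s.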
@bboreham each user will have 16 metrics, plus 3 for each rule group
Signed-off-by: Jacob Lisi <[email protected]>
8cbb7e2 to ba2a617
Seems like quite a lot of data (more than double the existing number of per-user metrics?).
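For a rough sense of scale, here is a small sketch that applies the figures quoted in this thread (16 metrics per user, plus 3 per rule group) to a made-up tenant population. The rule-group counts are hypothetical, and the function counts metrics only, not the individual series a histogram or summary expands into.

```go
package main

import "fmt"

// Figures quoted in this conversation.
const (
	perUserMetrics  = 16
	perGroupMetrics = 3
)

// metricsForUser estimates the metrics added for a single user
// that has the given number of rule groups.
func metricsForUser(ruleGroups int) int {
	return perUserMetrics + perGroupMetrics*ruleGroups
}

// totalMetrics sums the estimate over all users.
func totalMetrics(groupsPerUser []int) int {
	total := 0
	for _, g := range groupsPerUser {
		total += metricsForUser(g)
	}
	return total
}

func main() {
	// Hypothetical tenants with 1, 5 and 20 rule groups.
	groupsPerUser := []int{1, 5, 20}
	for _, g := range groupsPerUser {
		fmt.Printf("%d rule groups: %d metrics\n", g, metricsForUser(g))
	}
	fmt.Printf("total across users: %d metrics\n", totalMetrics(groupsPerUser))
}
```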
Also worth considering: a number of these metrics, specifically the 3 per rule group, already exist without a user label. Instead of having just the rulegroup label, after this PR they will also carry the user ID as an accompanying label. This also means that, as things stand, if two users have a rule group with the same name, they will both use the same metrics.
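The collision described here can be sketched without any Prometheus code. A series is identified by its metric name plus its full label set, so two tenants with identically named rule groups share a series unless a user label separates them. The metric name `example_rule_group_metric` and the label names `rule_group` and `user` are assumptions made for illustration, not necessarily the names used in the ruler.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey builds a canonical identity for a series from its metric
// name and label set. Labels are sorted so map order does not matter.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

// withUser returns a copy of labels with a "user" label added,
// mimicking what a per-user wrapped registerer would do.
// (Hypothetical helper; the label name is an assumption.)
func withUser(labels map[string]string, userID string) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	out["user"] = userID
	return out
}

func main() {
	const metric = "example_rule_group_metric" // hypothetical name
	group := map[string]string{"rule_group": "alerts"}

	// Without a user label both tenants map to the same series.
	fmt.Println(seriesKey(metric, group))

	// With a user label the two tenants get distinct series.
	fmt.Println(seriesKey(metric, withUser(group, "user-a")))
	fmt.Println(seriesKey(metric, withUser(group, "user-b")))
}
```

The same property is why adding the label multiplies the number of series: every distinct user value creates a separate copy of each per-rule-group metric.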
Do the metrics work for you, either before or after this change? See #1557.
@bboreham sadly the majority of these metrics are set in the I haven't put much thought into it but maybe some upstream refactoring can give us all of the evaluation related metrics. However, I don't think we will ever be able to make much use of the latency/iteration metrics. That would mean letting the Prometheus rule group itself handle scheduling and evaluation. That seems like a step backwards in our case since we already have the scheduling queue.
I have wondered about passing all rules via temporary files inside the container filesystem, similar to how we do template files. This would enable us to use Prometheus' native rule manager and remove a lot of ruler code.
I'm closing this in favor of utilizing the Prometheus rule manager #1571, which will embed and register these metrics by default. It will also actually record values for all of the metrics provided by the rule group.
Another PR involved in the unwinding of #1513
This PR passes a user-wrapped registerer to evaluated rule groups so that the userID accompanies each set of metrics a rule group generates.
Signed-off-by: Jacob Lisi [email protected]