Skip to content

pass user id metrics to prom eval metrics in ruler #1548

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

jtlisi
Copy link
Contributor

@jtlisi jtlisi commented Jul 31, 2019

Another PR involved in the unwinding of #1513

This PR passes a user wrapped register to evaluated rule groups so the userID will accompany each set of metrics a rule group generates.

Signed-off-by: Jacob Lisi [email protected]

Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many metrics are we talking about, per user?

@@ -36,7 +36,7 @@ var (
Namespace: "cortex",
Name: "group_evaluation_duration_seconds",
Help: "The duration for a rule group to execute.",
Buckets: []float64{.1, .25, .5, 1, 2.5, 5, 10, 25},
Buckets: []float64{.5, 1, 2.5, 5, 10, 25, 60, 120},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change isn't mentioned in the commit message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll will take this out 👍

I have some rule groups that take a long time to eval and having larger buckets can be nice

@jtlisi
Copy link
Contributor Author

jtlisi commented Aug 1, 2019

@bboreham each user will have 16 metrics, plus 3 for each rule group

@jtlisi jtlisi force-pushed the 20190731_pass_userid_ruler_prom_metrics branch from 8cbb7e2 to ba2a617 Compare August 1, 2019 16:57
@bboreham
Copy link
Contributor

bboreham commented Aug 1, 2019

Seems like quite a lot of data (more than double the existing number of per-user metrics?).
Maybe not so bad since each user will only be on one ruler whereas other metrics come from distributor, ingester, etc., and are multiplied by the number of instances.

@jtlisi
Copy link
Contributor Author

jtlisi commented Aug 1, 2019

Also worth considering a number of these metrics will already exist without a user label. Specifically the 3 per rule group. But instead of just having the rulegroup label, after this PR they will have the user ID as an accompanying label. This also means in theory as things stand, if two users have a rulegroup with the same name, they will both utilize the same metrics.

@bboreham
Copy link
Contributor

bboreham commented Aug 6, 2019

Do the metrics work for you, either before or after this change? See #1557.

@jtlisi
Copy link
Contributor Author

jtlisi commented Aug 6, 2019

@bboreham sadly the majority of these metrics are set in the run function we do not call for the RuleGroup in prometheus: https://github.com/prometheus/prometheus/blob/41151ca8dc069448515f48893b8631b9a3ad8df8/rules/manager.go#L279

I haven't put much thought into it but maybe some upstream refactoring can give us all of the evaluation related metrics. However, I don't thing we will ever be able to make much use of the latency/iteration metrics. That would mean letting the prometheus rule group itself handle scheduling and evaluation. That seems like a step backwards in our case since we already have the scheduling queue.

@bboreham
Copy link
Contributor

bboreham commented Aug 6, 2019

I have wondered about passing all rules via temporary files inside the container filesystem, similar to how we do template files. This would enable us to use Prometheus' native rule manager and remove a lot of ruler code.

@jtlisi jtlisi closed this Aug 14, 2019
@jtlisi
Copy link
Contributor Author

jtlisi commented Aug 14, 2019

I'm closing this in favor of utilizing the prometheus rule manager #1571, which will embed and register these metrics by default. It will also actually record values for all of the metrics provided by the rule group.

@jtlisi jtlisi deleted the 20190731_pass_userid_ruler_prom_metrics branch August 14, 2019 16:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants