
Increase the number and granularity of buckets for some of kube-scheduler's alpha metrics. Make exporting flag-gated #131551


Open
ania-borowiec wants to merge 3 commits into master

Conversation

ania-borowiec
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

This is part of fixing #127384
Before this PR, the values exported in the scheduler_plugin_execution_duration_seconds_bucket metric did not provide substantial information to kube-scheduler users or to the sig-scheduling team, while also being a headache for cluster admins because the extra time series take up memory.
This PR introduces more granular buckets for:
- scheduler_event_handling_duration_seconds
- scheduler_plugin_execution_duration_seconds
- scheduler_queueing_hint_execution_duration_seconds
- scheduler_scheduling_algorithm_duration_seconds
It also introduces a feature gate, SchedulerHighPrecisionMetrics (disabled by default), that toggles exporting the following metrics, to help save memory in clusters:
- scheduler_event_handling_duration_seconds
- scheduler_plugin_execution_duration_seconds
- scheduler_queueing_hint_execution_duration_seconds
The bucket boundaries have been adjusted to fit both the scheduler's performance tests and users' requests.
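
For reviewers, here is a minimal sketch of the shape of the change (not the actual diff in this PR): a more granular alpha histogram registered via component-base/metrics, with registration gated so that clusters that do not opt in pay no extra memory. The bucket boundaries, the label set, and the features.SchedulerHighPrecisionMetrics constant path are assumptions for illustration only.

```go
// Sketch only: assumed bucket boundaries and an assumed feature-gate constant.
package metrics

import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/component-base/metrics"
	"k8s.io/component-base/metrics/legacyregistry"
	"k8s.io/kubernetes/pkg/features"
)

// PluginExecutionDuration uses exponential buckets from ~10µs up to ~10s so
// that both very fast and unusually slow plugin calls land in distinct buckets.
var PluginExecutionDuration = metrics.NewHistogramVec(
	&metrics.HistogramOpts{
		Subsystem:      "scheduler",
		Name:           "plugin_execution_duration_seconds",
		Help:           "Duration for running a plugin at a specific extension point.",
		Buckets:        metrics.ExponentialBuckets(0.00001, 1.5, 35), // assumed values
		StabilityLevel: metrics.ALPHA,
	},
	[]string{"plugin", "extension_point", "status"},
)

// Register exports the high-precision histogram only when the gate is enabled,
// so clusters that do not opt in pay no memory cost for the extra time series.
// The features.SchedulerHighPrecisionMetrics constant path is assumed here.
func Register() {
	if utilfeature.DefaultFeatureGate.Enabled(features.SchedulerHighPrecisionMetrics) {
		legacyregistry.MustRegister(PluginExecutionDuration)
	}
}
```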

Which issue(s) this PR fixes:

Fixes #131209

Special notes for your reviewer:

I will create issues with very specific descriptions for refactoring these tests to use mocks (and thus actually become unit tests). Hopefully I can phrase them clearly enough to be 'good first issues'.

Does this PR introduce a user-facing change?

Increased accuracy of scheduler metrics: 
- scheduler_event_handling_duration_seconds
- scheduler_plugin_execution_duration_seconds
- scheduler_queueing_hint_execution_duration_seconds
- scheduler_scheduling_algorithm_duration_seconds

Export of metrics:
- scheduler_event_handling_duration_seconds
- scheduler_plugin_execution_duration_seconds
- scheduler_queueing_hint_execution_duration_seconds
is gated by the SchedulerHighPrecisionMetrics feature gate (disabled by default).

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 30, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 30, 2025
@k8s-ci-robot
Contributor

Hi @ania-borowiec. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Apr 30, 2025
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 30, 2025
@ania-borowiec
Contributor Author

/cc dgrisonnet richabanker sanposhiho

@sanposhiho
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 30, 2025
@sanposhiho
Member

Why do they have to be feature-gated? Like, can we just mark new ones as alpha metrics, and then drop the old ones later? That's the common approach of replacing the metrics.

@richabanker
Contributor

richabanker commented Apr 30, 2025

Why do they have to be feature-gated? Like, can we just mark new ones as alpha metrics, and then drop the old ones later? That's the common approach of replacing the metrics.

I think the reasoning was that we do not want to expose these high-cardinality metrics (since the buckets have been made more granular) by default unless opted in, particularly since these metrics would contain granular data more likely to be used for development/perf-testing than for general monitoring of clusters/the scheduler. Also, these will eventually be replaced with native-histogram metrics, ref

@sanposhiho
Member

sanposhiho commented Apr 30, 2025

The idea here sounds like: some clusters (+ the scheduler perf test) want to have this feature, but others might not, so a feature gate is used to control ON/OFF.

I am not sure this is a suitable use case for a feature gate. A feature gate, IMO, generally means "the feature is going to be rolled out to everyone eventually".
It's not a config parameter, and it's not meant to implement an on/off switch for a case like this.
So I would prefer to have this in the scheduler config (or a scheduler flag?).

@ania-borowiec
Contributor Author

The idea here sounds like: some clusters (+ the scheduler perf test) want to have this feature, but others might not, so a feature gate is used to control ON/OFF.

I am not sure this is a suitable use case for a feature gate. A feature gate, IMO, generally means "the feature is going to be rolled out to everyone eventually". It's not a config parameter, and it's not meant to implement an on/off switch for a case like this. So I would prefer to have this in the scheduler config (or a scheduler flag?).

SGTM, I'll rework the code and ping you, thanks

@sanposhiho
Member

Given that scheduler-perf, which doesn't start the scheduler via the command, would use it, I am thinking that the scheduler config would probably be better than a command flag. But I am open to discussion if anyone has a different opinion.

these will eventually be replaced with native-histogram metrics

I want to dig a bit more before designing the config parameter: do we think those native-histogram metrics would completely replace the existing metrics for all users? Or will some use cases still need the existing old ones, so that we need to keep both after all?

Also, do we know the timeline for when those new features are going to arrive?

@ania-borowiec
Contributor Author

AFAIU the main difference (from our perspective) between the new histograms and the ones we currently use is that they can optimize memory usage. As a BIG simplification: you can configure them to use exponential buckets with a given growth factor, but if your values only fall into 4 buckets, then only those 4 time series get exported.
So I think native histograms could be a solution for all use cases.
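
For context, a minimal sketch of what such a native histogram looks like with Prometheus client_golang (assumed metric name and settings; this is not scheduler code): you set a growth factor instead of explicit boundaries, and only the buckets that actually receive observations exist.

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Native histogram: no explicit bucket list. Each bucket is at most 10%
	// wider than the previous one, and unused buckets simply do not exist.
	dur := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:                            "demo_plugin_execution_duration_seconds",
		Help:                            "Example native histogram with sparse exponential buckets.",
		NativeHistogramBucketFactor:     1.1,
		NativeHistogramMaxBucketNumber:  100,       // resolution is reduced if this is exceeded
		NativeHistogramMinResetDuration: time.Hour, // earliest point at which a reset may happen
	})
	prometheus.MustRegister(dur)

	// Only the buckets these observations fall into are tracked and exported.
	dur.Observe(0.004)
	dur.Observe(0.025)
}
```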

The timeline is not yet known; AFAIR there is hope for a KEP in 1.34 - @richabanker please chime in if I got that wrong, thanks!

@sanposhiho
Member

sanposhiho commented May 5, 2025

there is hope for a KEP in 1.34

If it's planned for the next release and will completely replace the existing ones, why do we need this PR, which modifies the existing metrics and will also be released in 1.34?

@ania-borowiec
Contributor Author

That's mostly because I wasn't sure how to go about the feature gates. Ideally we'd like to get this ready for users to use as soon as possible.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ania-borowiec
Once this PR has been reviewed and has the lgtm label, please assign macsko for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label May 5, 2025
@sanposhiho
Member

This change would, either way, be published to users in 1.34, hence the same timing.
So I would prefer to just wait for the new feature to be available and adopt it as soon as possible, so that it lands in the scheduler in 1.34. That is much better than adding a parameter to control old metrics that are going to be removed very soon.

@sanposhiho
Member

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 5, 2025
@ania-borowiec
Contributor Author

Do you think there's any way we could deliver it sooner, so that users can use this right away, e.g. in their tests?
The user in #131209 explicitly requested fixing the plugin execution metric; I wish we could help them sooner rather than relying on the native histograms timeline.

@sanposhiho
Member

We cannot deliver any new features without a new release, and the soonest new release is 1.34.

@ania-borowiec
Contributor Author

So I guess the key question now is: @richabanker how confident are you about native histograms being available to use in 1.34? :)

@sanposhiho
Member

sanposhiho commented May 5, 2025

I would probably say the same even if we end up delaying it to 1.35, though.
I was guessing it was in the far-away future, but apparently it is already on the timeline. So I cannot agree to adding this feature along with a config param only to fix the problem one release faster, when it will no longer be needed in the next release (1.35). This metric has existed for a long time and we haven't received many complaints; that also indicates the fix priority is not that high.
I'd rather just park this problem until the native histograms arrive, as long as it's on SIG Instrumentation's work table.

For the scheduler-perf purpose as well, for now we're mostly looking at scheduling throughput, which isn't impacted by this bucket problem. If we need to utilize those problematic metrics at some point before the native histograms come, then we can start considering something, probably just providing an internal option so that scheduler_perf can increase the granularity. There would be no need to add a new API to the config or a flag even in that case.
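
For illustration only, a hypothetical sketch of that kind of internal-only option (all names here are invented, nothing is proposed API): a registration helper that scheduler_perf could call with finer buckets, without exposing any config or flag to users.

```go
// Hypothetical sketch: an internal-only bucket override, no public API.
package metrics

import "k8s.io/component-base/metrics"

// histogramOption tweaks HistogramOpts before a metric is built.
type histogramOption func(*metrics.HistogramOpts)

// WithBuckets overrides the default bucket boundaries; a harness such as
// scheduler_perf could pass metrics.ExponentialBuckets(...) here.
func WithBuckets(buckets []float64) histogramOption {
	return func(o *metrics.HistogramOpts) { o.Buckets = buckets }
}

// newPluginExecutionHistogram builds the plugin-execution histogram with the
// default (coarse) buckets unless an override is supplied.
func newPluginExecutionHistogram(opts ...histogramOption) *metrics.HistogramVec {
	ho := &metrics.HistogramOpts{
		Subsystem:      "scheduler",
		Name:           "plugin_execution_duration_seconds",
		Help:           "Duration for running a plugin at a specific extension point.",
		Buckets:        metrics.ExponentialBuckets(0.00001, 1.5, 20), // assumed default
		StabilityLevel: metrics.ALPHA,
	}
	for _, opt := range opts {
		opt(ho)
	}
	return metrics.NewHistogramVec(ho, []string{"plugin", "extension_point", "status"})
}
```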

@k8s-ci-robot
Contributor

@ania-borowiec: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kubernetes-integration
Commit: 8c9c2fe
Details: link
Required: true
Rerun command: /test pull-kubernetes-integration

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@richabanker
Contributor

I imagine the native-histogram support will be a long-running effort where we gradually onboard different control plane components and enable the support one by one for each. Even while doing so, I think we want to continue supporting the old format along with the new one, just so the migration is not breaking for users, before defaulting to always exporting native histograms. I can only point you to this slack thread atm, but maybe @dgrisonnet can shed more light on the timeline once he's back.

Till then, I think having a scheduler config option to turn on the more granular buckets is a neat way to make these metrics usable for end users in the short term.
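
Purely to make that concrete, a hypothetical sketch of what such a knob could look like in the component config types (the field name and placement are invented; no such field exists in kubescheduler.config.k8s.io today):

```go
// Hypothetical only: no such field exists in the scheduler config API today.
package config

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// KubeSchedulerConfiguration (excerpt) with an invented opt-in field.
type KubeSchedulerConfiguration struct {
	metav1.TypeMeta

	// HighPrecisionMetrics, when true, would register the finer-grained
	// duration histograms; it would default to false so that clusters that
	// do not opt in keep the current metric memory footprint.
	HighPrecisionMetrics bool

	// ... existing fields elided ...
}
```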

@sanposhiho
Member

the native-histogram support will be a long-running effort where we gradually onboard different control plane components and enable the support one by one for each.

Then, can we just be a first migrator and adopt it ASAP, instead of putting effort into this feature?
That would surely save us from wasted duplicate effort.

we want to continue supporting the old format along with the new one, just so the migration is not breaking for users, before defaulting to always exporting native histograms.

I know we won't remove the old ones immediately.
But once the native-hist feature is implemented, for users who need granular buckets we can just tell them to use the new metrics by enabling the native-hist feature flag.
There is no need to implement this parameter for them in addition to the native-hist feature.

Rather, why would we add a new config parameter and suggest that users start using it, while knowing that a better native-hist feature is coming and those old metrics will eventually be deprecated?
That would cause duplicate effort and confusion not only for us but also for those users.


For those reasons, I cannot say yes to adding a new config parameter just as a temporary, short-term mitigation for the problem, especially given that it has existed for a long time and not many people have complained (i.e., it is a low-priority issue).

@richabanker
Contributor

Rather, why would we add a new config parameter and suggest that users start using it, while knowing that a better native-hist feature is coming and those old metrics will eventually be deprecated?
That would cause duplicate effort and confusion not only for us but also for those users.

For sure, we definitely do not want any wasted effort here, which is why we (sig-instrumentation) had initially asked if it's ok to wait on this until native-hist becomes available, ref - though it was mentioned at the time that the current state of these metrics is broken and they are basically unusable for users, because the buckets as they exist today do not capture meaningful data. Which is not a good state either. Hence this PR was agreed upon as a short-term fix to unblock users of these metrics. But if that's not the case, and since there haven't been a lot of complaints, maybe it's ok to wait for the native-hist introduction after all?

Then, can we just be a first migrator and adopt it ASAP?

It is actually great that we already have a use case and an early adopter ready to play with native histograms once they are ready. We will make a note of this in the KEP and involve you in the relevant discussions. cc @dgrisonnet

@krisztianfekete

Hey, so I've recently added native histogram support to controller-runtime here: kubernetes-sigs/controller-runtime#3165 (and across our projects, too).

As described there, these are opt-in features for users (well, Prometheus instances): today you explicitly need to enable --enable-feature=native-histograms, and without that, Prometheus will ignore the native ones. There's also a way to scrape both, so there are various ways to avoid breaking anything (you can even have a compatible solution with OR queries in Grafana dashboards).

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 14, 2025
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@alaypatel07
Contributor

alaypatel07 commented May 16, 2025

@sanposhiho @richabanker @ania-borowiec since the current metrics are not usable for plugins that take longer than 0.022 seconds, can we treat this as a bug, fix it with a config, and backport it to supported releases?

The newer histogram metrics, when available, can replace the existing ones. But I want to emphasize that even if the new histogram metrics become available, they will be in 1.34+ at best. We still have a granularity issue for users running supported 1.30-1.33 clusters, which needs attention.

@sanposhiho
Member

sanposhiho commented May 16, 2025

I know (and am sorry) it is very inconvenient for your case, but, as I mentioned:

  1. We basically cannot introduce a new config in a patch release; that would be regarded as a new feature.
  2. A patch release can only include fixes for critical bugs, and we cannot justify this problem as critical enough. This metric has existed since 2019 and we hadn't noticed the problem until recently; that also indicates it is not critical.

Successfully merging this pull request may close these issues.

plugin execution metric buckets are not useful for debugging high latency plugins