Skip to content

Conversation

@godwinpang
Copy link
Contributor

@godwinpang godwinpang commented Nov 6, 2025

Add controller_runtime_reconcile_timeouts_total metric

Summary

This PR adds a new metric controller_runtime_reconcile_timeouts_total to track when a controller's reconciliation has reached a ReconciliationTimeout. This provides visibility into when reconcile operations time out due to the controller-runtime wrapper timeout, allowing users to alert / monitor unexpectedly long running controller reconiliations.

Problem

The ReconciliationTimeout feature was added to prevent head-of-line (HOL) blocking by ensuring that a single reconcile cannot block a worker indefinitely. However, when a reconcile exits because ReconciliationTimeout fired, the existing metrics treat it like a normal return. There was no way to distinguish between:

  • A reconcile that completed successfully
  • A reconcile that timed out due to the ReconciliationTimeout context timeout

Metric

controller_runtime_reconcile_timeouts_total{controller="<name>"}
  • Type: Counter
  • Labels: controller (controller name)
  • Description: Total number of reconciliation timeouts per controller. This metric only increments when the wrapped context with ReconcileTimeout has deadline exceeded, not when user reconciler cancels the context or completes before the timeout.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. labels Nov 6, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: godwinpang
Once this PR has been reviewed and has the lgtm label, please assign vincepri for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 6, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @godwinpang. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 6, 2025
@godwinpang godwinpang changed the title Add ReconcileTimeouts metric to track ReconciliationTimeout timeouts ✨ Add ReconcileTimeouts metric to track ReconciliationTimeout timeouts Nov 6, 2025
@godwinpang godwinpang changed the title ✨ Add ReconcileTimeouts metric to track ReconciliationTimeout timeouts ✨ Add controller_runtime_reconcile_timeouts_total metric to track ReconciliationTimeout timeouts Nov 6, 2025
@godwinpang godwinpang force-pushed the add-reconcile-timeouts-metric branch from 049f3d1 to 0a03d36 Compare November 6, 2025 00:07
This PR adds a new metric controller_runtime_reconcile_timeouts_total to track when a controller's reconciliation has reached a ReconciliationTimeout.
This provides visibility into when reconcile operations time out due to the controller-runtime wrapper timeout, allowing users to alert / monitor
unexpectedly long running controller reconiliations.
@godwinpang godwinpang force-pushed the add-reconcile-timeouts-metric branch from 0a03d36 to 39c3abd Compare November 6, 2025 00:09
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Nov 6, 2025
@godwinpang godwinpang marked this pull request as ready for review November 6, 2025 00:18
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 6, 2025
@godwinpang
Copy link
Contributor Author

sorry this ended up requesting a lot of reviewers - It's been a while since I've worked on c-r and I forgot bout the commit conventions hence the force pushes

@alvaroaleman
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 6, 2025
Comment on lines +168 to +174
Expect(func() error {
Expect(ctrlmetrics.ReconcileTimeouts.WithLabelValues(ctrl.Name).Write(&reconcileTimeouts)).To(Succeed())
if reconcileTimeouts.GetCounter().GetValue() != 0.0 {
return fmt.Errorf("metric reconcile timeouts not reset")
}
return nil
}()).Should(Succeed())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be part of the before each?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants