KEP-4346: Add metrics for informer #129160
Please note that we're already in Test Freeze for the Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Wed Dec 11 12:08:11 UTC 2024.
p.metrics.processDuration.Observe(time.Since(startTime).Seconds())
// TODO: This requires implementing Len() and Capacity() for ring growing
// p.metrics.numberOfPendingNotifications.Set(float64(p.pendingNotifications.Len()))
// p.metrics.sizeOfRingGrowing.Set(float64(p.pendingNotifications.Capacity()))
We need to wait for the Len() and Capacity() methods in the ring growing package to be merged.
PR: kubernetes/utils#321
Is this single-threaded? (Is calling Len and Capacity independently, and not under a lock, safe here, given that pendingNotifications is not thread-safe?)
Yes, pendingNotifications is not thread-safe. The pop() and run() goroutines read and write it concurrently, so atomic operations should be used to eliminate data races.
It can be fixed as follows:
metricsUpdateCounter++
if metricsUpdateCounter >= metricsUpdateBatch || time.Since(lastMetricsUpdate) >= metricsUpdateInterval {
	p.metrics.processDuration.Observe(time.Since(startTime).Seconds())
	// Read count using atomic operation
	p.metrics.numberOfPendingNotifications.Set(float64(atomic.LoadInt64(&p.pendingNotificationsCount)))
	p.metrics.sizeOfRingGrowing.Set(float64(p.pendingNotifications.Cap()))
	metricsUpdateCounter = 0
	lastMetricsUpdate = time.Now()
}
}()
Done
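The atomic-counter idea from this thread can be sketched in isolation. This is only an illustration under stated assumptions: `pendingTracker`, `push`, `pop`, `pending`, and `simulate` are hypothetical names and are not taken from the PR. The buffer is touched by a single goroutine, while the count is mirrored into an atomic so that a metrics reader never touches the buffer itself.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pendingTracker is a hypothetical stand-in for the listener's bookkeeping.
// buf is owned by one goroutine; count mirrors len(buf) atomically.
type pendingTracker struct {
	buf   []int        // not thread-safe, single owner only
	count atomic.Int64 // safe to read from any goroutine
}

// push appends an item and bumps the mirrored count (owner goroutine only).
func (t *pendingTracker) push(v int) {
	t.buf = append(t.buf, v)
	t.count.Add(1)
}

// pop removes the oldest item, if any (owner goroutine only).
func (t *pendingTracker) pop() (int, bool) {
	if len(t.buf) == 0 {
		return 0, false
	}
	v := t.buf[0]
	t.buf = t.buf[1:]
	t.count.Add(-1)
	return v, true
}

// pending is what a gauge update would read; it never touches buf.
func (t *pendingTracker) pending() int64 { return t.count.Load() }

// simulate pushes n items and pops half of them on the calling goroutine
// while a second goroutine reads the count concurrently.
func simulate(n int) int64 {
	t := &pendingTracker{}
	done := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // stand-in for the metrics reader
		defer wg.Done()
		for {
			select {
			case <-done:
				return
			default:
				_ = t.pending()
			}
		}
	}()
	for i := 0; i < n; i++ {
		t.push(i)
	}
	for i := 0; i < n/2; i++ {
		t.pop()
	}
	close(done)
	wg.Wait()
	return t.pending()
}

func main() {
	fmt.Println("pending after simulate(10):", simulate(10))
}
```

Running a sketch like this under `go run -race` is one way to check that the reader does not race with the owner goroutine.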
/sig api-machinery
for sig-instrumentation review /assign
/cc @richabanker
@pohly Yes. Based on the current input from the informer, I don't have a good way to handle this special case. If there isn't a good solution, in the short term, can we accept this special case?
Two additional comments:
- The informer and reflector providers should probably be exposed in their respective options to allow consumers to override the metrics.
- I think it could be useful if controllers propagated their name to informers/reflectors and FIFOs, no? Maybe doing that in a new owner label or something of the sort, to be able to identify the responsible controller more easily?
@@ -602,6 +603,10 @@ func newInformer(clientState Store, options InformerOptions) Controller {
KnownObjects: clientState,
EmitDeltaTypeReplaced: true,
Transformer: options.Transform,
Metrics: newInformerMetrics(InformerIdentifier{
It is probably better to decouple FIFO metrics from the informer ones. I don't think Kubernetes is using the FIFO outside of informers, but some users of client-go might.
The FIFO metrics have been decoupled into the FIFOMetricsProvider interface in fifo_metrics.go. Additionally, FIFOMetricsProvider has been exposed in DeltaFIFOOptions to allow custom providers to override the default metrics.
done.
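The provider pattern discussed in this thread can be illustrated with a small, self-contained sketch. Every name below (`GaugeMetric`, `MetricsProvider`, `QueueOptions`, `Queue`, `recordingProvider`) is hypothetical and chosen for illustration; none of them is the actual client-go API. The point is only the shape: the options struct carries a provider interface, a no-op default is used when it is nil, and a consumer can plug in its own implementation.

```go
package main

import "fmt"

// GaugeMetric is a hypothetical minimal gauge abstraction.
type GaugeMetric interface{ Set(float64) }

// MetricsProvider is a hypothetical factory that consumers can override.
type MetricsProvider interface {
	NewQueuedItemsMetric(name string) GaugeMetric
}

// noopGauge and noopProvider are the defaults when nothing is configured.
type noopGauge struct{}

func (noopGauge) Set(float64) {}

type noopProvider struct{}

func (noopProvider) NewQueuedItemsMetric(string) GaugeMetric { return noopGauge{} }

// QueueOptions carries the provider rather than a concrete metrics struct.
type QueueOptions struct {
	Name            string
	MetricsProvider MetricsProvider // nil means "no metrics"
}

// Queue is a toy queue that reports its length through the gauge.
type Queue struct {
	items  []string
	queued GaugeMetric
}

// NewQueue falls back to the no-op provider if none was supplied.
func NewQueue(opts QueueOptions) *Queue {
	p := opts.MetricsProvider
	if p == nil {
		p = noopProvider{}
	}
	return &Queue{queued: p.NewQueuedItemsMetric(opts.Name)}
}

// Add appends an item and updates the gauge.
func (q *Queue) Add(item string) {
	q.items = append(q.items, item)
	q.queued.Set(float64(len(q.items)))
}

// recordingGauge and recordingProvider show a consumer-supplied override.
type recordingGauge struct{ last float64 }

func (g *recordingGauge) Set(v float64) { g.last = v }

type recordingProvider struct{ gauges map[string]*recordingGauge }

func (p *recordingProvider) NewQueuedItemsMetric(name string) GaugeMetric {
	g := &recordingGauge{}
	p.gauges[name] = g
	return g
}

func main() {
	p := &recordingProvider{gauges: map[string]*recordingGauge{}}
	q := NewQueue(QueueOptions{Name: "pods", MetricsProvider: p})
	q.Add("a")
	q.Add("b")
	fmt.Println("queued items gauge:", p.gauges["pods"].last)
}
```

Keeping the provider an interface in the options means that a library consumer can route these metrics to its own registry without the queue depending on a concrete metrics implementation.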
return ringGrowingCapacity.WithLabelValues(name, resourceType, handlerName)
}

func (informerMetricsProvider) NewPrcoessDurationMetric(name string, resourceType string, handlerName string) cache.HistogramMetric {
- func (informerMetricsProvider) NewPrcoessDurationMetric(name string, resourceType string, handlerName string) cache.HistogramMetric {
+ func (informerMetricsProvider) NewProcessDurationMetric(name string, resourceType string, handlerName string) cache.HistogramMetric {
Done.
@@ -58,6 +58,9 @@ type DeltaFIFOOptions struct {

// If set, log output will go to this logger instead of klog.Background().
Logger *klog.Logger

// If set, metrics will be collected for the informer.
Metrics *informerMetrics
I think we should pass the provider here so that consumers of the library can override the metrics if they need to. I know that controller-runtime does that with other packages. For example: https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/controller/priorityqueue/priorityqueue.go#L58-L60
The FIFOMetricsProvider interface has been exposed in DeltaFIFOOptions, allowing users to provide custom providers. Additionally, the relevant provider has been exposed in both the informer and reflector options.
done.
// makeValidPrometheusLabelValue converts a string into a valid Prometheus label value.
// A valid label value must match the regex [a-zA-Z_:][a-zA-Z0-9_:]*
This is a best practice for label and metric names, not values. For values you can have any UTF-8 sequence.
The makeValidPrometheusLabelValue code has been removed.
done.
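The distinction raised in this thread, that naming rules apply to label names while label values may be any UTF-8 string, can be shown with a short sketch. The helper names `validLabelName` and `validLabelValue` are hypothetical, and the name pattern is assumed from the classic Prometheus data model (newer Prometheus versions may relax it), so treat this as an illustration rather than a definitive validator.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// labelNameRE is the classic label-name pattern (assumption: legacy
// Prometheus data model; colons are reserved for metric names).
var labelNameRE = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

// validLabelName reports whether s is acceptable as a label name.
func validLabelName(s string) bool { return labelNameRE.MatchString(s) }

// validLabelValue reports whether s is acceptable as a label value:
// any valid UTF-8 string, with no character-class restriction.
func validLabelValue(s string) bool { return utf8.ValidString(s) }

func main() {
	fmt.Println(validLabelName("resource_type"))             // restricted charset
	fmt.Println(validLabelName("apps/v1"))                   // '/' and '.' are rejected in names
	fmt.Println(validLabelValue("apps/v1, Kind=Deployment")) // values can carry punctuation
}
```

Under these assumptions, a value such as a group/version/kind string needs no sanitizing before it is used as a label value, which is consistent with removing the conversion helper.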
1effb46 to 0592ce6
0592ce6 to a90b2ab
Thanks, @dgrisonnet. Comments addressed. PTAL.
Done.
The Done.
@dgrisonnet, I’ve addressed all the comments above. Could you please take another look when you have time? Thanks!
Signed-off-by: xigang <[email protected]>
/test pull-kubernetes-e2e-gce
@dgrisonnet, just following up on this small fix PR that you’ve partially reviewed. Also looping in @richabanker, @sbueringer and @alvaroaleman. If you have time, a quick look would be much appreciated. Thanks! 🙇
@xigang there is a failing check that needs to be resolved.
@RainbowMango Once this PR is merged, the client-go staging code will be synced to the kubernetes/client-go repository’s main branch, and the next run of apidiff will no longer report any
@richabanker This PR has been blocked for a while. Could you take a look? @dgrisonnet hasn’t responded recently. Thanks!
Queuing up, will try my best to get to it this week
What type of PR is this?
/kind feature
What this PR does / why we need it:
KEP-4346
https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4346-informer-metrics
Which issue(s) this PR fixes:
#121474
#129795
#117123
#122067 (comment)
#130767
kubernetes/client-go#1027
kubernetes-sigs/controller-runtime#817
kubernetes-sigs/controller-runtime#3189
kubernetes-sigs/controller-runtime#3182
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: