pkg/{sdk,sdk/metrics}: Adding default metrics #349
Gopkg.lock
Outdated
@@ -458,6 +458,6 @@
[solve-meta]
  analyzer-name = "dep"
  analyzer-version = 1
  inputs-digest = "e39a3b50eecf50ee2f3c6ce8a36306abeea762a41fab1117f0c5e2a038b72fb4"
  inputs-digest = "d66983685b184f895ad58071cd04403f85b3b0d06b5818ca1c8225732f03b930"
Is this needed if nothing else in Gopkg.lock changed? That is, if it wasn't out of sync with Gopkg.toml?
Here is a comment that I think might help explain this conflict: golang/dep#1224 (comment)
I cannot get this digest change to go away when running dep ensure. I think that means that if I manually reverted Gopkg.lock back to the old value, anyone who took this change and then ran dep ensure would get a change to Gopkg.lock. I think we should include this change because of that.
pkg/sdk/informer.go
Outdated
@@ -18,6 +18,7 @@ import (
	"context"
	"time"

	"github.com/operator-framework/operator-sdk/pkg/sdk/metrics"
Leave a blank line between lines 21 and 22. Our convention is to group the project-related imports separately.
pkg/sdk/metrics/metrics.go
Outdated
// Collector - metric collector for all the metrics the sdk will watch
type Collector struct {
	Events     *prom.CounterVec
	SyncEvents *prom.CounterVec
Can we rename these fields to be more indicative of the metrics that they're tracking:
EventType
ReconcileResult
Or something similar.
pkg/sdk/metrics/metrics.go
Outdated
	}
}

// RegisterCollector - add collector safetly to prometheus
safetly ==> safely
pkg/sdk/metrics/metrics.go
Outdated
// RegisterCollector - add collector safetly to prometheus
func RegisterCollector(c *Collector) {
	defer func() {
		if r := recover(); r != nil {
What is recover doing? I don't see it defined here.
This is what recover is doing:
https://blog.golang.org/defer-panic-and-recover
I am going to change this because I forgot that there is a Register method that returns an error. That will be clearer.
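To make the two options concrete, here is a minimal sketch of the difference between recovering from a panicking registration and using a registration call that returns an error. The `registry` type, `errAlreadyRegistered`, and `registerWithRecover` are hypothetical stand-ins written against the standard library only; they are not the Prometheus client API or the code in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// registry is a hypothetical stand-in for a metrics registry. It only
// illustrates the two registration styles discussed in this thread.
type registry struct {
	names map[string]bool
}

var errAlreadyRegistered = errors.New("collector already registered")

// Register reports a duplicate registration as an ordinary error value.
func (r *registry) Register(name string) error {
	if r.names[name] {
		return errAlreadyRegistered
	}
	r.names[name] = true
	return nil
}

// MustRegister panics on a duplicate registration instead of returning.
func (r *registry) MustRegister(name string) {
	if err := r.Register(name); err != nil {
		panic(err)
	}
}

// registerWithRecover turns the panic back into an error by calling the
// built-in recover inside a deferred function.
func registerWithRecover(r *registry, name string) (err error) {
	defer func() {
		if p := recover(); p != nil {
			err = fmt.Errorf("recovered: %v", p)
		}
	}()
	r.MustRegister(name)
	return nil
}

func main() {
	r := &registry{names: map[string]bool{}}
	fmt.Println(registerWithRecover(r, "events")) // first registration succeeds
	fmt.Println(registerWithRecover(r, "events")) // duplicate: the panic is recovered
	fmt.Println(r.Register("events"))             // duplicate: plain error, no panic involved
}
```

The error-returning form keeps the failure path visible at the call site, which is the reason given above for switching to it.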
pkg/sdk/metrics/metrics.go
Outdated
}

// Collect returns the current ssstate of the metrics
ssstate ==> state
@shawn-hurley Were you able to test out an operator with this PR? What does the output of curling the metrics endpoint look like with these metrics? Can you try it out and post the output here?
$ curl app-operator.default.svc.cluster.local:60000/metrics
...
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 458752
# HELP go_memstats_sys_bytes Number of bytes obtained by system. Sum of all system allocations.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.0066168e+07
# HELP operator_sdk_event_type events that the sdk has recieved, segmented by type(add or delete or update)
# TYPE operator_sdk_event_type counter
operator_sdk_event_type{type="add"} 1
operator_sdk_event_type{type="update"} 2
# HELP operator_sdk_reconcile_result reconcilation events that the sdk has processed segmented by result(success or failure)
# TYPE operator_sdk_reconcile_result counter
operator_sdk_reconcile_result{result="success"} 3
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.1
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 8
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.0262912e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.53183892809e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 3.2391168e+07
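For anyone who wants to check these counters in a test rather than by eye, the following is a rough sketch of reading sample lines like the ones above. `parseSamples` is a hypothetical helper, not part of the SDK or the Prometheus client, and it assumes a simplified form of the exposition format: one sample per line, no trailing timestamps, and `#` lines treated as comments.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSamples maps each series (metric name plus labels) to its value.
// Assumption: the value is the last space-separated field on the line.
func parseSamples(text string) (map[string]float64, error) {
	samples := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		i := strings.LastIndex(line, " ")
		if i < 0 {
			return nil, fmt.Errorf("malformed sample line: %q", line)
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", line, err)
		}
		samples[strings.TrimSpace(line[:i])] = v
	}
	return samples, sc.Err()
}

func main() {
	out := `# TYPE operator_sdk_event_type counter
operator_sdk_event_type{type="add"} 1
operator_sdk_event_type{type="update"} 2
# TYPE operator_sdk_reconcile_result counter
operator_sdk_reconcile_result{result="success"} 3
`
	samples, err := parseSamples(out)
	if err != nil {
		panic(err)
	}
	// Look up one series by its full name-plus-labels key.
	fmt.Println(samples[`operator_sdk_event_type{type="update"}`])
}
```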
pkg/sdk/metrics/metrics.go
Outdated
)

const (
	eventsMetricName = "operator_sdk_event_type"
Just noticed the operator_sdk prefix. I don't think that adds any more meaning to the metric name so we should probably drop it. Same for operator_sdk_reconcile_result.
A more meaningful prefix might be the controller name itself. But since we just have one collector per operator and until we support multiple controllers via the controller-runtime we can just keep it as event_type and reconcile_result.
I think the metric name should also be shorter, but I think having the word operator in there is useful, as there might be plenty of non-operator metrics that have the string "event_type" in them. How about operator_reconcile_results and operator_event_types? Also, you should use pluralization and probably add a unit suffix to these metrics (like _total). See https://prometheus.io/docs/practices/naming/ for best practices on naming.
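As a rough illustration of the naming suggestion, the sketch below checks a counter name against the character set Prometheus allows in metric names and against the `_total` suffix convention. `checkCounterName` is a hypothetical helper written for this comment, not something provided by the SDK or the Prometheus client.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validName matches the characters Prometheus allows in a metric name.
var validName = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)

// checkCounterName is a hypothetical lint helper: it rejects names with
// invalid characters and counters that lack the conventional _total suffix.
func checkCounterName(name string) error {
	if !validName.MatchString(name) {
		return fmt.Errorf("%q is not a valid metric name", name)
	}
	if !strings.HasSuffix(name, "_total") {
		return fmt.Errorf("counter %q should end in _total", name)
	}
	return nil
}

func main() {
	for _, name := range []string{
		"operator_sdk_event_type",
		"operator_event_types_total",
		"operator_reconcile_results_total",
	} {
		if err := checkCounterName(name); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", name)
	}
}
```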
pkg/sdk/metrics/metrics.go
Outdated
const (
	eventsMetricName     = "operator_event_types_total"
	syncEventsMetricName = "operator_reconcile_results_total"
Can you also rename the variables for the reconcile_results metric accordingly.
syncEventsMetricName ==> reconcileResultsMetricName
Also below
SyncResultSuccess ==> ReconcileResultSuccess
SyncResultFailure ==> ReconcileResultFailure
pkg/sdk/informer.go
Outdated
@@ -40,14 +42,16 @@ type informer struct {
	namespace      string
	context        context.Context
	deletedObjects map[string]interface{}
	metrics        *metrics.Collector
We have this as collector *metrics.Collector in sdk/api.go. We should make them consistent.
Either rename this one to collector, or rename both of them to something similar.
@shawn-hurley The overall implementation seems good to me. By putting all the controller's collector code in its own package in Since So I'm thinking we move
pkg/sdk/metrics/metrics.go
Outdated
}

// Describe returns all the descriptions of the collector
func (s *Collector) Describe(ch chan<- *prom.Desc) {
This should be func (c *Collector). The receiver name should be an abbreviation of the type.
Same for Collect below.
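For reference, a small sketch of the receiver convention, combined with the ReconcileResult naming requested earlier in this review. The `Collector` here is a hypothetical standard-library stand-in that counts in a map; it does not implement the Prometheus collector interface and is not the code in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// Label values for the reconcile result counter, using the names
// requested in this review.
const (
	ReconcileResultSuccess = "success"
	ReconcileResultFailure = "failure"
)

// Collector is a hypothetical stand-in that tracks reconcile results.
type Collector struct {
	mu               sync.Mutex
	reconcileResults map[string]int
}

// NewCollector returns a Collector with its counters initialized.
func NewCollector() *Collector {
	return &Collector{reconcileResults: make(map[string]int)}
}

// ReconcileResult increments the counter for the given result label.
// The receiver is named c, an abbreviation of the type name Collector.
func (c *Collector) ReconcileResult(result string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.reconcileResults[result]++
}

// Count returns the current value of the counter for a result label.
func (c *Collector) Count(result string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.reconcileResults[result]
}

func main() {
	c := NewCollector()
	c.ReconcileResult(ReconcileResultSuccess)
	c.ReconcileResult(ReconcileResultSuccess)
	c.ReconcileResult(ReconcileResultFailure)
	fmt.Println(c.Count(ReconcileResultSuccess), c.Count(ReconcileResultFailure))
}
```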
I initially thought that exposing the metrics to the user of the SDK would help them register and add their own metrics, as well as reduce the scope of the SDK package. The more I think about this, the less likely it seems that this should be a goal. If we are worried about exposing the metrics to the user, then we should make this an internal package, because I believe that the implementation of metrics should be inside its own package and that the SDK package should rely only on the exported values from the metrics package.
Recapping our offline discussion: |
LGTM. Can you rebase the Gopkg.lock file? It got changed with #346.
this sets up the sdk to expose metrics by registering metrics with prometheus from an internal package.
this sets up the sdk to expose metrics by registering metrics with
prometheus.
it also adds the metrics as a type to be used by clients to update the
metrics.