Skip to content

pkg/{sdk,sdk/metrics}: Adding default metrics #349

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 23, 2018

Conversation

shawn-hurley
Copy link
Member

this sets up the sdk to expose metrics by registering metrics with
prometheus.

it also adds the metrics as a type to be used by clients to update the
metrics.

Gopkg.lock Outdated
@@ -458,6 +458,6 @@
[solve-meta]
analyzer-name = "dep"
analyzer-version = 1
inputs-digest = "e39a3b50eecf50ee2f3c6ce8a36306abeea762a41fab1117f0c5e2a038b72fb4"
inputs-digest = "d66983685b184f895ad58071cd04403f85b3b0d06b5818ca1c8225732f03b930"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed if nothing in Gopkg.lock changed. Meaning it wasn't out of sync with Gopkg.toml.

Copy link
Member Author

@shawn-hurley shawn-hurley Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a comment that I think might help explain this conflict: golang/dep#1224 (comment)

I can not get this change to digest to not occur when running dep ensure. I think that means that anyone who takes this change, and If i manually revereted Gopkg.lock back to the old value, that when then ran dep ensure they would get a change to Gopkg.lock. I think we should include this change because of the that.

@@ -18,6 +18,7 @@ import (
"context"
"time"

"github.com/operator-framework/operator-sdk/pkg/sdk/metrics"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave a line between 21 and 22. We have the convention of grouping the project related imports separately.

// Collector - metric collector for all the metrics the sdk will watch
type Collector struct {
Events *prom.CounterVec
SyncEvents *prom.CounterVec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename these fields to be more indicative of the metrics that they're tracking:

EventType
ReconcileResult

Or something similar.

}
}

// RegisterCollector - add collector safetly to prometheus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

safetly ==> safely

// RegisterCollector - add collector safetly to prometheus
func RegisterCollector(c *Collector) {
defer func() {
if r := recover(); r != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is recover doing? I don't see it defined here.

Copy link
Member Author

@shawn-hurley shawn-hurley Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what recover is doing:
https://blog.golang.org/defer-panic-and-recover

I am going to change this because I forget that there is a Register method that returns an error. This will be more clear.


}

// Collect returns the current ssstate of the metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ssstate ==> state

@hasbro17
Copy link
Contributor

@shawn-hurley Were you able to test out an operator with this PR? What does the output of curling the metrics endpoint look like with these metrics? Can try it out and post the output here.

@shawn-hurley
Copy link
Member Author

$ curl app-operator.default.svc.cluster.local:60000/metrics
...
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 458752
# HELP go_memstats_sys_bytes Number of bytes obtained by system. Sum of all system allocations.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.0066168e+07
# HELP operator_sdk_event_type events that the sdk has recieved, segmented by type(add or delete or update)
# TYPE operator_sdk_event_type counter
operator_sdk_event_type{type="add"} 1
operator_sdk_event_type{type="update"} 2
# HELP operator_sdk_reconcile_result reconcilation events that the sdk has processed segmented by result(success or failure)
# TYPE operator_sdk_reconcile_result counter
operator_sdk_reconcile_result{result="success"} 3
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.1
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 8
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.0262912e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.53183892809e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 3.2391168e+07

)

const (
eventsMetricName = "operator_sdk_event_type"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed the operator_sdk prefix. I don't think that adds any more meaning to the metric name so we should probably drop it. Same for operator_sdk_reconcile_result.

A more meaningful prefix might be the controller name itself. But since we just have one collector per operator and until we support multiple controllers via the controller-runtime we can just keep it as event_type and reconcile_result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the metric name should also be shorter but i think having the word operator in there is useful as there might be plenty of non-operator metrics that have the string "event_type" in it. How about operator_reconcile_results and operator_event_types. Also you should use pluralization, and probably add a suffix of the unit for these metrics (like _total) see: https://prometheus.io/docs/practices/naming/ for best practices on naming.


const (
eventsMetricName = "operator_event_types_total"
syncEventsMetricName = "operator_reconcile_results_total"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also rename the variables for the reconcile_results metric accordingly.
syncEventsMetricName ==> reconcileResultsMetricName
Also below
SyncResultSuccess ==> ReconcileResultSuccess
SyncResultFailure ==> ReconcileResultFailure

@@ -40,14 +42,16 @@ type informer struct {
namespace string
context context.Context
deletedObjects map[string]interface{}
metrics *metrics.Collector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this as collector *metrics.Collector in sdk/api.go. We should make them consistent.
Rename it to either collector or both of them to something similar.

@hasbro17
Copy link
Contributor

@shawn-hurley The overall implementation seems good to me.
But I'm not sure about the package placement and the visibility of pkg/sdk/metrics.

By putting all the controller's collector code in it's own package in pkg/sdk/metrics/metrics.go we have New() *Collector and RegisterCollector(c *Collector) as public so that the informer and api in pkg/sdk can use those functions. But we don't really need to expose this outside pkg/sdk e.g to the user in the handler.
We've done this before where we had a bunch of pkgs in the sdk and then we consolidated them back into pkg/sdk since they were just meant to be internal to pkg/sdk (#242).

Since pkg/sdk is the consumer of pkg/sdk/metrics we should just move it into pkg/sdk.
We already have pkg/sdk/metrics.go from when we exposed the metrics port.

So I'm thinking we move pkg/sdk/metrics/metrics.go to pkg/sdk/metrics.go and make the collector functions used by the informer private:
New() *Collector ==> newCollector() *Collector
RegisterCollector(c *Collector) ==> registerCollector(c *Collector)
The Collector struct should be private too type collector struct { since that collector is only meant to be used inside pkg/sdk.
What do you think?

}

// Describe returns all the descriptions of the collector
func (s *Collector) Describe(ch chan<- *prom.Desc) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be func (c *Collector). The receiver name should be an abbreviation of the type.
Same for Collect below.

@shawn-hurley
Copy link
Member Author

Since pkg/sdk is the consumer of pkg/sdk/metrics we should just move it into pkg/sdk.
We already have pkg/sdk/metrics.go from when we exposed the metrics port.

I initially thought that exposing the metrics to the user of the SDK it would help them register and add their own metrics, as well as reducing the scope of the SDK package. The more I think about this, the less likely it seems this should be a goal.

I think that if we are worried about exposing the metrics to the user than we should make this an internal package because I believe that the implementation of metrics should be inside it's own package, and that the SDK package should only relay on the exported values from the metrics package.

@hasbro17
Copy link
Contributor

Recapping our offline discussion: pkg/sdk/metrics should be moved to pkg/sdk/internal/metrics so it's safe for internal use without being exposed as an API to the user.

@hasbro17
Copy link
Contributor

LGTM. Can you rebase the Gopkg.lock file. It got changed with #346

Shawn Hurley added 2 commits July 23, 2018 08:08
this sets up the sdk to expose metrics by registering metrics with
prometheus from an internal package.
@hasbro17 hasbro17 merged commit 367ab7d into operator-framework:master Jul 23, 2018
@shawn-hurley shawn-hurley deleted the metrics branch October 17, 2018 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants