Add DevWorkspace-specific metrics #500

Merged on Jul 30, 2021 (6 commits)

amisevsk (Collaborator) commented Jul 21, 2021

What does this PR do?

Adds DevWorkspace-specific metrics to the metrics server, tracking

  1. Number of workspace starts
  2. Number of successfully started workspaces (i.e. workspaces entering 'Running' phase)
  3. Number of DevWorkspaces that enter the Failed state
  4. A histogram of workspace startup time, bucketed in 10-second intervals

Currently, the metrics come with a source label. The value of this label is derived from the value of the controller.devfile.io/devworkspace-source label on the DevWorkspace (optional). If set, this allows metrics to be partitioned based on the tool creating the DevWorkspace (e.g. Web Terminal Operator, Eclipse Che, etc.)

Metrics are updated on phase changes in the DevWorkspace (e.g. going from phase 'Starting' -> 'Running') to avoid double counting DevWorkspaces.
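The idea of counting only on phase transitions, partitioned by the source label, can be sketched roughly as follows. This is a stdlib-only illustration under stated assumptions: the `counters` type, `recordPhaseChange`, and the `"web-terminal"` label value are hypothetical, and plain maps stand in for labelled counter metrics. Only the label key comes from this PR.

```go
package main

import "fmt"

// sourceLabel is the DevWorkspace label named in the PR description.
const sourceLabel = "controller.devfile.io/devworkspace-source"

type phase string

const (
	phaseStarting phase = "Starting"
	phaseRunning  phase = "Running"
	phaseFailed   phase = "Failed"
)

// counters is a hypothetical stand-in for labelled counter metrics,
// keyed by the value of the source label.
type counters struct {
	started, running, failed map[string]int
}

func newCounters() *counters {
	return &counters{
		started: map[string]int{},
		running: map[string]int{},
		failed:  map[string]int{},
	}
}

// recordPhaseChange increments a counter only when the phase actually
// changes, so repeated reconciles in the same phase are not double counted.
func (c *counters) recordPhaseChange(labels map[string]string, oldPhase, newPhase phase) {
	if oldPhase == newPhase {
		return
	}
	source := labels[sourceLabel] // empty string when the optional label is unset
	switch newPhase {
	case phaseStarting:
		c.started[source]++
	case phaseRunning:
		c.running[source]++
	case phaseFailed:
		c.failed[source]++
	}
}

func main() {
	c := newCounters()
	// "web-terminal" is an illustrative label value, not one defined by this PR.
	labels := map[string]string{sourceLabel: "web-terminal"}

	c.recordPhaseChange(labels, "", phaseStarting)
	c.recordPhaseChange(labels, phaseStarting, phaseStarting) // no change, ignored
	c.recordPhaseChange(labels, phaseStarting, phaseRunning)
	c.recordPhaseChange(labels, phaseRunning, phaseRunning) // no change, ignored

	fmt.Println("started:", c.started, "running:", c.running, "failed:", c.failed)
}
```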

What issues does this PR fix or reference?

Additional nice-to-haves for #241

Is it tested? How?

Changes can be tested in the same way as #405 (note that metrics won't appear until they've been updated at least once)

A simpler way to test these changes directly is by running the controller locally:

  1. Run the controller with make run in one window
  2. kubectl apply -f samples/theia-next.yaml
  3. curl -s http://localhost:8080/metrics | grep '^dev'

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v7-devworkspaces-operator-e2e, v7-devworkspace-happy-path to trigger)
    • v7-devworkspaces-operator-e2e: DevWorkspace e2e test
    • v7-devworkspace-happy-path: DevWorkspace e2e test

@@ -120,6 +121,7 @@ func (r *DevWorkspaceReconciler) Reconcile(req ctrl.Request) (reconcileResult ct
workspace.Status.Phase = dw.DevWorkspaceStatusStarting
workspace.Status.Message = "Initializing DevWorkspace"
err = r.Status().Update(ctx, workspace)
metrics.WorkspaceStarted(workspace, reqLogger)
Member

Shouldn't we check the error before incrementing Started?

Member

Won't it increment Started when creating a DevWorkspace with Started: false?

I'm afraid I don't understand why we even set the Starting phase here. I would say we should just update the status with DevWorkspace, and STOPPED.

Collaborator Author

You're right, I don't know what I was thinking doing it this way. I'll fix this before merging, but I got side-tracked with debugging #527 today.

Collaborator Author

Should be fixed, though there's still a bit of strangeness in having to set the Started condition ASAP and then also having to do

	reconcileStatus := currentStatus{phase: dw.DevWorkspaceStatusStarting}
	reconcileStatus.setConditionTrue(conditions.Started, "DevWorkspace is starting")

to ensure the condition isn't wiped out later.

@@ -62,6 +63,7 @@ var healthHttpClient = &http.Client{
// Parameters for result and error are returned unmodified, unless error is nil and another error is encountered while
// updating the status.
func (r *DevWorkspaceReconciler) updateWorkspaceStatus(workspace *dw.DevWorkspace, logger logr.Logger, status *currentStatus, reconcileResult reconcile.Result, reconcileError error) (reconcile.Result, error) {
updateMetricsForPhase(workspace, status.phase, logger)
Member

I assume the metrics will be more precise if we move this to after the check that the status update error is nil.

Collaborator Author

Moved to only update metrics if the corresponding status update succeeds.


func incrementStartTimeBucketForWorkspace(wksp *dw.DevWorkspace, log logr.Logger) {
sourceLabel := wksp.Labels[workspaceSourceLabel]
hist, err := workspaceStartupTimesHist.GetMetricWith(map[string]string{metricSourceLabel: sourceLabel})
Member

So we would need to create a PR against OpenShift Console, probably backport it to OS 4.8, and also update the existing CR in Console if the label is not detected.

Collaborator Author

Done (finally): openshift/console#9752

amisevsk added 6 commits July 29, 2021 13:28
Add metrics to count
* Workspaces started
* Workspaces that successfully entered the Running phase
* Workspaces that failed to start

Metrics are labelled according to the value of the
'controller.devfile.io/devworkspace-source' label on the DevWorkspace,
allowing metrics to be grouped based on the tool creating DevWorkspaces
(e.g. Web Terminal Operator, Eclipse Che, etc.)

Signed-off-by: Angel Misevski <[email protected]>

* Add 'WorkspaceStarted' condition to mark when a workspace enters the
  'Starting' phase
* Move custom condition definitions to a separate package to avoid
  circular dependencies when they're used in another package.

Signed-off-by: Angel Misevski <[email protected]>

* Move setting "Started" condition to the point in the reconcile loop
  where we know the workspace is started to avoid marking new stopped
  workspaces as started

* Set "Started" condition to false when a workspace is stopped

* Only update metrics if the API call to sync workspace status succeeds
  to avoid accidentally incrementing metrics when a request fails

Signed-off-by: Angel Misevski <[email protected]>
amisevsk (Collaborator Author)

/test v7-devworkspaces-operator-e2e, v7-devworkspace-happy-path

openshift-ci bot commented Jul 30, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: amisevsk, JPinkney, sleshchenko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JPinkney,amisevsk,sleshchenko]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sleshchenko sleshchenko merged commit b9659c8 into devfile:main Jul 30, 2021
@amisevsk amisevsk deleted the devworkspace-metrics branch July 30, 2021 14:38

3 participants