Skip to content

Check devworkspace pod events for "Unrecoverable" events #549

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 11, 2021

Conversation

amisevsk
Copy link
Collaborator

What does this PR do?

Checks events related to the DevWorkspace pod for what we would consider "unrecoverable" events (list stolen from Che 😛)

This requires adding an index to the manager, to allow us to list events with the appropriate fieldSelector.

Also adds get/list/watch Events RBAC to the controller.

What issues does this PR fix or reference?

Closes #465

Is it tested? How?

It's hard to reproduce these events manually; the easiest way I found was to patch commonStorage.go to read

// Line 115
	podAdditions.Volumes = append(podAdditions.Volumes, corev1.Volume{
		Name: commonPVCName,
		VolumeSource: corev1.VolumeSource{
			PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
				ClaimName: commonPVCName+"fake",
			},
		},
	})

and trying to start a non-ephemeral DevWorkspace.

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v7-devworkspaces-operator-e2e, v7-devworkspace-happy-path to trigger)
    • v7-devworkspaces-operator-e2e: DevWorkspace e2e test
    • v7-devworkspace-happy-path: DevWorkspace e2e test

Add an indexer to the internal controller cache to track events indexed
by involvedObject.Name. This allows the controller to list events with
the fieldSelector "involvedObject.name=<pod-name>" to query events
specific to a workspace Pod.

Signed-off-by: Angel Misevski <[email protected]>
@amisevsk amisevsk force-pushed the report-failure-to-deploy branch from 8588a62 to 86340c5 Compare August 10, 2021 20:20
"CrashLoopBackOff",
"ImagePullBackOff",
"CreateContainerError",
"RunContainerError",
}

var unrecoverablePodEventReasons = []string{
"FailedMount",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On some infrastructure, FailedMount or FailedScheduling can be an usual situation when you just need to be more patient and then on Che we suggest users to drop them from unrecoverable events list.
We should have it configurable in some way or let users annotate particular workspace, maybe get it from namespaced configmap settings if present.
WDYT?

Copy link
Collaborator Author

@amisevsk amisevsk Aug 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like an annotation system might complicate things too much -- we should just do what we think is best (whether that's check or don't check).

One issue I've seen in Che before is that there can be a bit of a thundering herd problem -- if I've got 10 workspaces waiting for the common PVC to be mountable, it can be very confusing because they'll randomly start up and have to be stopped until it gets to the one I want to actually use.

One option that just occurred to me from looking at the Event struct a bit more -- what if we have a threshold for "number of times the event has occurred"? We could fail workspace start if e.g. we've seen the FailedMount even three times.

edit: tested checking the count, and for e.g. a non-existent PVC, it never goes past 1 :/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #550 to follow up

@openshift-ci
Copy link

openshift-ci bot commented Aug 11, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: amisevsk, JPinkney, sleshchenko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JPinkney,amisevsk,sleshchenko]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link

openshift-ci bot commented Aug 11, 2021

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci bot removed the lgtm label Aug 11, 2021
Add functionality to check pod events during reconciles, in order to
detect unrecoverable states in workspace startup that aren't reflected
in the pod's status.

Signed-off-by: Angel Misevski <[email protected]>
@amisevsk amisevsk force-pushed the report-failure-to-deploy branch from 7f265b1 to 05f102f Compare August 11, 2021 16:01
@amisevsk
Copy link
Collaborator Author

/test v7-devworkspaces-operator-e2e, v7-devworkspace-happy-path

@openshift-ci
Copy link

openshift-ci bot commented Aug 11, 2021

@amisevsk: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
ci/prow/v7-devworkspace-happy-path 05f102f link /test v7-devworkspace-happy-path

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@amisevsk
Copy link
Collaborator Author

Happy path tests fail due to #484 (Events not in RBAC used in the test itself).

I'm merging this one now, with the note that we should follow up in #550

@amisevsk amisevsk merged commit 3f310df into devfile:main Aug 11, 2021
@amisevsk amisevsk deleted the report-failure-to-deploy branch August 11, 2021 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DevWorkspace Operator should report deployment issues
3 participants