Skip to content

Migrate to devfile/api v1alpha2 and add conversion webhooks. #197

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 47 commits into from
Nov 27, 2020

Conversation

amisevsk
Copy link
Collaborator

What does this PR do?

Switches the devworkspace-operator to use v1alpha2 devworkspaces, and implements conversion webhooks. Currently this is done by updating go.mod to point at devfile/api#213, as that PR is not yet merged.

To avoid too many temporary changes in the repo (changing where devworkspace CRDs are downloaded from), you'll have to manually copy in the CRDs generated in devfile/api#213 to config/crd/bases/:

curl "https://raw.githubusercontent.com/devfile/api/fc53138e/crds/workspace.devfile.io_devworkspaces.yaml" \
    -o config/crd/bases/workspace.devfile.io_devworkspaces.yaml

curl "https://raw.githubusercontent.com/devfile/api/fc53138e/crds/workspace.devfile.io_devworkspacetemplates.yaml" \
    -o config/crd/bases/workspace.devfile.io_devworkspacetemplates.yaml 

What issues does this PR fix or reference?

Resolves #189

Is it tested? How?

To test this PR (on minikube, for now):
0. Set env vars as usual for development

  1. Deploy cert-manager on your cluster: make install_cert_manager
  2. Deploy controller: make docker install
  3. Verify that conversion webhooks are created: log message conversion webhook enabled appears in kubectl logs -f deploy/devworkspace-webhook-server
  4. Create a DevWorkspace: kubectl apply -f samples/theia.yaml
  5. Check that you can get both v1alpha1 and v1alpha2 versions of the devworkspace:
    kubectl get devworkspaces.v1alpha1.workspace.devfile.io theia -o yaml
    kubectl get devworkspaces.v1alpha2.workspace.devfile.io theia -o yaml
    
  6. Test usability as normal

Remaining TODOs

  1. I haven't tested/verified that local run/debug still works as expected, but will do that soon.
  2. These changes make webhooks mandatory; we likely need to update how WEBHOOKS_ENABLED works, perhaps disabling the mutating and validating webhooks when set to false but still deploying the webhooks server
  3. The service name for the webhooks server is hard-coded, so it has to be hard-coded into templates as well, we should probably make this configurable
  4. Only k8s with cert-manager is supported at this point. The next step is to add support for OpenShift via the Service CA operator, which supports a similar CRD annotation. We'll need to verify that the certificates secret is correctly propagated to the webhooks server
  5. This PR includes a workaround for deploying CRDs when they're already present on the cluster (we first remove them). I've found that on minikube, without those changes, the second make install fails. You can test this by
    kustomize build config/crd | kubectl apply -f -
    kustomize build config/crd | kubectl apply -f -
    or by commenting out the delete command in make install.
  6. This PR updates existing CRDs for Components and WorkspaceRoutings. In terms of yaml spec they are identical, except the new version has defaults for some endpoint fields. We need to verify that
    • Applying a v1 version of the CRD is not an error when a `v1beta1 CRD with the same GVK is present
    • That upgrading from a previous version will not cause issues for existing devworkspaces.

Re-add DevWorkspaceTemplates to CRDs and adjust scripts/gitignore to
accomodate them.

Signed-off-by: Angel Misevski <[email protected]>
Copy link
Contributor

@JPinkney JPinkney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some initial comments, my CRC was being a pain so I'll have to look further in depth on monday. I should have read the description better :)

if err != nil {
log.Error(err, "Failed to create webhooks")
os.Exit(1)
}

if err := ctrl.NewWebhookManagedBy(mgr).For(&workspacev1alpha1.DevWorkspace{}).Complete(); err != nil {
log.Error(err, "failed creating conversion webhook")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, if the conversion webhooks fail to create should we exit or just continue without them?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe if the webhooks fail to set up you won't be able to create DevWorkspaces -- when you create a DevWorkspace it calls the conversion webhooks (at least for the v1alpha1 case) and if it can't get a response it blocks creation.

Also surprising info about how conversion works: If you specify conversion: none in the CRD, you can "get" a v1alpha2-spec workspace that is just the v1alpha1 and a wrong GVK 🤷

@JPinkney
Copy link
Contributor

I just tried the testing instructions and everything was working as expected. For some reason my make _init_devworkspace_crds wasn't working but I think that was more of a me issue.

I tried the local run/debug and I get:

2020-11-23T15:49:09.458-0500    ERROR   setup   failed to setup webhooks        {"error": "failed to create webhooks deployment: environment variable WEBHOOK_SECRET_NAME is unset"}

amisevsk and others added 23 commits November 24, 2020 00:04
Update dependency on devfile/api to commit 1103496. This is required as
this commit contains both v1alpha1 and v1alpha2 APIs as well as
conversion code.

Signed-off-by: Angel Misevski <[email protected]>
* Configure `make manifests` to generate v1 CRDs (instead of v1beta1) to
resolve a validation issue (v1beta CRDs cannot define default fields)
* Fix patch_crds.sh to patch v1 CRDs instead of v1beta1
* Add rule for restarting the webhooks server
* Add rule for installing cert-manager to a cluster

Signed-off-by: Angel Misevski <[email protected]>
Update all code to use v1alpha2 instead of v1alpha1. In general this is
a straightforward switching of imports, but some changes are required:

* Change in devfile structure (move keys up a level) means that some
vistior-based code no longer works in the same way (visitor functions
cannot get key), so they're reworked into a switch statement
* Some methods can now return an error.
* Devfile API v1alpha2 removes a duplicated memoryLimit feild from
containers, and other minor changes for compatibility.

Signed-off-by: Angel Misevski <[email protected]>
Clean up unused/autogenerated code from config/ directory. A lot of it
won't work for our use case, and some of it is out of date (e.g. v1beta1
cert-manager certificates when v1 is available)

Signed-off-by: Angel Misevski <[email protected]>
Update separate webhooks server to create conversion webhooks. This
means we can no longer generate a certificate at runtime, and must
instead use a certificate that is included in the DevWorkspace CRD
specs.

Signed-off-by: Angel Misevski <[email protected]>
Attempting to apply CRDs when they already exist on the cluster hangs
indefinitely for unclear reasons.

Signed-off-by: Angel Misevski <[email protected]>
v1alpha2 introduces default values for Endpoint.Exposure and
Endpoint.Protocol. When left empty, they are populated by the API server
which causes spec vs cluster comparisons to always fail.

Signed-off-by: Angel Misevski <[email protected]>
Fix issue when running locally, where the operator expects environment
variable WEBHOOK_SECRET_NAME to be set according to the secret generated
by cert-manager.

For now, we export a hard-coded secret name in the Makefile when trying
to run/debug locally.

Signed-off-by: Angel Misevski <[email protected]>
Reflecting devfile/api PR devfile/api#214

Cherry-picked from commit cdef6dc and
adapted slightly

Signed-off-by: David Festal <[email protected]>
Signed-off-by: Angel Misevski <[email protected]>
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: amisevsk, JPinkney, sleshchenko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JPinkney,amisevsk,sleshchenko]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: amisevsk, JPinkney, sleshchenko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JPinkney,amisevsk,sleshchenko]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

* Move config/components/certificates/cert-manager to
  /config/components/cert-manager since there's no other
  files required for deploying certs
* Move common labels to last-step kustomizations
* Remove namespace from base as it cannot be changed
* Add plantuml diagram documenting structure of kustomize
  directories

Signed-off-by: Angel Misevski <[email protected]>
@openshift-ci-robot
Copy link
Collaborator

New changes are detected. LGTM label has been removed.

@amisevsk
Copy link
Collaborator Author

Pushed a couple of last-minute fixups:

  • Removed bundle-related makefile rules, since they don't work with our current structure; we'll have to address this later. Operator SDK-generated manifests expect the default folder structure, which we don't/cannot use and don't include devworkspace CRDs
  • Reorganized the config/ directory a bit:
    • Moved config/components/certificates/cert-manager up to config/components/cert-manager, since I initially thought we might need other types of deployments for certificates (e.g. something for Service CA). Since there's only one subfolder it makes sense to simplify the structure for now
    • Removed namespace from config/base as it was hard-coded to devworkspace-operator
    • Removed common labels from config/base as they should be applied at the end (otherwise anything not used as a base in config/base will not get the labels
  • Added a plantuml diagram of the current structure of the config/ directory with some explanation of what each kustomize.yaml does:

Kustomize deployment structure

@amisevsk
Copy link
Collaborator Author

/test v5-devworkspaces-operator-e2e

@amisevsk
Copy link
Collaborator Author

/test v5-devworkspaces-operator-e2e

@amisevsk
Copy link
Collaborator Author

/retest

@@ -64,13 +64,3 @@ func (w *Deployment) DeployWorkspacesController() error {

return nil
}

func (w *Deployment) CustomResourceDefinitions() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really removed from Makefile https://github.com/amisevsk/devworkspace-operator/blob/add-conversion-v1alpha2/Makefile ?
Or it just does not make sense to execute it separately since e2e tests do make install, which includes crds installing as well?

@sleshchenko
Copy link
Member

test is failing because it can't initialize CRDs:

Downloading devfile/api CRDs to /tmp/tmp.9VboUUlNdl
Initialized empty Git repository in /tmp/tmp.9VboUUlNdl/.git/
aeda60d4361911da85103f224644bfa792498499
DevWorkspace API is specified from branch
fatal: reference is not a tree: aeda60d4361911da85103f224644bfa792498499
make[1]: *** [_init_devworkspace_crds] Error 128

@sleshchenko
Copy link
Member

/test v5-devworkspaces-operator-e2e

1 similar comment
@sleshchenko
Copy link
Member

/test v5-devworkspaces-operator-e2e

@sleshchenko sleshchenko force-pushed the add-conversion-v1alpha2 branch from 941d553 to e7e788d Compare November 27, 2020 12:49
@sleshchenko
Copy link
Member

sleshchenko commented Nov 27, 2020

Something strange happens with tests. I think the PR can be merged without this PR check if make test_e2e works locally

@sleshchenko
Copy link
Member

/test v5-devworkspaces-operator-e2e

@sleshchenko
Copy link
Member

Found a bug which may cause tests failure on CI, at least it cause my local failure, controller tries to update components forever because of difference in sourceMapping

See diff from debug logs
{
  "wanted": {
    "spec": {
      "workspaceId": "workspace267da45101b44759",
      "components": [
        {
          "name": "dev",
          "container": {
            "image": "quay.io/wto/web-terminal-tooling:latest",
            "env": [
              {
                "name": "PS1",
                "value": "\\[\\e[34m\\]>\\[\\e[m\\]\\[\\e[33m\\]>\\[\\e[m\\]"
              }
            ],
            "memoryLimit": "256Mi",
            "args": [
              "tail",
              "-f",
              "/dev/null"
            ]
          }
        }
      ]
    }
  },
  "clusterComponent": {
    "spec": {
      "workspaceId": "workspace267da45101b44759",
      "components": [
        {
          "name": "dev",
          "container": {
            "image": "quay.io/wto/web-terminal-tooling:latest",
            "env": [
              {
                "name": "PS1",
                "value": "\\[\\e[34m\\]>\\[\\e[m\\]\\[\\e[33m\\]>\\[\\e[m\\]"
              }
            ],
            "memoryLimit": "256Mi",
            "args": [
              "tail",
              "-f",
              "/dev/null"
            ],
            "sourceMapping": "/projects"
          }
        }
      ]
    }
  }
}

I assume component CRD needs to be updated.

@sleshchenko
Copy link
Member

/test v5-devworkspaces-operator-e2e

Running an older container (centos:7) causes the
update_devworkspace_crds.sh script to fail with error:

 fatal: reference is not a tree: aeda60d4361911da851

This appears to be due to different behavior for git-fetch in older
versions (centos 7 uses 1.8.3.1).

Signed-off-by: Angel Misevski <[email protected]>
@amisevsk
Copy link
Collaborator Author

/test v5-devworkspaces-operator-e2e

@amisevsk
Copy link
Collaborator Author

@sleshchenko I tested again on crc and was able to start up a cloud-shell workspace -- no idea why it's failing in CI now. I'll look into the souremapping issue, it may be causing a problem but it's not clear why.

The devfile/api ContainerComponent spec sets default values, e.g. for
sourceMapping. Once a resource is created on the cluster, these defaults
will be applied, and so defaulted fields must have a defined value in
the static spec we use to create resources -- otherwise the controller
should detect a difference and attempt to reconcile it away.

Signed-off-by: Angel Misevski <[email protected]>
@amisevsk
Copy link
Collaborator Author

/test v5-devworkspaces-operator-e2e

@amisevsk amisevsk merged commit b2743e3 into devfile:master Nov 27, 2020
@amisevsk
Copy link
Collaborator Author

Merging to unblock all progress on the operator.

@amisevsk amisevsk deleted the add-conversion-v1alpha2 branch February 8, 2023 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement conversion webhooks for devfile/api versions
5 participants