
e2e: ci: gh actions: e2e suite scaffolding#19

Merged
k8s-ci-robot merged 5 commits into kubernetes-sigs:main from ffromani:ci-e2e-tests on Sep 8, 2025

@ffromani
Contributor

@ffromani ffromani commented Sep 2, 2025

Introduce all the scaffolding and hook it into the GitHub Actions CI to run the e2e suite.

Currently we run only the simplest test, to demonstrate the scaffolding and utilities.

Notes to reviewers
I'm intentionally jumping through hoops to keep a single, generic install.yaml as the source of truth and to transform it as needed for CI. This is the rationale for the make ci-manifests machinery.
The key reason is that, for the kind load docker-image machinery to work as expected, the imagePullPolicy has to be IfNotPresent; otherwise the kubelet tries to pull the image from a registry instead of using the copy loaded into the kind nodes (Kubernetes defaults the policy to Always for images tagged latest when the policy is not set explicitly). Alternatively, we can stop using the latest tag.
If we can, or prefer to, just incorporate these changes in install.yaml, or if we can tolerate an almost-duplicate install-ci.yaml, then we can remove the machinery and simplify the PR.
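
The thread does not show how make ci-manifests performs the transformation, so the following is only a minimal Go sketch of the idea, under the assumption that a textual rewrite of the imagePullPolicy lines is enough. The function name forceIfNotPresent and the sample manifest are hypothetical; the real machinery may well use sed, yq, or kustomize instead.

```go
package main

import (
	"fmt"
	"regexp"
)

// pullPolicyRe matches a whole imagePullPolicy line. The first group captures
// the indentation and the key, so that only the value is rewritten.
// Trailing comments on the line are not handled in this sketch.
var pullPolicyRe = regexp.MustCompile(`(?m)^([ \t]*(?:- )?imagePullPolicy:[ \t]*)\S+[ \t]*$`)

// forceIfNotPresent returns the manifest with every imagePullPolicy value
// replaced by IfNotPresent. Hypothetical helper, not taken from the PR.
func forceIfNotPresent(manifest string) string {
	return pullPolicyRe.ReplaceAllString(manifest, "${1}IfNotPresent")
}

func main() {
	// Made-up manifest fragment; the image name is a placeholder.
	generic := "containers:\n" +
		"- name: driver\n" +
		"  image: example.invalid/driver:latest\n" +
		"  imagePullPolicy: Always\n"
	fmt.Print(forceIfNotPresent(generic))
}
```

Keeping the rewrite in one small, mechanical step is what lets install.yaml stay the only hand-maintained manifest; the CI variant is derived, never edited.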

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 2, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 2, 2025
@ffromani
Contributor Author

ffromani commented Sep 2, 2025

fixes: #14

@ffromani ffromani force-pushed the ci-e2e-tests branch 2 times, most recently from dcc0a5d to ad7fc2d Compare September 2, 2025 18:13
@ffromani
Contributor Author

ffromani commented Sep 2, 2025

Missing: wait for the cluster to be ready. The init containers that reconfigure containerd take a nontrivial time to complete. I think I can use ResourceSlice availability as a proxy. Once that is done, the scaffolding is complete. The next commits (or PRs?) will start fleshing out the test suite.
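
A readiness wait of this kind usually boils down to polling a condition until it holds or a deadline passes. The sketch below shows that shape using only the standard library. The names waitFor and pollUntilReady are hypothetical, and the condition is a fake counter; a real check would ask the API server whether the expected ResourceSlice objects have been published, which is not shown here.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitFor polls cond every interval until it reports true, returns an error,
// or ctx is done.
func waitFor(ctx context.Context, interval time.Duration, cond func(context.Context) (bool, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		ok, err := cond(ctx)
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("condition not met: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// pollUntilReady drives waitFor with a fake condition that turns true on the
// readyAfter-th poll, and reports how many polls were made. It stands in for
// a real "are the resource slices there yet?" check.
func pollUntilReady(readyAfter int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	polls := 0
	err := waitFor(ctx, 5*time.Millisecond, func(context.Context) (bool, error) {
		polls++
		return polls >= readyAfter, nil
	})
	return polls, err
}

func main() {
	polls, err := pollUntilReady(3)
	fmt.Println("polls:", polls, "err:", err)
}
```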

@ffromani ffromani force-pushed the ci-e2e-tests branch 7 times, most recently from 0df71e2 to 1b0efff Compare September 3, 2025 16:21
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 3, 2025
@ffromani ffromani force-pushed the ci-e2e-tests branch 2 times, most recently from 7c986c8 to 36f1dba Compare September 4, 2025 13:06
make room for the `test` directory to hold e2e test
machinery.

Signed-off-by: Francesco Romani <fromani@redhat.com>
since we support marshalling, we should support
unmarshalling as well for full roundtrippability.
Unmarshalling will be used in test code.

Signed-off-by: Francesco Romani <fromani@redhat.com>
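
The commit message does not name the type that gained unmarshalling, so the example below uses a made-up CPURange type purely to illustrate what roundtrippability means in Go: whatever MarshalText emits, UnmarshalText must parse back into an equal value. Everything here (type, format, helper) is an assumption for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// CPURange is a hypothetical type; it does not come from the PR.
type CPURange struct {
	First, Last int
}

// MarshalText implements encoding.TextMarshaler, emitting "first-last".
func (r CPURange) MarshalText() ([]byte, error) {
	return []byte(fmt.Sprintf("%d-%d", r.First, r.Last)), nil
}

// UnmarshalText implements encoding.TextUnmarshaler and is the inverse of
// MarshalText.
func (r *CPURange) UnmarshalText(text []byte) error {
	first, last, ok := strings.Cut(string(text), "-")
	if !ok {
		return fmt.Errorf("malformed range %q", text)
	}
	f, err := strconv.Atoi(first)
	if err != nil {
		return fmt.Errorf("malformed range start %q: %w", first, err)
	}
	l, err := strconv.Atoi(last)
	if err != nil {
		return fmt.Errorf("malformed range end %q: %w", last, err)
	}
	if l < f {
		return fmt.Errorf("inverted range %q", text)
	}
	*r = CPURange{First: f, Last: l}
	return nil
}

// roundTrip marshals r and unmarshals the result into a fresh value.
func roundTrip(r CPURange) (CPURange, error) {
	data, err := r.MarshalText()
	if err != nil {
		return CPURange{}, err
	}
	var out CPURange
	err = out.UnmarshalText(data)
	return out, err
}

func main() {
	got, err := roundTrip(CPURange{First: 2, Last: 5})
	fmt.Println(got, err)
}
```

A roundtrip check like this is what makes the unmarshalling side usable in tests: the test can decode what the code under test encoded and compare values rather than strings.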
@ffromani ffromani force-pushed the ci-e2e-tests branch 3 times, most recently from 6f72bef to 85cccb0 Compare September 4, 2025 13:38
introduce all the scaffolding and hook into the github actions
CI to run the e2e suite.

Currently there is no test yet to be run.

Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani ffromani force-pushed the ci-e2e-tests branch 3 times, most recently from dbbfc40 to 2740865 Compare September 5, 2025 06:31
In order to run cpu management tests, we need nontrivial
hardware discovery, more than the node status reports,
and some introspection in pods to learn their CPU assignment.

We can implement these using busybox and shell commands,
but past attempts doing that (e.g. kubernetes e2e tests)
proved awkward, so it should be the last resort;
furthermore, we already have all the code we need, so
we just need to repackage it into a helper image,
which is much handier and a small, fair price to pay.

This commit adds these entrypoints and the test image.

Signed-off-by: Francesco Romani <fromani@redhat.com>
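
To give a flavor of the introspection involved: on Linux, a process can read its allowed CPUs from the Cpus_allowed_list field of /proc/self/status, which uses the kernel's list format (for example 0-3,8). The commit does not show how the helper image reports the assignment, so the parser below is only a sketch under the assumption that this list format is the input; parseCPUList is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseCPUList expands a Linux cpulist string such as "0-3,8" into a sorted
// slice of CPU IDs. An empty string yields an empty result.
func parseCPUList(s string) ([]int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return nil, nil
	}
	seen := map[int]bool{}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		first, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("bad CPU id %q: %w", lo, err)
		}
		last := first
		if isRange {
			if last, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("bad CPU id %q: %w", hi, err)
			}
		}
		if last < first {
			return nil, fmt.Errorf("inverted range %q", part)
		}
		for c := first; c <= last; c++ {
			seen[c] = true
		}
	}
	cpus := make([]int, 0, len(seen))
	for c := range seen {
		cpus = append(cpus, c)
	}
	sort.Ints(cpus)
	return cpus, nil
}

func main() {
	// The input mimics the value of Cpus_allowed_list, trailing newline included.
	cpus, err := parseCPUList("0-3,8\n")
	fmt.Println(cpus, err) // prints "[0 1 2 3 8] <nil>"
}
```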
@ffromani ffromani changed the title WIP: e2e: ci: gh actions: e2e suite scaffolding e2e: ci: gh actions: e2e suite scaffolding Sep 5, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 5, 2025
@ffromani
Contributor Author

ffromani commented Sep 5, 2025

PR ready for review! It works as expected on GH CI and in my local env, and I'm quite confident it will also work with few or no hiccups on Prow CI, should we set it up later on.

/cc @pravk03

@ffromani
Contributor Author

ffromani commented Sep 6, 2025

I realized late that running on kind creates a subtle conflict/interaction, because the node resources are shared across all the fake kind nodes. We may need some special accounting and/or to disable tests when running on kind vs on nodes with dedicated, unshared resources. But this is future work and doesn't affect the changes posted here.

Comment thread test/e2e/cpu_assignment_test.go Outdated
gomega.Expect(err).ToNot(gomega.HaveOccurred(), "cannot create root fixture: %v", err)
infraFxt := rootFxt.WithPrefix("infra")
gomega.Expect(infraFxt.Setup(ctx)).To(gomega.Succeed())
ginkgo.DeferCleanup(infraFxt.Teardown, context.Background()) // TODO: set a timeout/reuse ctx?
Contributor

nit: Any reason to not use ctx here ?


The ctx from ginkgo.It gets canceled when the It callback returns. Cleanup functions should be declared as func(ctx context.Context) and then get a suitable context from Ginkgo.

This probably should be:

Suggested change
- ginkgo.DeferCleanup(infraFxt.Teardown, context.Background()) // TODO: set a timeout/reuse ctx?
+ ginkgo.DeferCleanup(infraFxt.Teardown)
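
The failure mode behind this suggestion can be reproduced without Ginkgo. The sketch below is a standard-library simulation, not Ginkgo code: runSpec is a hypothetical stand-in for a test runner that cancels the spec context as soon as the body returns and then runs the cleanup with a context of its own, and teardown stands in for a fixture Teardown that refuses to work with a dead context.

```go
package main

import (
	"context"
	"fmt"
)

// teardown stands in for a fixture Teardown: like most API calls, it fails
// once its context is done.
func teardown(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("teardown skipped: %w", err)
	}
	return nil
}

// runSpec mimics a test runner. The body receives a context that is canceled
// when the body returns; the cleanup it registered runs afterwards and is
// handed a fresh context by the runner.
func runSpec(body func(ctx context.Context) func(context.Context) error) error {
	specCtx, cancel := context.WithCancel(context.Background())
	cleanup := body(specCtx)
	cancel() // the spec context dies here, before any cleanup runs

	cleanupCtx, cancelCleanup := context.WithCancel(context.Background())
	defer cancelCleanup()
	return cleanup(cleanupCtx)
}

// reuseSpecContext captures the spec context inside the cleanup: the pattern
// the review comment warns about.
func reuseSpecContext() error {
	return runSpec(func(ctx context.Context) func(context.Context) error {
		return func(context.Context) error { return teardown(ctx) }
	})
}

// acceptCleanupContext lets the runner supply the context: the suggested
// pattern.
func acceptCleanupContext() error {
	return runSpec(func(context.Context) func(context.Context) error {
		return teardown
	})
}

func main() {
	fmt.Println("reusing spec ctx:   ", reuseSpecContext())
	fmt.Println("runner-provided ctx:", acceptCleanupContext())
}
```

The first call reports a canceled context and the second succeeds, which is the same reason a cleanup declared as func(ctx context.Context) is the safer choice here.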

Contributor Author

Thanks @pohly, I was looking for this reference. Will update accordingly.

Comment thread test/e2e/cpu_assignment_test.go Outdated
gomega.Expect(err).ToNot(gomega.HaveOccurred(), "cannot find worker nodes: %v", err)
gomega.Expect(workerNodes).ToNot(gomega.BeEmpty(), "no worker nodes detected")

targetNode = workerNodes[0] // pick random one, this is the simplest random pick
Contributor

nit: Wondering why we need this? I am guessing we eventually want the tester pod to run on a specific node?

Contributor Author

The general reasoning is that the CPU-related tests don't need any specific node if the nodes are all equal, which is the case on kind and in the vast majority of the cases I'm seeing. We can totally add a way to target a specific node; I need to re-upload anyway to fix the previous comment.

In addition: in my experience, the tests need to run serially because they ultimately manage shared state, namely the node state, so there's little advantage in running on more than a single node in parallel.
Should that be needed, I reckon it would not be hard to adapt.
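
One possible shape for "a way to target a specific node" is a small selection helper that honors an optional preference and otherwise falls back to the first worker. This is a hypothetical sketch, not code from the PR: pickTargetNode is an invented name, and nodes are represented by plain name strings rather than the node objects the test suite actually handles.

```go
package main

import (
	"errors"
	"fmt"
)

// pickTargetNode returns preferred when it is one of the workers. With an
// empty preference it falls back to the first worker, which is sufficient
// when all nodes are equal.
func pickTargetNode(workers []string, preferred string) (string, error) {
	if len(workers) == 0 {
		return "", errors.New("no worker nodes detected")
	}
	if preferred == "" {
		return workers[0], nil
	}
	for _, w := range workers {
		if w == preferred {
			return w, nil
		}
	}
	return "", fmt.Errorf("node %q is not a worker node", preferred)
}

func main() {
	// Made-up node names in the style kind uses.
	workers := []string{"kind-worker", "kind-worker2"}
	fmt.Println(pickTargetNode(workers, ""))
	fmt.Println(pickTargetNode(workers, "kind-worker2"))
}
```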

@pravk03
Contributor

pravk03 commented Sep 7, 2025

This is great! Thank you so much for adding this.
I'm still learning the Ginkgo framework, so I don't have many comments. I tried running the test locally and it worked.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 7, 2025
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 8, 2025
with all the infrastructure in place, we can now add the first
real test. We start with the simplest case: a best effort pod
should still get access to all the CPUs.

Because we added all the infrastructure and utilities, the next
tests will be much easier to add.

Signed-off-by: Francesco Romani <fromani@redhat.com>
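
The assertion at the heart of such a test is a set comparison: every CPU that is online on the node should also appear in the pod's allowed set. The sketch below shows only that comparison; missingCPUs is a hypothetical helper, and how the two CPU sets are collected (node discovery, pod introspection) is outside its scope.

```go
package main

import "fmt"

// missingCPUs returns the CPUs that are present in online but absent from
// allowed, preserving the order of online. An empty result means the pod can
// run on every online CPU.
func missingCPUs(online, allowed []int) []int {
	inAllowed := make(map[int]bool, len(allowed))
	for _, c := range allowed {
		inAllowed[c] = true
	}
	var missing []int
	for _, c := range online {
		if !inAllowed[c] {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	online := []int{0, 1, 2, 3}
	fmt.Println(missingCPUs(online, []int{0, 1, 2, 3})) // prints "[]"
	fmt.Println(missingCPUs(online, []int{0, 1}))       // prints "[2 3]"
}
```

Reporting the missing CPUs, rather than a bare true/false, tends to make a failing test much easier to diagnose.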
@ffromani
Contributor Author

ffromani commented Sep 8, 2025

Thanks @pravk03, happy to help! I addressed all the comments and added some initial docs (and guidelines for e2e tests). This is just the beginning: my plan is to add tests that cover the user flows, then implement features and/or land bugfixes, using the e2e tests to increase confidence in their correctness.

@johnbelamaric

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 8, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, johnbelamaric, pravk03

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 8, 2025
@k8s-ci-robot k8s-ci-robot merged commit 4769fd6 into kubernetes-sigs:main Sep 8, 2025
7 checks passed
@ffromani ffromani deleted the ci-e2e-tests branch September 11, 2025 12:11

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.


5 participants