e2e: ci: gh actions: e2e suite scaffolding by ffromani · Pull Request #19 · kubernetes-sigs/dra-driver-cpu

ffromani · 2025-09-02T17:55:53Z

Introduce all the scaffolding and hook it into the GitHub Actions CI to run the e2e suite.

Currently we run only the simplest test, to demonstrate the scaffolding and utilities.

Notes to reviewers:
I'm intentionally jumping through hoops to keep a single, generic install.yaml as the source of truth and to transform it as needed for CI. This is the rationale for the `make ci-manifests` machinery.
The key reason is that, for the `kind load docker-image` machinery to work as expected, the `imagePullPolicy` has to be `IfNotPresent`: with `Always` (which is also the default for images tagged `latest`), the kubelet tries to pull the image from the registry instead of using the one preloaded into the kind nodes. Alternatively, we can stop using the `latest` tag.
If we prefer to incorporate these changes directly in install.yaml, or can tolerate an almost-duplicate install-ci.yaml, then we can remove this machinery and simplify the PR.
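As background for the pull-policy note, the following Go sketch approximates the documented Kubernetes defaulting rule for an omitted `imagePullPolicy`: `Always` for a `:latest` or missing tag, `IfNotPresent` otherwise. It is a simplified illustration, not the upstream implementation (it skips image-reference validation), and the image names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// effectivePullPolicy approximates how Kubernetes defaults an omitted
// imagePullPolicy. Simplified sketch: no image-reference validation.
func effectivePullPolicy(image, policy string) string {
	if policy != "" {
		return policy // an explicit policy always wins
	}
	hasDigest := false
	if i := strings.Index(image, "@"); i >= 0 {
		image, hasDigest = image[:i], true
	}
	tag := ""
	// A ':' after the last '/' separates the tag; an earlier ':' is a registry port.
	if i := strings.LastIndex(image, ":"); i > strings.LastIndex(image, "/") {
		tag = image[i+1:]
	}
	if tag == "" && !hasDigest {
		tag = "latest" // no tag and no digest means implicitly :latest
	}
	if tag == "latest" {
		return "Always"
	}
	return "IfNotPresent"
}

func main() {
	for _, img := range []string{
		"dra-driver-cpu:latest",         // hypothetical image names
		"localhost:5000/dra-driver-cpu", // no tag: implicitly :latest
		"dra-driver-cpu:v0.1.0",
	} {
		fmt.Printf("%s -> %s\n", img, effectivePullPolicy(img, ""))
	}
}
```

Only the first two references would need the manifest rewrite (or an explicit `IfNotPresent`) for images side-loaded with `kind load docker-image` to be used.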

ffromani · 2025-09-02T17:56:58Z

fixes: #14

ffromani · 2025-09-02T18:17:43Z

Missing: wait for the cluster to be ready. The init containers that reconfigure containerd take a nontrivial time to complete. I think I can use ResourceSlice availability as a proxy. Once that is done, the scaffolding is complete. Next commits (or PRs?) will start fleshing out the test suite.

make room for the `test` directory to hold e2e test machinery. Signed-off-by: Francesco Romani <fromani@redhat.com>

since we support marshalling, we should support unmarshalling as well for full round-trip support. Unmarshalling will be used in test code. Signed-off-by: Francesco Romani <fromani@redhat.com>

introduce all the scaffolding and hook it into the GitHub Actions CI to run the e2e suite. There are no tests to run yet. Signed-off-by: Francesco Romani <fromani@redhat.com>

In order to run CPU management tests, we need nontrivial hardware discovery, beyond what the node status reports, and some introspection in pods to learn their CPU assignment. We could implement these using busybox and shell commands, but past attempts at that (e.g. in the Kubernetes e2e tests) proved awkward, so it should be the last resort. Furthermore, we already have all the code we need; we just need to repackage it into a helper image, which is much handier and a small, fair price to pay. This commit adds these entrypoints and the test image. Signed-off-by: Francesco Romani <fromani@redhat.com>

ffromani · 2025-09-05T08:22:18Z

PR ready for review! It works as expected on GH CI and in my local environment, and I'm quite confident it should also work with few or no hiccups on Prow CI, should we set it up later on.

/cc @pravk03

ffromani · 2025-09-06T08:07:24Z

I realized late that running on kind creates a subtle interaction, because the node resources (e.g. the CPUs) are shared across all the fake kind nodes. We may need some special accounting and/or to disable some tests when running on kind versus on nodes with dedicated, unshared resources. But this is future work and doesn't affect the changes posted here.

pravk03 · 2025-09-07T16:42:25Z

+		gomega.Expect(err).ToNot(gomega.HaveOccurred(), "cannot create root fixture: %v", err)
+		infraFxt := rootFxt.WithPrefix("infra")
+		gomega.Expect(infraFxt.Setup(ctx)).To(gomega.Succeed())
+		ginkgo.DeferCleanup(infraFxt.Teardown, context.Background()) // TODO: set a timeout/reuse ctx?


nit: Any reason not to use ctx here?

The ctx from ginkgo.It gets canceled when the It callback returns. Cleanup functions should be declared as func(ctx context.Context) and then get a suitable context from Ginkgo.

This probably should be:

Suggested change

ginkgo.DeferCleanup(infraFxt.Teardown, context.Background()) // TODO: set a timeout/reuse ctx?

ginkgo.DeferCleanup(infraFxt.Teardown)

Thanks @pohly, I was looking for this reference. Will update accordingly.

pravk03 · 2025-09-07T16:54:03Z

+		gomega.Expect(err).ToNot(gomega.HaveOccurred(), "cannot find worker nodes: %v", err)
+		gomega.Expect(workerNodes).ToNot(gomega.BeEmpty(), "no worker nodes detected")
+
+		targetNode = workerNodes[0] // pick random one, this is the simplest random pick


nit: Wondering why we need this? I'm guessing we eventually want the tester pod to run on a specific node?

The general reasoning is that the CPU-related tests don't need any specific node if the nodes are all equal, which is the case on kind and in the vast majority of the cases I'm seeing. We can totally add a way to target a specific node; I need to re-upload anyway to fix the previous comment.

In addition, in my experience the tests need to run serially because they ultimately manage shared state, namely the node state, so there's little advantage in running on more than a single node in parallel.
Should that be needed, I reckon it should not be hard to adapt the code.
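A sketch of the node selection discussed in this thread: keep the first worker as the default, since the nodes are assumed to be equivalent, and optionally honor an explicit override. The `DRACPU_E2E_TARGET_NODE` environment variable and the node names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"slices"
)

// pickTargetNode returns the worker node the CPU tests should run on.
// By default it takes the first worker (all workers assumed equal, as on
// kind); a non-empty override selects a specific worker instead.
func pickTargetNode(workers []string, override string) (string, error) {
	if len(workers) == 0 {
		return "", fmt.Errorf("no worker nodes detected")
	}
	if override == "" {
		return workers[0], nil
	}
	if !slices.Contains(workers, override) {
		return "", fmt.Errorf("requested node %q is not a worker node", override)
	}
	return override, nil
}

func main() {
	workers := []string{"kind-worker", "kind-worker2"} // hypothetical node names
	node, err := pickTargetNode(workers, os.Getenv("DRACPU_E2E_TARGET_NODE"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("target node:", node)
}
```

Because the specs share node state, running them one at a time could be enforced with Ginkgo v2's `Serial` decorator.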

pravk03 · 2025-09-07T17:14:22Z

This is great! Thank you so much for adding this.
I'm still learning the Ginkgo framework, so I don't have many comments. I tried running the test locally and it worked.

/lgtm

with all the infrastructure in place, we can now add the first real test. We start with the simplest case: a best-effort pod should still get access to all the CPUs. Because we added all the infrastructure and utilities, the next tests will be much easier to add. Signed-off-by: Francesco Romani <fromani@redhat.com>
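The property this first test checks can be expressed as a set comparison. In this sketch, `podCPUs` would come from in-pod introspection and `nodeCPUs` from hardware discovery; both sources, and the function names, are hypothetical.

```go
package main

import (
	"fmt"
	"slices"
)

// normalize returns a sorted, de-duplicated copy of a CPU ID list.
func normalize(cpus []int) []int {
	out := slices.Clone(cpus)
	slices.Sort(out)
	return slices.Compact(out)
}

// verifyBestEffortCPUs returns an error unless the CPUs a best-effort pod may
// run on are exactly the CPUs of the node (order and duplicates ignored).
func verifyBestEffortCPUs(podCPUs, nodeCPUs []int) error {
	pod, node := normalize(podCPUs), normalize(nodeCPUs)
	if !slices.Equal(pod, node) {
		return fmt.Errorf("best-effort pod can use CPUs %v, node has CPUs %v", pod, node)
	}
	return nil
}

func main() {
	nodeCPUs := []int{0, 1, 2, 3}
	fmt.Println(verifyBestEffortCPUs([]int{3, 2, 1, 0}, nodeCPUs)) // same set, different order
	fmt.Println(verifyBestEffortCPUs([]int{0, 1}, nodeCPUs))       // pod restricted to a subset
}
```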

ffromani · 2025-09-08T07:54:01Z

Thanks @pravk03, happy to help! I addressed all the comments and added some initial docs (and guidelines for e2e tests). This is just the beginning: I want to add tests covering the user flows, then implement features and/or land bugfixes, using the e2e tests to increase confidence in their correctness.

johnbelamaric · 2025-09-08T21:52:45Z

/approve
/lgtm

k8s-ci-robot · 2025-09-08T21:52:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, johnbelamaric, pravk03

The full list of commands accepted by this bot can be found here.

The pull request process is described here


Needs approval from an approver in each of these files:

~~OWNERS~~ [johnbelamaric]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) and the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) Sep 2, 2025

k8s-ci-robot requested review from johnbelamaric and pohly September 2, 2025 17:55

k8s-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Sep 2, 2025

ffromani force-pushed the ci-e2e-tests branch 2 times, most recently from dcc0a5d to ad7fc2d September 2, 2025 18:13

ffromani force-pushed the ci-e2e-tests branch 7 times, most recently from 0df71e2 to 1b0efff September 3, 2025 16:21

k8s-ci-robot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) and removed the size/L label Sep 3, 2025

ffromani force-pushed the ci-e2e-tests branch 2 times, most recently from 7c986c8 to 36f1dba September 4, 2025 13:06

ffromani added 2 commits September 4, 2025 15:17

project: rename 'test' to 'test-unit'

278bf34

cpuinfo: implement coretype UnmarshalJSON

0205327

ffromani force-pushed the ci-e2e-tests branch 3 times, most recently from 6f72bef to 85cccb0 September 4, 2025 13:38

e2e: ci: gh actions: e2e suite scaffolding

25ea50b

ffromani force-pushed the ci-e2e-tests branch 3 times, most recently from dbbfc40 to 2740865 September 5, 2025 06:31

ffromani force-pushed the ci-e2e-tests branch from 2740865 to f8f8c47 September 5, 2025 08:16

ffromani changed the title from "WIP: e2e: ci: gh actions: e2e suite scaffolding" to "e2e: ci: gh actions: e2e suite scaffolding" Sep 5, 2025

k8s-ci-robot removed the do-not-merge/work-in-progress label Sep 5, 2025

k8s-ci-robot requested a review from pravk03 September 5, 2025 08:22

ffromani mentioned this pull request Sep 5, 2025

feat: Add support for excluding certain CPUs from ResourceSlice #20

Merged

pravk03 approved these changes Sep 7, 2025

k8s-ci-robot assigned pravk03 Sep 7, 2025

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Sep 7, 2025

ffromani force-pushed the ci-e2e-tests branch from f8f8c47 to eea7684 September 8, 2025 07:12

k8s-ci-robot removed the lgtm label Sep 8, 2025

ffromani force-pushed the ci-e2e-tests branch from eea7684 to 25ef674 September 8, 2025 07:20

ffromani force-pushed the ci-e2e-tests branch from 25ef674 to 3027318 September 8, 2025 07:25

k8s-ci-robot assigned johnbelamaric Sep 8, 2025

k8s-ci-robot added the lgtm label Sep 8, 2025

k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Sep 8, 2025

k8s-ci-robot merged commit 4769fd6 into kubernetes-sigs:main Sep 8, 2025
7 checks passed

ffromani deleted the ci-e2e-tests branch September 11, 2025 12:11

This was referenced Sep 20, 2025

Add E2E/integration test suite #14

Closed

Release guaranteed CPUs as soon as possible back into the shared pool #15

Closed


5 participants
