e2e: ci: gh actions: e2e suite scaffolding#19
Conversation
|
fixes: #14 |
dcc0a5d to
ad7fc2d
Compare
|
missing: wait for the cluster to be ready - the init containers which reconfigure containerd will take a nontrivial time to complete. I think I can use the |
0df71e2 to
1b0efff
Compare
7c986c8 to
36f1dba
Compare
make room for the `test` directory to hold e2e test machinery. Signed-off-by: Francesco Romani <fromani@redhat.com>
since we support marhslling, we should support unmarshalling as well for full roundtrippability. Unmarshalling will be used in test code. Signed-off-by: Francesco Romani <fromani@redhat.com>
6f72bef to
85cccb0
Compare
introduce all the scaffolding and hook into the github actions CI to run the e2e suite. Currently there is no test yet to be run. Signed-off-by: Francesco Romani <fromani@redhat.com>
dbbfc40 to
2740865
Compare
In order to run cpu management tests, we need nontrivial hardware discovery, more than the node status reports, and some introspection in pods to learn their CPU assignment. We can implement these using busybox and shell commands, but past attempts doing that (e.g. kubernetes e2e tests) proved awkward, so it should be the last resort; furthermore we already have all the code we need, so we just need to repackage it into a helper image, which is much more handy and a little, fair price to pay. This commit add these entrypoints and the test image. Signed-off-by: Francesco Romani <fromani@redhat.com>
2740865 to
f8f8c47
Compare
|
PR ready for review! works as expected on GH CI, on my local env, and I'm quite confident it should work with few or no hiccups also on prow CI, should we set it up later on. /cc @pravk03 |
|
I realized late that running on kind creates a subtle conflict/interaction because the node resources are shared across all the fake kind nodes. We may need some special accounting and/or disable tests when running on kind vs on nodes with dedicated, unshared resources. But this is future work, doesn't affect the changes posted here. |
| gomega.Expect(err).ToNot(gomega.HaveOccurred(), "cannot create root fixture: %v", err) | ||
| infraFxt := rootFxt.WithPrefix("infra") | ||
| gomega.Expect(infraFxt.Setup(ctx)).To(gomega.Succeed()) | ||
| ginkgo.DeferCleanup(infraFxt.Teardown, context.Background()) // TODO: set a timeout/reuse ctx? |
There was a problem hiding this comment.
nit: Any reason to not use ctx here ?
There was a problem hiding this comment.
The ctx from gingko.It gets canceled when the It callback returns. Cleanup functions should be declared as func(ctx context.Context) and then get a suitable context from Ginkgo.
This probably should be:
| ginkgo.DeferCleanup(infraFxt.Teardown, context.Background()) // TODO: set a timeout/reuse ctx? | |
| ginkgo.DeferCleanup(infraFxt.Teardown) |
There was a problem hiding this comment.
thanks @pohly , I was looking for this reference. Will update accordingly
| gomega.Expect(err).ToNot(gomega.HaveOccurred(), "cannot find worker nodes: %v", err) | ||
| gomega.Expect(workerNodes).ToNot(gomega.BeEmpty(), "no worker nodes detected") | ||
|
|
||
| targetNode = workerNodes[0] // pick random one, this is the simplest random pick |
There was a problem hiding this comment.
nit: Wondering why we need this ?. I am guessing eventually want tester pod to run on specific node ?
There was a problem hiding this comment.
the general reasoning is that the cpu-related tests don't need any specific node if these are all equal, which is the case on kind and in the vast majority of the cases I'm seeing. We can totally add a way to target a specific node, I need to re-upload anyway to fix the previous comment.
In addition: from my experience, the tests need to run serially because they ultimately manage the shared state which is the node state, so there's little advantage in running on more than a single node in parallel.
Should that be needed I reckon it should not be hard to adapt as such.
|
This is Great !. Thank you so much for adding this. /lgtm |
f8f8c47 to
eea7684
Compare
eea7684 to
25ef674
Compare
with all the infrastructure in place, we can now add the first real test. We start with the simplest case: a best effort pod should still get access to all the CPUs. Because we added all the infrastructure and utilities, next tests which will be much easier to add. Signed-off-by: Francesco Romani <fromani@redhat.com>
25ef674 to
3027318
Compare
|
Thanks @pravk03 happy to help! I addressed all the comments and add some initial docs (and guidelines for e2e tests). My plan is that this is just the beginning, I want to add tests to cover the user flows, and then implement feature codes and/or land bugfixes and use the e2e tests to increase the confidence in the correctness. |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ffromani, johnbelamaric, pravk03 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
introduce all the scaffolding and hook into the github actions CI to run the e2e suite.
Currently we run only the simplest test to demonstrate the scaffolding and utilities
notes to reviewers
I'm intentionally jumping through hoops to keep a single
install.yamlgeneric source of truth and to transform it as needed to use it in CI. This is the rationale for themake ci-manifestsmachinery.The key reason is for the
kind load docker-imagemachinery to work as expected, theimagePullPolicyhas to beIfNotPresent. Or, alternatively, we can stop using thelatesttag.If we can, or prefer, to just incorporate these changes in
install.yaml, or if we can tolerate an almost-duplicateinstall-ci.yaml, then we can remove the machinery and simplify the PR.