CNF-18275: CNF-18276: e2e: serial: more cache stall tests #1378
base: main
Conversation
@ffromani: No Jira issue with key CNF-18725 exists in the tracker at https://issues.redhat.com/.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ffromani
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@ffromani: This pull request references CNF-18275 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.
/jira refresh
@ffromani: This pull request references CNF-18275 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.
In `internal/wait` we have a couple of functions with inconsistent signatures:
- ForPodListAllRunning
- ForPodListAllDeleted
The former takes and returns slices of pointers to `v1.Pod`; the latter takes a slice of `v1.Pod`. This was likely an implementation oversight. The problem however runs deeper: what is the correct signature?
Let's take a step back. `internal/podlist` builds upon `sigs.k8s.io`'s controller-runtime, which uses `PodList` to hold `Items`, a slice of `v1.Pod`. There is no reason to differ here, so for us a `podlist` should be akin to a slice of `v1.Pod`. Thus, functions which operate on a slice of pointers to `v1.Pod` should use a different name, like maybe `Pods`.
Going back to the original question, and considering all the above, the most correct API seems to be the one using pointers. `ForPodListAllDeleted` should thus be updated so that the return value of `ForPodListAllRunning` can be passed to it directly. Finally, both functions should be renamed to move from PodList to Pods.
Signed-off-by: Francesco Romani <[email protected]>
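As a rough sketch of the resulting shape (the renamed signatures and the `toPointers` helper below are assumptions based on the commit message, not the actual code in `internal/wait`):

package wait

import (
	corev1 "k8s.io/api/core/v1"
)

// Hypothetical renamed signatures, both operating on []*corev1.Pod so the
// result of ForPodsAllRunning can be fed straight into ForPodsAllDeleted:
//
//	func ForPodsAllRunning(ctx context.Context, pods []*corev1.Pod) ([]*corev1.Pod, error)
//	func ForPodsAllDeleted(ctx context.Context, pods []*corev1.Pod) error
//
// toPointers bridges from controller-runtime's PodList.Items ([]corev1.Pod)
// to the pointer-based slice the helpers above would expect.
func toPointers(items []corev1.Pod) []*corev1.Pod {
	pods := make([]*corev1.Pod, 0, len(items))
	for i := range items {
		pods = append(pods, &items[i]) // address of the slice element, not a loop copy
	}
	return pods
}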
Force-pushed from 9bc3492 to 0825ac0
/hold
let's wait for the fix for https://issues.redhat.com/browse/OCPBUGS-56785 to be available before moving forward
A common ginkgo gotcha is to initialize variables at var definition. We should never do that: always initialize in a BeforeEach block. Signed-off-by: Francesco Romani <[email protected]>
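A minimal sketch of the pattern (hypothetical spec, not taken from this suite): assigning in the var block runs once when the spec tree is built, while BeforeEach rebuilds the value before every spec.

package example_test

import (
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

var _ = Describe("pod fixtures", func() {
	// Gotcha: `var pod = &corev1.Pod{...}` here would be evaluated once at
	// tree-construction time, so every spec would share (and mutate) the same object.
	var pod *corev1.Pod

	BeforeEach(func() {
		// Correct: a fresh object per spec, built when the spec actually runs.
		pod = &corev1.Pod{ObjectMeta: metav1.ObjectMeta{GenerateName: "fixture-"}}
	})

	It("sees a per-spec fixture", func() {
		Expect(pod.ObjectMeta.GenerateName).To(Equal("fixture-"))
	})
})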
The pod event checker should emit a klog message, not a By message, to reduce noise in the runs. Signed-off-by: Francesco Romani <[email protected]>
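For illustration only (hypothetical helper, not the actual checker in this suite), the intent is roughly:

package example

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/klog/v2"
)

// reportEventCheck shows the intended logging style: progress goes to the
// klog stream instead of ginkgo.By, so per-event chatter stays out of the
// spec report.
func reportEventCheck(pod *corev1.Pod) {
	// ginkgo.By("checking events for pod ...") // noisy: printed in every spec report
	klog.InfoS("checking pod events", "namespace", pod.Namespace, "name", pod.Name)
}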
Remove conflicting redundant labels. The labels should be moved to the highest meaningful level. Signed-off-by: Francesco Romani <[email protected]>
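Assuming these are ginkgo spec labels (an assumption; the diff is not shown here), the idea would be to declare the label once on the enclosing container, since nested specs inherit it:

package example_test

import (
	. "github.com/onsi/ginkgo/v2"
)

// Hypothetical layout: the label lives on the Describe container; every It
// underneath inherits it, so repeating (or contradicting) it per spec is
// unnecessary.
var _ = Describe("cache stall", Label("serial"), func() {
	It("handles naked pods", func() {
		// no per-It Label("serial") needed here
	})
})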
Add more stall tests using targeted naked (not using jobs) pods. Signed-off-by: Francesco Romani <[email protected]>
Clarify logs for a better troubleshooting experience. No intended changes in behavior. Signed-off-by: Francesco Romani <[email protected]>
Force-pushed from 0825ac0 to 4a6a874
Entry("with non-restartable pods with guaranteed pod with partially terminated app containers", context.TODO(), func(podNamespace string, idx int) *corev1.Pod { | ||
return &corev1.Pod{ | ||
ObjectMeta: metav1.ObjectMeta{ | ||
Namespace: podNamespace, | ||
GenerateName: "generic-pause-pod-", | ||
Labels: map[string]string{ | ||
"test-pod-index": strconv.FormatInt(int64(idx), 10), | ||
}, | ||
}, | ||
Spec: corev1.PodSpec{ | ||
Containers: []corev1.Container{ | ||
{ | ||
Name: "generic-main-cnt-run-1", | ||
Image: images.GetPauseImage(), | ||
Command: []string{"/bin/sleep"}, | ||
Args: []string{"42d"}, // "forever" in our test timescale | ||
Resources: corev1.ResourceRequirements{ | ||
Limits: corev1.ResourceList{ | ||
corev1.ResourceCPU: resource.MustParse("100m"), | ||
corev1.ResourceMemory: resource.MustParse("256Mi"), | ||
}, | ||
}, | ||
}, | ||
{ | ||
Name: "generic-main-cnt-run-2", | ||
Image: images.GetPauseImage(), | ||
Command: []string{"/bin/sleep"}, | ||
Args: []string{"1s"}, | ||
Resources: corev1.ResourceRequirements{ | ||
Limits: corev1.ResourceList{ | ||
corev1.ResourceCPU: resource.MustParse("100m"), | ||
corev1.ResourceMemory: resource.MustParse("256Mi"), | ||
}, | ||
}, | ||
}, | ||
}, | ||
RestartPolicy: corev1.RestartPolicyNever, | ||
}, | ||
} | ||
}), | ||
Entry("with non-restartable pods with guaranteed pod with partially terminated app containers", context.TODO(), func(podNamespace string, idx int) *corev1.Pod { | ||
return &corev1.Pod{ | ||
ObjectMeta: metav1.ObjectMeta{ | ||
Namespace: podNamespace, | ||
GenerateName: "generic-pause-pod-", | ||
Labels: map[string]string{ | ||
"test-pod-index": strconv.FormatInt(int64(idx), 10), | ||
}, | ||
}, | ||
Spec: corev1.PodSpec{ | ||
Containers: []corev1.Container{ | ||
{ | ||
Name: "generic-main-cnt-run-1", | ||
Image: images.GetPauseImage(), | ||
Command: []string{"/bin/sleep"}, | ||
Args: []string{"42d"}, // "forever" in our test timescale | ||
Resources: corev1.ResourceRequirements{ | ||
Limits: corev1.ResourceList{ | ||
corev1.ResourceCPU: resource.MustParse("100m"), | ||
corev1.ResourceMemory: resource.MustParse("256Mi"), | ||
}, | ||
}, | ||
}, | ||
{ | ||
Name: "generic-main-cnt-run-2", | ||
Image: images.GetPauseImage(), | ||
Command: []string{"/bin/sleep"}, | ||
Args: []string{"1s"}, | ||
Resources: corev1.ResourceRequirements{ | ||
Limits: corev1.ResourceList{ | ||
corev1.ResourceCPU: resource.MustParse("100m"), | ||
corev1.ResourceMemory: resource.MustParse("256Mi"), | ||
}, | ||
}, | ||
}, | ||
}, | ||
RestartPolicy: corev1.RestartPolicyNever, | ||
}, | ||
} | ||
}), |
These two entries are identical
thanks, I need to rebase anyway so I'll fix it with the next upload
Add more test coverage for scheduler stalls, using targeted naked (= not managed by a controller) pods.
The key trigger for a scheduler stall turned out to be guaranteed pods with restartPolicy != Always.
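As a compact restatement of that trigger condition (hypothetical predicate, not code from this PR):

package example

import (
	corev1 "k8s.io/api/core/v1"
)

// isStallTrigger captures the condition described above: a Guaranteed-QoS
// pod whose restart policy is anything other than Always.
func isStallTrigger(pod *corev1.Pod) bool {
	return pod.Status.QOSClass == corev1.PodQOSGuaranteed &&
		pod.Spec.RestartPolicy != corev1.RestartPolicyAlways
}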