Closed
Labels
bug 🐞 Something isn't working
Description
Environment
Device and OS: Rocky 8 EC2
App version: 0.29.2
Kubernetes distro being used: RKE2 v1.26.9+rke2r1
Other: Bigbang v2.11.1
Steps to reproduce
zarf package deploy zarf-package-mvp-cluster-amd64-v5.0.0-alpha.7.tar.zst --confirm -l=debug
- About 80% of the time, the above command gets stuck at crane.Push(). A retry usually works.
Expected result
The zarf package deploy... command should not hang; it should continue to completion.
Actual Result
The zarf package deploy... command hangs.
Visual Proof (screenshots, videos, text, etc)
DEBUG 2023-10-23T18:37:19Z - Pushing ...1.dso.mil/ironbank/neuvector/neuvector/manager:5.1.3
DEBUG 2023-10-23T18:37:19Z - crane.Push() /tmp/zarf-3272389118/images:registry1.dso.mil/ironbank/neuvector/neuvector/manager:5.1.3 -> 127.0.0.1:39357/ironbank/neuvector/neuvector/manager:5.1.3-zarf-487612511)
section_end:1698087620:step_script
ERROR: Job failed: execution took longer than 35m0s seconds
Severity/Priority
There is a workaround: keep retrying until the process succeeds.
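The retry workaround can be automated so that a hung deploy is killed and restarted instead of running into the CI job limit. Below is a minimal Python sketch, not part of zarf: the helper name `run_with_retries`, the attempt count, and the timeout values are all assumptions chosen for illustration, and it assumes that re-running the deploy after killing a hung attempt is safe, as the manual retries suggest.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, timeout_s=600.0, pause_s=0.0):
    """Run cmd, retrying when it hangs past timeout_s or exits non-zero.

    Hypothetical helper: returns the 1-based number of the attempt that
    succeeded and raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child process when the timeout expires.
            subprocess.run(cmd, check=True, timeout=timeout_s)
            return attempt
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: still running after {timeout_s}s, killed",
                  file=sys.stderr)
        except subprocess.CalledProcessError as err:
            print(f"attempt {attempt}: exited with code {err.returncode}",
                  file=sys.stderr)
        if attempt < attempts:
            time.sleep(pause_s)  # optional pause before the next attempt
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd}")
```

To wrap the deploy from this report, pass the command as a list, for example `run_with_retries(["zarf", "package", "deploy", "zarf-package-mvp-cluster-amd64-v5.0.0-alpha.7.tar.zst", "--confirm", "-l=debug"], attempts=5, timeout_s=900)`. The per-attempt timeout should be longer than a normal successful deploy and, multiplied by the number of attempts, shorter than the CI job limit (35 minutes in the log above).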
Additional Context
This looks exactly like #1568, which was closed.
We have a multi-node cluster on AWS EC2, and our package size is about 2.9G. Here are a few things that we noticed after extensive testing:
- This issue is not seen on a single-node RKE2 cluster on EC2; it seems to occur only on multi-node clusters.
- Our zarf docker registry is backed by S3. The issue is always seen in this case, but only on a multi-node cluster.
- If we back the registry with the default PVC (instead of S3), the issue is not seen at all. Since data transfer to S3 is slower than to the EBS-backed PVC, maybe this extra time causes the problem to appear?
- Disabling or enabling the zarf docker registry HPA doesn't seem to matter either way.