
Fix/ci periodic #530

Merged
droot merged 2 commits into GoogleCloudPlatform:main from zvdy:fix/ci-periodic
Sep 11, 2025

Conversation

@zvdy zvdy commented Sep 11, 2025

Contributor

Fix periodic CI reliability and prevent resource waste

Problem

Resolves #488 and adds guardrails against common CI failures, for example:

  1. Checksum validation errors from outdated kind-action version
  2. kubectl download timeouts causing cluster setup failures
  3. Evaluation timeouts (30+ minutes) causing job failures and overlapping runs

Changes

  • Upgraded helm/kind-action to the latest stable release, v1.12.0
  • Increased cluster wait timeout from 60s to up to 300s for better reliability
  • Optimized Timeouts
    • Job timeout: 12 minutes (fits within 15-minute cron schedule)
    • Cluster setup: 3 minutes timeout
  • 2 retry attempts for evaluation step
    • 4-minute timeout per attempt with process cleanup
  • 10-second wait between retries
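
The retry behaviour listed above (2 attempts, a 4-minute timeout per attempt, a 10-second pause between attempts) could be sketched in a workflow step roughly as follows. This is a minimal illustration, not the code from this PR: the function name `retry_with_timeout` is hypothetical, and using GNU coreutils `timeout` with `--kill-after` as the "process cleanup" mechanism is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from this PR): run a command with a
# per-attempt timeout and a fixed pause between attempts.
# Usage: retry_with_timeout <attempts> <timeout_seconds> <wait_seconds> <command...>
retry_with_timeout() {
  local attempts="$1" timeout_s="$2" wait_s="$3"
  shift 3
  local i
  for ((i = 1; i <= attempts; i++)); do
    # GNU `timeout` sends TERM after timeout_s seconds and, if the process
    # is still alive 10 seconds later, sends KILL (assumed cleanup strategy).
    if timeout --kill-after=10 "$timeout_s" "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed" >&2
    # Pause only when another attempt remains.
    if ((i < attempts)); then
      sleep "$wait_s"
    fi
  done
  return 1
}
```

With the values from the list, a step might call `retry_with_timeout 2 240 10 <evaluation command>`. The worst case is then about 2 × (240 + 10) + 10 = 510 seconds, which leaves room for the 3-minute cluster setup inside the 12-minute job timeout.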

TODO

I can't test this on a GKE cluster because I'm not able to replicate the setup, so I would kindly ask you, @droot, whenever you have time, to check out this branch and run the action a few times via workflow dispatch to see whether it solves the issues. The checksum errors came from .github/actions/kind-cluster-setup/action.yaml referencing a commit of kind-action rather than a release, so we should be good now :-)

droot commented Sep 11, 2025

Member

Thanks @zvdy. Merging this PR because the changes improve our current setup. I also merged some other changes today in #529 that should reduce the resource requirements, so I am hoping that together these will reduce flakes.

@droot droot merged commit 3c1538f into GoogleCloudPlatform:main Sep 11, 2025
7 checks passed

2 participants