build: Harden flaky Aeron tests in CI #32242

Merged
patriknw merged 2 commits into main from wip-dev-shm-patriknw on Nov 28, 2023

@patriknw
Contributor

  • increase /dev/shm and use that (by default)
  • use default term buffer size
  • increase cpu requests, shouldn't matter but corresponds to what we want to use, 2 pods per node

This looks very promising. I tried it in a GKE cluster and verified with df -h that /dev/shm went from 64 MB to 1G.
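
The df -h check can also be scripted, for instance as a pre-flight step before the tests start. The following is a minimal sketch and not part of this PR: the function names check_shm and describe, and the 1 GiB threshold, are assumptions chosen for illustration.

```python
import shutil

GIB = 1024 ** 3  # hypothetical threshold, chosen to match the 1G mentioned above


def check_shm(path: str = "/dev/shm", required_bytes: int = GIB):
    """Return (ok, usage); ok is True if the filesystem at `path`
    has a total size of at least `required_bytes`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.total >= required_bytes, usage


def describe(usage) -> str:
    """Format a usage tuple roughly like one row of df output, in MiB."""
    mib = 1024 ** 2
    return (
        f"size={usage.total // mib}M "
        f"used={usage.used // mib}M "
        f"avail={usage.free // mib}M"
    )
```

Calling check_shm() inside a pod and failing fast when ok is False would surface a too-small /dev/shm before any Aeron test starts, rather than as delayed heartbeats later.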

No more "Scheduled sending of heartbeat was delayed".

This wasn't possible when we last tried, in #30601.

-Dakka.cluster.assert=on \
-Daeron.dir=/opt/volumes/media-driver \
-Daeron.term.buffer.length=33554432 \
clean ${{ matrix.command }}
Contributor Author

This job does not run in Kubernetes. It might have the same problem with a too-small /dev/shm. Let me try...

Contributor Author

Plenty of space, no problem.

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   62G   22G  74% /
tmpfs           7.9G  172K  7.9G   1% /dev/shm
tmpfs           3.2G  1.1M  3.2G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sdb15      105M  6.1M   99M   6% /boot/efi
/dev/sda1        63G  4.1G   56G   7% /mnt
tmpfs           1.6G   12K  1.6G   1% /run/user/1001

gcloud config set compute/zone us-central1-c
./kubernetes/create-cluster-gke.sh "akka-artery-aeron-cluster-${GITHUB_RUN_ID}"
gcloud container clusters get-credentials akka-artery-aeron-cluster-test --zone us-central1-c --project akka-team
# ./kubernetes/create-cluster-gke.sh "akka-artery-aeron-cluster-test"
Contributor

Is this intentional, not calling the script to create the cluster?

Contributor Author

Leftover from my testing, thanks.

@patriknw patriknw force-pushed the wip-dev-shm-patriknw branch from 77334b7 to a50c9e5 Compare November 27, 2023 15:13
* increase /dev/shm and use that (by default)
* use default term buffer size
* increase cpu requests, shouldn't matter but corresponds
  to what we want to use, 2 pods per node
@patriknw patriknw force-pushed the wip-dev-shm-patriknw branch from a50c9e5 to bf1d4f0 Compare November 27, 2023 15:14
Member

@pvlugter pvlugter left a comment

LGTM

* more memory request
* separate Aeron run in another workflow to make
  such test failures more clear
@patriknw
Contributor Author

There was an error: "insufficient usable storage for new log of ". I have increased it. I don't know whether the usage accumulates when running all tests. It's supposed to delete the files on shutdown.
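
A rough way to reason about this error is to estimate how many Aeron log buffers fit into /dev/shm. The sketch below assumes that each log buffer consists of three term partitions plus a small metadata section (taken here as 4096 bytes), and it ignores the media driver's other files. The function names are made up for illustration. The term length 33554432 is the value of the -Daeron.term.buffer.length setting quoted earlier in this review.

```python
TERM_PARTITIONS = 3    # assumption: a log buffer holds three term partitions
METADATA_BYTES = 4096  # assumption: approximate size of the metadata section
MIB = 1024 ** 2


def log_buffer_bytes(term_length: int) -> int:
    """Estimated size in bytes of one log buffer for the given term length."""
    return TERM_PARTITIONS * term_length + METADATA_BYTES


def max_log_buffers(capacity_bytes: int, term_length: int) -> int:
    """How many whole log buffers of this term length fit into the capacity."""
    return capacity_bytes // log_buffer_bytes(term_length)


if __name__ == "__main__":
    term_length = 33554432  # 32 MiB, from the sbt options in this PR
    for capacity in (64 * MIB, 1024 * MIB):
        count = max_log_buffers(capacity, term_length)
        print(f"{capacity // MIB} MiB holds {count} log buffers")
```

Under these assumptions a 64 MiB /dev/shm cannot hold even one log buffer with a 32 MiB term length, and 1 GiB holds about ten, so many concurrent streams or files left over from earlier tests could still exhaust it.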

@patriknw
Contributor Author

I moved the Aeron run into a separate workflow. I hope it shows up so that I can trigger a manual run once this is merged.

@patriknw patriknw merged commit 95d7210 into main Nov 28, 2023
@patriknw patriknw deleted the wip-dev-shm-patriknw branch November 28, 2023 07:31
@patriknw patriknw added this to the 2.9.1 milestone Nov 28, 2023
He-Pin pushed a commit to He-Pin/akka that referenced this pull request Jan 7, 2024