fix: scope test temp dirs to RUNNER_TEMP instead of globbing shared /tmp #5239
Merged
The build host runs ~28 GHA runners as a single user against one tmpfs /tmp, so any job running `rm -rf /tmp/e2e-* /tmp/test-cluster*` on `if: always()` wipes sibling runners' live VolatileDB files. Under UTxO-HD (node >= 10.7.0) the consensus layer re-opens these files by path and crashes with `FsResourceDoesNotExist` instead of tolerating the unlink via open fds, which manifested as the Conway Integration Tests failure on master at 253d290.
Point TMPDIR at $RUNNER_TEMP (per-job, per-runner, auto-cleaned by the runner service) so test clusters live in a private directory that sibling runners cannot touch. Drop the now-redundant (and dangerous) `rm -rf /tmp/...` cleanup steps from linux-e2e and release. Replace the shared `/tmp/gha-bench` fixed path in benchmarks with RUNNER_TEMP for the same reason.
Problem
The CI build host (`zur1-s-d-029`) runs ~28 GHA self-hosted runners as a single user against one shared tmpfs `/tmp`. Any job running `rm -rf /tmp/e2e-* /tmp/test-cluster*` on `if: always()` will wipe sibling runners' live VolatileDB files.
Under UTxO-HD (cardano-node >= 10.7.0) the consensus layer re-opens VolatileDB files by path through `fs-api`, so an unlinked `blocks-*.dat` now crashes the node with `ApiMisuse (ClosedDBError (UnexpectedFailure (FileSystemError FsResourceDoesNotExist …)))` instead of tolerating the unlink via open fds.
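The difference between holding an open descriptor and re-opening by path can be illustrated with a minimal POSIX sketch. This is an analogy only, not the consensus layer's code: the function name `unlink_while_open` and the demo directory are hypothetical, and the `os.unlink` call stands in for a sibling job's `rm -rf`.

```python
import os
import tempfile


def unlink_while_open():
    """Return (read_via_fd_ok, reopen_by_path_ok) after unlinking an open file."""
    # Hypothetical scratch directory and file name, for illustration only.
    workdir = tempfile.mkdtemp(prefix="volatile-demo-")
    path = os.path.join(workdir, "blocks-0.dat")

    with open(path, "wb") as f:
        f.write(b"block data")

    # Hold an open descriptor, as a long-running process might.
    held = open(path, "rb")

    # Stand-in for a sibling job's cleanup removing the directory entry.
    os.unlink(path)

    # On POSIX the data stays readable through the descriptor that is
    # already open, because the inode lives until the last fd is closed.
    read_via_fd_ok = held.read() == b"block data"
    held.close()

    # Re-opening by path fails: the name no longer exists.
    try:
        open(path, "rb").close()
        reopen_by_path_ok = True
    except FileNotFoundError:
        reopen_by_path_ok = False

    os.rmdir(workdir)
    return read_via_fd_ok, reopen_by_path_ok


if __name__ == "__main__":
    print(unlink_while_open())
```

A process that only ever uses its already-open descriptors survives the unlink, while one that looks the file up by name again does not, which is the behaviour change described above.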
This reproduced on master (`253d290bfd`): Conway Integration Tests crashed at 15:46:43 UTC on 2026-04-20 with two pool nodes failing simultaneously on `/tmp/test-cluster436150/pool-*/db/volatile/blocks-0.dat`. See also upstream ouroboros-consensus#1991.
Fix
Set `TMPDIR: ${{ runner.temp }}` on every job that launches a local test cluster or E2E run. `$RUNNER_TEMP` is per-job, per-runner, and auto-cleaned by the runner service between jobs, so clusters live in a private directory that sibling runners cannot touch.
Drop the `rm -rf /tmp/...` cleanup steps from `.github/workflows/linux-e2e.yml` and `.github/workflows/release.yml`; `$RUNNER_TEMP` makes them redundant.
Replace `TMPDIR: /tmp/gha-bench` in `linux-benchmarks.yml` and `restoration-benchmarks.yml` with `$RUNNER_TEMP` for the same reason.
No new cleanup steps are introduced.
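As a rough illustration of why redirecting `TMPDIR` isolates runners, the sketch below uses Python's standard `tempfile` module, which consults `TMPDIR` when choosing where to create temporary directories. It assumes the test-cluster tooling picks its directory through a `TMPDIR`-aware mechanism, which this PR does not spell out. The helper name `cluster_dir_under` and the two "runner" directories are hypothetical.

```python
import os
import tempfile


def cluster_dir_under(private_tmp):
    """Create a temp dir with TMPDIR pointed at private_tmp and return its path."""
    old = os.environ.get("TMPDIR")
    os.environ["TMPDIR"] = private_tmp
    # tempfile caches its default directory; clear it so TMPDIR is re-read.
    tempfile.tempdir = None
    try:
        return tempfile.mkdtemp(prefix="test-cluster")
    finally:
        # Restore the previous environment so the demo has no side effects.
        if old is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = old
        tempfile.tempdir = None


if __name__ == "__main__":
    # Two hypothetical per-runner temp roots standing in for $RUNNER_TEMP.
    runner_a = tempfile.mkdtemp(prefix="runner-a-")
    runner_b = tempfile.mkdtemp(prefix="runner-b-")
    print(cluster_dir_under(runner_a))
    print(cluster_dir_under(runner_b))
```

With each runner's `TMPDIR` pointing at its own root, a glob such as `/tmp/test-cluster*` no longer matches another runner's cluster directory.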