The cabal-benchmarks package declares its performance benchmarks as test-suite stanzas, so cabal test all from the project root picks them up and runs them as regular tests. Because they are timing-sensitive, they
fail intermittently depending on machine load, giving contributors spurious test failures unrelated to their changes.
To Reproduce
$ cabal test all
Observe that cabal-benchmarks runs and may report failures such as "2 out of 22 tests failed", caused by timing variance rather than by any code change.
Expected behavior
cabal test all should only run actual tests, not performance benchmarks. Benchmarks should either use benchmark stanzas so they are invoked via cabal bench, or be excluded from the default test target.
System information
- Linux x86_64
- cabal 3.17.0.0 (HEAD)
- ghc 9.10.3
Additional context
Possible fixes:
- Use benchmark stanzas instead of test-suite stanzas in cabal-benchmarks
- Exclude cabal-benchmarks from the default cabal test all target
- Gate the benchmarks behind a cabal flag that's off by default
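As a rough sketch of the first and third options, the fragment below shows what the stanzas might look like. The component names, the flag name run-benchmarks, the source directory, and the dependency list are hypothetical placeholders, not taken from the actual cabal-benchmarks.cabal. The flag-gated variant assumes that a component marked buildable: False is skipped by cabal test all, which should be confirmed before relying on it.

```cabal
-- Option 1: declare the benchmarks as a benchmark stanza, so they are
-- run by `cabal bench` and no longer by `cabal test`.
-- (stanza name, directory and dependencies are placeholders)
benchmark timing-benchmarks
  type:             exitcode-stdio-1.0
  main-is:          Main.hs
  hs-source-dirs:   bench
  build-depends:    base
  default-language: Haskell2010

-- Option 3: keep a test-suite stanza, but gate it behind a manual flag
-- that is off by default (flag name is a placeholder).
flag run-benchmarks
  description: Build and run the timing-sensitive benchmarks
  default:     False
  manual:      True

test-suite timing-benchmarks-gated
  type:             exitcode-stdio-1.0
  main-is:          Main.hs
  hs-source-dirs:   bench
  build-depends:    base
  default-language: Haskell2010
  -- Assumption: a non-buildable component is left out of `cabal test all`.
  if !flag(run-benchmarks)
    buildable: False
```

With the flag variant, contributors who want the timing checks could opt in with something like cabal test all -f run-benchmarks, or by setting the flag for the package in cabal.project.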