
Stack's 'ubuntu-latest' CI takes 3x longer than 'macos-13' CI #6720

Closed
mpilgrem opened this issue Apr 20, 2025 · 10 comments

@mpilgrem
Member

mpilgrem commented Apr 20, 2025

Currently, GitHub workflow integration-tests.yml has very different performance on one runner compared to the others:

  • 'ubuntu-latest' (in an Alpine Linux Docker container) 120 mins
  • 'macos-13' 42 mins
  • 'windows-latest' 39 mins (skips some tests)

and, on AArch64:

  • 'macos-latest' 23 mins

'ubuntu-latest' v 'macos-13' has some stark contrasts, the worst example being:

  • 4783-doctest-deps 548 seconds v 39 seconds
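The durations above translate into slowdown ratios as follows. This is a small Haskell sketch of the arithmetic only; the names `slowdown`, `workflowSlowdown` and `worstTestSlowdown` are illustrative and not taken from Stack's code base.

```haskell
-- Slowdown of 'ubuntu-latest' relative to 'macos-13', computed from the
-- durations quoted in this issue.

-- | How many times longer the slow run took than the fast one
-- (both durations in the same unit).
slowdown :: Double -> Double -> Double
slowdown slow fast = slow / fast

-- | Whole integration-tests.yml workflow: 120 min versus 42 min.
workflowSlowdown :: Double
workflowSlowdown = slowdown 120 42

-- | Worst single test, 4783-doctest-deps: 548 s versus 39 s.
worstTestSlowdown :: Double
worstTestSlowdown = slowdown 548 39
```

This gives about 2.9x for the whole workflow but about 14x for 4783-doctest-deps, so one test is far worse than the average.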

Could something be going wrong with caching?

@mpilgrem
Member Author

Stack's online documentation for its Docker integration states:

> With Docker enabled, most stack sub-commands will automatically launch themselves in an ephemeral Docker container (the container is deleted as soon as the command completes). The project directory and ~/.stack are volume-mounted into the container, so any build artifacts are "permanent" (not deleted with the container).

and

> Note: ~/.stack is separately volume-mounted, and is left alone during reset [stack docker reset].

As snapshots are in the Stack root, I've always assumed that caching the Stack root is sufficient.

@mpilgrem
Member Author

After adding log dumping on success too:

```haskell
-- Stream the test's log file to stderr; used on success and on failure.
let dumpLog = withSourceFile logfp $ \src ->
      runConduit $ src .| stderrC
case ec of
  ExitSuccess -> do
    logInfo "Success! Dumping log\n\n"
    dumpLog
    logInfo $ "\n\nEnd of log for " <> fromString name
  _ -> do
    logError "Failure, dumping log\n\n"
    dumpLog
    logError $ "\n\nEnd of log for " <> fromString name
```

It seems that 'macos-13' (40 s) and 'ubuntu-latest' (588 s) are reporting exactly the same steps for integration test 4783, albeit in a different order.
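Checking that two logs record the same steps in a different order amounts to comparing them as multisets of lines. A minimal sketch, assuming each step occupies one line and that run-specific details (such as timings) can be stripped by a caller-supplied function; `sameStepsBy` is a hypothetical helper, not part of Stack's test harness.

```haskell
import Data.List (sort)

-- | Hypothetical helper: do two logs record the same steps, ignoring
-- the order in which they appear? Each line is treated as one step and
-- passed through 'normalise' first (for example, to strip timings that
-- would otherwise differ between runners).
sameStepsBy :: (String -> String) -> String -> String -> Bool
sameStepsBy normalise logA logB = steps logA == steps logB
  where
    -- Split into lines, normalise each, then sort so order is ignored.
    steps = sort . map normalise . lines
```

For example, `sameStepsBy id "a\nb" "b\na"` is `True`, while `sameStepsBy id "a\na" "a"` is `False` because repeated steps are counted.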

@mpilgrem
Member Author

mpilgrem commented Apr 21, 2025

It seems to be something to do with linking. There are four places in integration test 4783 where linking occurs: (a) Stack's Setup shim for local acme-dont-copy, (b) ghc-paths-0.1.0.12's setup, (c) doctest, and (d) foo's doctest test suite.

  • 'macos-13' (41 s): (a) 1.1 s (b) 0.8 s (c) 3.8 s (d) 1.5 s
  • 'ubuntu-latest' (in Alpine Linux Docker container) (596 s): (a) 52.4 s (b) 0.4 s (c) 240.9 s (d) 235.5 s

The linking delays at (a), (c) and (d) are accounting for almost all of the additional duration of 'ubuntu-latest'.
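The claim can be checked with the timings listed above: compare the extra time spent linking on Linux with the extra total duration of the test. A short sketch of that arithmetic (the names are illustrative):

```haskell
-- Linking times (seconds) for steps (a) to (d) of integration test 4783.
macLinks, linuxLinks :: [Double]
macLinks   = [1.1, 0.8, 3.8, 1.5]
linuxLinks = [52.4, 0.4, 240.9, 235.5]

-- Total durations (seconds) of the same test on each runner.
macTotal, linuxTotal :: Double
macTotal   = 41
linuxTotal = 596

-- | Fraction of the extra wall-clock time on Linux that is explained by
-- the extra time spent linking.
linkingShare :: Double
linkingShare = (sum linuxLinks - sum macLinks) / (linuxTotal - macTotal)
```

This evaluates to about 0.94: roughly 522 s of the 555 s difference is extra linking time. Step (b) is slightly faster on Linux, so leaving it out changes the result very little.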

@mpilgrem
Member Author

mpilgrem commented Apr 21, 2025

@benz0li, as described above, linking during integration tests seems to be much slower (4 mins v 4 secs) in the Linux/x86_64 Docker container than on macOS/x86_64. Would you happen to know why?

I queried ChatGPT-4o and it offered up this summary:

| Cause | Impact | Fix |
|---|---|---|
| ld.bfd is slow | 🔥🔥🔥 Huge | Use lld or gold |
| Static linking w/ musl | 🔥🔥 Big | Use dynamic if possible |
| Docker file I/O | 🔥 Medium | Mount to tmpfs or build outside |
| CPU limits in Docker | 🔥 Medium | Use more cores |

@benz0li
Copy link
Contributor

benz0li commented Apr 21, 2025

IMHO ld.bfd is the safe choice.

@benz0li
Copy link
Contributor

benz0li commented Apr 21, 2025

None of the final image(s) have binutils-gold installed

https://gitlab.haskell.org/ghc/ghc/-/issues/25093#note_579337

@benz0li
Copy link
Contributor

benz0li commented Apr 21, 2025

If you want to use ld.gold, install binutils-gold (i.e. run `apk add --no-cache binutils-gold`) before building Stack, and link with the GNU gold linker instead of ld.
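One common way to make GHC link with a particular linker is to forward GCC's `-fuse-ld=` option (which accepts `bfd`, `gold` or `lld`) through GHC's `-optl` flag, which passes its argument to the link step. A sketch, assuming GHC uses a GCC-compatible C compiler as its linker driver and that the option is supplied via `stack build --ghc-options`; `Linker`, `fuseLdFlag` and `stackBuildArgs` are hypothetical names.

```haskell
-- | Linkers that GCC can be told to use via its -fuse-ld option.
data Linker = Bfd | Gold | Lld
  deriving (Eq, Show)

-- | GHC option that forwards -fuse-ld=<name> to the C compiler that
-- GHC uses to drive linking (-optl passes an option to the link step).
fuseLdFlag :: Linker -> String
fuseLdFlag linker = "-optl-fuse-ld=" ++ name linker
  where
    name Bfd  = "bfd"
    name Gold = "gold"
    name Lld  = "lld"

-- | Hypothetical: command-line arguments for a Stack build that links
-- local packages with the chosen linker.
stackBuildArgs :: Linker -> [String]
stackBuildArgs linker = ["build", "--ghc-options=" ++ fuseLdFlag linker]
```

For example, `stackBuildArgs Gold` yields `["build","--ghc-options=-optl-fuse-ld=gold"]`. Using the `--ghc-options=` form keeps the flag and its value in one argument, so the leading `-` of the value cannot be mistaken for another option.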

@mpilgrem
Copy link
Member Author

@benz0li, thanks!

For future reference:

| Runner | `which ld` | `ld -v` | Comment |
|---|---|---|---|
| ubuntu-latest (in Alpine Linux Docker container) | /usr/bin/ld | GNU ld (GNU Binutils) 2.43.1 | ld.bfd |
| macos-13 | /usr/bin/ld | See (a) below | Apple's linker |
| windows-latest | See (b) below | LLD 14.0.6 (compatible with GNU linkers) | LLVM's linker |

(a):

```
@(#)PROGRAM:ld  PROJECT:dyld-1022.1
BUILD 05:26:33 Dec  7 2023
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h
will use ld-classic for: armv6 armv7 armv7s arm64_32 i386 armv6m armv7k armv7m armv7em
LTO support using: LLVM version 15.0.0 (static support for 29, runtime is 29)
TAPI support using: Apple TAPI version 15.0.0 (tapi-1500.0.12.8)
Library search paths:
Framework search paths:
```

(b):

```
/c/Users/runneradmin/AppData/Local/Programs/stack/x86_64-windows/ghc-9.8.4/mingw/bin/ld
```
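The `ld -v` outputs in the table are distinctive enough to tell the linkers apart. A rough classifier, offered as a sketch only: the substrings it checks are assumptions based on the outputs quoted above (plus the assumption that GNU gold's banner contains "GNU gold"), and `LinkerKind` and `classifyLd` are hypothetical names.

```haskell
import Data.List (isInfixOf)

-- | Linker families seen in the table above, plus GNU gold.
data LinkerKind = GnuBfd | GnuGold | LlvmLld | AppleLd | UnknownLd
  deriving (Eq, Show)

-- | Rough classification of the banner printed by `ld -v`.
-- LLD is checked first because its banner mentions "GNU linkers".
classifyLd :: String -> LinkerKind
classifyLd out
  | "LLD" `isInfixOf` out        = LlvmLld
  | "GNU gold" `isInfixOf` out   = GnuGold
  | "GNU ld" `isInfixOf` out     = GnuBfd
  | "PROGRAM:ld" `isInfixOf` out = AppleLd
  | otherwise                    = UnknownLd
```

Applied to the banners above, this gives `GnuBfd` for ubuntu-latest, `AppleLd` for macos-13 and `LlvmLld` for windows-latest.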

@mpilgrem
Copy link
Member Author

This was fixed by:

  • installing lld when using ubuntu-latest;
  • causing the stack-integration-test executable to use (and expect) the lld linker on Linux; and
  • changing release.hs check to execute stack-integration-test with the target Stack but outside of the Alpine Linux Docker container.
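A Linux-only linker choice can be expressed as a check on the platform the program was built for. This is a sketch under assumptions: it supposes the choice is made by passing `-optl-fuse-ld=lld` to GHC, and `linkerOptionsFor` and `currentLinkerOptions` are hypothetical names; the actual change to stack-integration-test may be implemented differently.

```haskell
import System.Info (os)

-- | Hypothetical: extra GHC options for linking on a given platform.
-- On Linux, ask for lld; elsewhere, add nothing (macOS keeps Apple's
-- linker, and the Windows toolchain's ld is already LLD).
linkerOptionsFor :: String -> [String]
linkerOptionsFor "linux" = ["-optl-fuse-ld=lld"]
linkerOptionsFor _       = []

-- | Options for the platform this program was compiled for
-- (System.Info.os is "linux", "darwin" or "mingw32" on the three
-- runner types used here).
currentLinkerOptions :: [String]
currentLinkerOptions = linkerOptionsFor os
```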

The results were dramatic:

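| Runner | integration-tests.yml (before) | integration-tests.yml (after) | Integration tests (before) | Integration tests (after) |
|---|---|---|---|---|
| ubuntu-latest | 120 min | 26 min | 6,269 s | 683 s |
| macos-13 | 42 min | 43 min | 1,733 s | 1,722 s |
| macos-latest | 23 min | 23 min | 964 s | 959 s |
| windows-latest | 39 min | 39 min | 1,461 s | 1,502 s |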
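The size of the improvement can be read off the table as speed-up factors (before divided by after). A small sketch of that arithmetic; `results`, `speedup` and `speedups` are illustrative names.

```haskell
-- | (runner, workflow min before, after, integration-test s before, after)
results :: [(String, Double, Double, Double, Double)]
results =
  [ ("ubuntu-latest",  120, 26, 6269,  683)
  , ("macos-13",        42, 43, 1733, 1722)
  , ("macos-latest",    23, 23,  964,  959)
  , ("windows-latest",  39, 39, 1461, 1502)
  ]

-- | How many times faster 'after' is than 'before'.
speedup :: Double -> Double -> Double
speedup before after = before / after

-- | Per-runner speed-ups: (runner, workflow, integration tests).
speedups :: [(String, Double, Double)]
speedups =
  [ (runner, speedup wfBefore wfAfter, speedup itBefore itAfter)
  | (runner, wfBefore, wfAfter, itBefore, itAfter) <- results
  ]
```

For ubuntu-latest this gives about 4.6x for the workflow and about 9.2x for the integration tests, while the other runners stay close to 1x. After the change, the Linux integration tests are the fastest of the four runners.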

mpilgrem added a commit that referenced this issue Apr 25, 2025
Also makes other minor changes for consistency of terminology.