Multiversioning is order dependent? #50148

Closed
@simonbyrne

I'm on an HPC system with a few different architectures:

The login node is Skylake:

julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
  Threads: 1 on 32 virtual cores
Environment:
  LD_LIBRARY_PATH = /central/software/julia/1.9.0/lib:/central/software/CUDA/11.8/lib64:/central/software/CUDA/11.8/extras/CUPTI/lib64:/central/software/CUDA/11.8/targets/x86_64-linux/lib
  LD_RUN_PATH = /central/software/CUDA/11.8/lib64:/central/software/CUDA/11.8/extras/CUPTI/lib64:/central/software/CUDA/11.8/targets/x86_64-linux/lib

and the compute node is Broadwell:

julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 28 × Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, broadwell)
  Threads: 1 on 28 virtual cores
Environment:
  LD_LIBRARY_PATH = /central/software/CUDA/11.8/lib64:/central/software/CUDA/11.8/extras/CUPTI/lib64:/central/software/CUDA/11.8/targets/x86_64-linux/lib:/central/software/julia/1.9.0/lib:/central/slurm/install/current/lib/
  LD_RUN_PATH = /central/software/CUDA/11.8/lib64:/central/software/CUDA/11.8/extras/CUPTI/lib64:/central/software/CUDA/11.8/targets/x86_64-linux/lib

I run Pkg.precompile() on the login node, then load CUDA (using CUDA) on the compute node.

1. The default (JULIA_CPU_TARGET unset)

If I don't set anything, loading CUDA on the compute node triggers precompilation again. With JULIA_DEBUG=all set, I get the following debug messages:

┌ Debug: Rejecting cache file /home/spjbyrne/.julia/compiled/v1.9/CUDA/oWw5k_OHRW8.ji for CUDA [052768ef-5323-5732-b1bb-66c8b64840ba] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2706
┌ Debug: Precompiling CUDA [052768ef-5323-5732-b1bb-66c8b64840ba]
└ @ Base loading.jl:2140

(and similarly for CUDA.jl's dependencies)

2. Setting JULIA_CPU_TARGET=broadwell

Since Broadwell is supported by both nodes, this seems to work as intended. I do get the following debug message:

┌ Debug: Rejecting cache file /central/software/julia/1.9.0/share/julia/compiled/v1.9/Statistics/ERcPL_Stp2R.ji for Statistics [10745b16-79ce-11e8-11f9-7d13ad32a3b2] since the flags are mismatched
│   current session: use_pkgimages = true, debug_level = 1, check_bounds = 0, inline = true, opt_level = 2
│   cache file:      use_pkgimages = true, debug_level = 1, check_bounds = 1, inline = true, opt_level = 2
└ @ Base loading.jl:2690

but it doesn't appear to cause any issues (perhaps because Statistics isn't built as a pkgimage?). The only flag that differs is check_bounds (0 in the current session vs. 1 in the cache file).
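To make the rejection reason concrete, here is a small Python sketch that compares the two flag sets printed in the "flags are mismatched" message and reports which ones differ. The helper mismatched_flags is hypothetical and only mirrors the log output; it is not Julia's actual cache-validation code.

```python
# Flag values copied verbatim from the "flags are mismatched" debug message.
current_session = {
    "use_pkgimages": True,
    "debug_level": 1,
    "check_bounds": 0,
    "inline": True,
    "opt_level": 2,
}
cache_file = {
    "use_pkgimages": True,
    "debug_level": 1,
    "check_bounds": 1,
    "inline": True,
    "opt_level": 2,
}


def mismatched_flags(session: dict, cache: dict) -> dict:
    """Return {flag: (session_value, cache_value)} for every differing flag.

    Hypothetical helper for illustration only; Julia performs its own
    check inside Base's loading code.
    """
    return {
        key: (session[key], cache.get(key))
        for key in session
        if session[key] != cache.get(key)
    }


if __name__ == "__main__":
    print(mismatched_flags(current_session, cache_file))
```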

3. Setting JULIA_CPU_TARGET='skylake;broadwell'

This does not appear to work; the behavior is the same as in case 1:

┌ Debug: Rejecting cache file /home/spjbyrne/.julia/compiled/v1.9/CUDA/oWw5k_Qcjfa.ji for CUDA [052768ef-5323-5732-b1bb-66c8b64840ba] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2706
┌ Debug: Precompiling CUDA [052768ef-5323-5732-b1bb-66c8b64840ba]
└ @ Base loading.jl:2140

(and similarly for the dependencies)
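The three cases can be compared with a toy Python model. It assumes, as we understand Julia's multiversioning, that the first ';'-separated entry of JULIA_CPU_TARGET is the base target, whose code is used for every function that isn't cloned, so a host must support at least that target for the pkgimage to load. It also assumes the unset case behaves like "native", i.e. skylake-avx512 on the login node (the JIT target reported by versioninfo). The feature sets are illustrative subsets rather than LLVM's real CPU definitions, and can_load is a hypothetical helper, not Julia's loader logic.

```python
# Toy model of why the order of entries in JULIA_CPU_TARGET might matter.
# ASSUMPTION: the first target is the base target; non-cloned code is
# compiled only for it, so the host must support all of its features.

# Illustrative feature subsets (NOT LLVM's complete definitions).
BROADWELL = frozenset({"sse4.2", "avx", "avx2", "fma", "bmi2", "adx"})
SKYLAKE = BROADWELL | {"clflushopt", "xsavec", "xsaves"}
SKYLAKE_AVX512 = SKYLAKE | {"avx512f", "avx512bw", "avx512dq", "avx512vl"}

CPU_FEATURES = {
    "broadwell": BROADWELL,
    "skylake": SKYLAKE,
    "skylake-avx512": SKYLAKE_AVX512,
}


def can_load(cpu_target: str, host: str) -> bool:
    """Hypothetical check: can an image built for cpu_target run on host?

    Only the first ';'-separated entry (the assumed base target) is
    consulted; any ',feature' modifiers on it are dropped.
    """
    base = cpu_target.split(";")[0].split(",")[0]
    return CPU_FEATURES[base] <= CPU_FEATURES[host]


if __name__ == "__main__":
    # "skylake-avx512" stands in for case 1 (unset, assumed to mean native).
    for target in ("skylake-avx512", "broadwell",
                   "skylake;broadwell", "broadwell;skylake"):
        print(f"{target!r:24} loads on broadwell: {can_load(target, 'broadwell')}")
```

Under this model, cases 1 and 3 fail on the Broadwell node and case 2 succeeds, matching what I observe; it would also predict that listing the most generic target first (e.g. 'broadwell;skylake') works on both nodes. Whether Julia actually behaves this way is the question this issue raises.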

cc @vchuravy
