I'm on an HPC system with a few different architectures.
The login node is skylake (reported by LLVM as skylake-avx512):
julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 32 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
Threads: 1 on 32 virtual cores
Environment:
LD_LIBRARY_PATH = /central/software/julia/1.9.0/lib:/central/software/CUDA/11.8/lib64:/central/software/CUDA/11.8/extras/CUPTI/lib64:/central/software/CUDA/11.8/targets/x86_64-linux/lib
LD_RUN_PATH = /central/software/CUDA/11.8/lib64:/central/software/CUDA/11.8/extras/CUPTI/lib64:/central/software/CUDA/11.8/targets/x86_64-linux/lib
and the compute node is broadwell:
julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 28 × Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, broadwell)
Threads: 1 on 28 virtual cores
Environment:
LD_LIBRARY_PATH = /central/software/CUDA/11.8/lib64:/central/software/CUDA/11.8/extras/CUPTI/lib64:/central/software/CUDA/11.8/targets/x86_64-linux/lib:/central/software/julia/1.9.0/lib:/central/slurm/install/current/lib/
LD_RUN_PATH = /central/software/CUDA/11.8/lib64:/central/software/CUDA/11.8/extras/CUPTI/lib64:/central/software/CUDA/11.8/targets/x86_64-linux/lib
I'm calling `Pkg.precompile()` on the login node, then `using CUDA` on the compute node.
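For reference, the two steps can be sketched as shell helpers. The function names `login_step` and `compute_step` are hypothetical and only serve to label where each command runs; how the compute-node step is launched (batch job, interactive session) is left open.

```shell
# Step 1, run on the login node (skylake-avx512): build the package caches.
login_step() {
    julia -e 'using Pkg; Pkg.precompile()'
}

# Step 2, run on the compute node (broadwell): load the package.
# If the caches from step 1 are rejected, this precompiles again.
compute_step() {
    julia -e 'using CUDA'
}
```

Calling `login_step` in a login shell and `compute_step` inside a job on the compute node reproduces the setup described here.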
1. the default
If I don't set anything, loading CUDA on the compute node triggers precompilation again. With JULIA_DEBUG=all set, I get the following debug message:
┌ Debug: Rejecting cache file /home/spjbyrne/.julia/compiled/v1.9/CUDA/oWw5k_OHRW8.ji for CUDA [052768ef-5323-5732-b1bb-66c8b64840ba] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2706
┌ Debug: Precompiling CUDA [052768ef-5323-5732-b1bb-66c8b64840ba]
└ @ Base loading.jl:2140
(and similar for CUDA.jl's deps)
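A minimal sketch for narrowing this down on each node. The helper names are hypothetical. `Sys.CPU_NAME` is the value that `versioninfo()` prints on its LLVM line. `JULIA_DEBUG=loading` is assumed to limit debug output to the messages from loading.jl, which is less noisy than `all`.

```shell
# Print the CPU name Julia detects on the current node
# (the same name that appears on the LLVM line of versioninfo()).
show_cpu_name() {
    julia -e 'println(Sys.CPU_NAME)'
}

# Load CUDA with debug logging restricted to package loading,
# so cache-rejection messages are easier to spot.
load_with_debug() {
    JULIA_DEBUG=loading julia -e 'using CUDA'
}
```

Running `show_cpu_name` on both nodes shows which target names the caches would need to cover.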
2. setting JULIA_CPU_TARGET=broadwell
Since broadwell is supported by both nodes, this seems to work as intended. I do get the following message:
┌ Debug: Rejecting cache file /central/software/julia/1.9.0/share/julia/compiled/v1.9/Statistics/ERcPL_Stp2R.ji for Statistics [10745b16-79ce-11e8-11f9-7d13ad32a3b2] since the flags are mismatched
│ current session: use_pkgimages = true, debug_level = 1, check_bounds = 0, inline = true, opt_level = 2
│ cache file: use_pkgimages = true, debug_level = 1, check_bounds = 1, inline = true, opt_level = 2
└ @ Base loading.jl:2690
but it doesn't appear to cause any issues (perhaps since Statistics isn't built as a pkgimage?).
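As a sketch of this configuration: the variable is exported with the same value in both places, before Julia starts. Putting it in a shared shell profile or in the job script is an assumption about the cluster setup, not something specific to Julia.

```shell
# Export the same CPU target on the login node (before Pkg.precompile())
# and on the compute node (before `using CUDA`), so both sessions
# generate and expect code for the same architecture.
export JULIA_CPU_TARGET=broadwell
```

The trade-off is that code on the skylake-avx512 login node is then compiled for broadwell and does not use AVX-512.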
3. setting JULIA_CPU_TARGET='skylake;broadwell'
This does not appear to work, and gives the same behavior as 1:
┌ Debug: Rejecting cache file /home/spjbyrne/.julia/compiled/v1.9/CUDA/oWw5k_Qcjfa.ji for CUDA [052768ef-5323-5732-b1bb-66c8b64840ba] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2706
┌ Debug: Precompiling CUDA [052768ef-5323-5732-b1bb-66c8b64840ba]
└ @ Base loading.jl:2140
(and similar for dependencies)
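For completeness, the setting used in this case, written out as a sketch. One detail that may or may not matter: the login node reports its CPU as skylake-avx512, while the target string names skylake, and LLVM treats these as two different CPU names. The commented-out line is the multi-target string that the Julia documentation lists for the official x86_64 binaries; it is included only for comparison of the format and has not been tried here.

```shell
# Quote the value: an unquoted ';' would end the shell command early.
export JULIA_CPU_TARGET='skylake;broadwell'

# For comparison only (assumption: untested on this system), the documented
# multi-target string used for official x86_64 builds:
# export JULIA_CPU_TARGET='generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)'
```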
cc @vchuravy