Performance issue when running OpenACC code #3681
laytonjbgmail started this conversation in General
Replies: 1 comment
-
I need to amend the original post. If I start the Singularity container and then run the code "by hand" (bypassing Slurm), I get the proper performance. So it appears to be an interaction between Slurm and SingularityCE? Thanks!
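One way to narrow this down might be to run the same checks both interactively and through Slurm and compare what the GPU topology, CPU count, and CPU affinity look like in each case. This is only a sketch under assumptions: `nvhpc.sif` is a hypothetical image path standing in for whatever image was built from the nvcr.io tag in the original post, and any GPU GRES flags the cluster may require are omitted to mirror the sbatch line below.

```bash
# Interactive run ("by hand", bypassing Slurm), which reportedly gives full performance.
# nvhpc.sif is a hypothetical image path built from the nvcr.io/nvidia/nvhpc image.
singularity exec --nv nvhpc.sif nvidia-smi
singularity exec --nv nvhpc.sif nvidia-smi topo -m

# The same checks launched through Slurm, to see whether GPU visibility,
# CPU count, or CPU affinity differ (taskset comes from util-linux inside the image).
srun --nodes=1 --ntasks-per-node=2 \
    singularity exec --nv nvhpc.sif nvidia-smi
srun --nodes=1 --ntasks-per-node=2 \
    singularity exec --nv nvhpc.sif bash -c 'nproc; taskset -cp $$'
```

If the Slurm-launched output shows fewer visible CPUs or a narrower affinity mask than the interactive run, that would point at Slurm's cgroup/binding settings rather than at SingularityCE itself.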
-
I'm having issues when I run an OpenACC code inside a SingularityCE container compared to bare metal or a Docker container.
When I run on bare metal or in a Docker container, going from the CPU-only version (no OpenACC) to the OpenACC version gives a speedup of a little over 100x. However, when I run the same two versions (one pure CPU with no OpenACC, the other OpenACC) inside a SingularityCE container, the speedup is only about 7x.
Here are some details:
Container image: nvcr.io/nvidia/nvhpc:25.1-devel-cuda_multi-ubuntu24.04
Docker: docker run --gpus device=0 --rm mpirun -np 2 -H localhost:2 --allow-run-as-root --map-by slot -mca coll_hcoll_enable 0 ./himeno-acc.exe > file.output
SingularityCE: singularity exec --nv --env NVIDIA_VISIBLE_DEVICES=all mpirun -np 2 -H localhost:2 --allow-run-as-root --map-by slot -mca coll_hcoll_enable 0 ./himeno-acc.exe > file.output
Slurm: sbatch -W node_name --ntasks-per-node=2 --nodes=1 ... (docker run or singularity exec; a sketch of what the batch script might look like is below)
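For reference, here is a minimal sketch of what the batch script behind that sbatch line might look like when it wraps the singularity exec command. The actual script was not posted, so the script contents and the image path `nvhpc.sif` are assumptions that simply mirror the commands above.

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2

# Hypothetical sketch only: the image path (nvhpc.sif) and binary location
# are assumptions mirroring the docker/singularity commands above.
singularity exec --nv --env NVIDIA_VISIBLE_DEVICES=all nvhpc.sif \
    mpirun -np 2 -H localhost:2 --allow-run-as-root --map-by slot \
    -mca coll_hcoll_enable 0 ./himeno-acc.exe > file.output
```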
I've done some Googling and asked a few people, but so far no suggestion has changed the performance.
If anyone has any ideas or pointers, I would really appreciate it. Thanks!