Conversation


@lexming lexming commented Sep 29, 2020

Suggested changes for PR easybuilders#11295


@bartoldeman bartoldeman left a comment


In the end I believe that the osdependencies should just be comments about the runtime requirements, and GDRCopy should only build the libraries and binaries, not the kernel module.
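
As a rough sketch of what such a library-and-binaries-only build could look like when driving GDRCopy's Makefile directly (only make config lib is confirmed further down in this thread; the exes target, the install targets and the prefix variable are assumptions based on GDRCopy 2.x):

$ make config lib exes                                  # user-space library and test binaries only, no gdrdrv.ko
$ make prefix=<installdir> lib_install exes_install     # assumed install targets, deliberately skipping the driver install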


@boegel boegel left a comment


GDRCopy now builds fine on a system that doesn't have the mentioned kernel modules 👍

@lexming lexming (Author) commented Sep 30, 2020

I ran some benchmarks with OSU-mb and the improvements are quite massive on our systems. Host to device bandwidth increases from 670 MB/s to 3800 MB/s with GDRCopy:

• with UCX_TLS=rc,cuda_copy,cuda_ipc
$ mpirun -np 2 --npernode 1 --mca pml ucx -x UCX_TLS=rc,cuda_copy,cuda_ipc osu_bw H D   
# OSU MPI-CUDA Bandwidth Test v5.6.3
# Send Buffer on HOST (H) and Receive Buffer on DEVICE (D)
# Size      Bandwidth (MB/s)
1                       0.12
2                       0.24
4                       0.47
8                       0.95
16                      1.88
32                      3.77
64                      7.64
128                    15.20
256                    30.19
512                    59.87
1024                  117.64
2048                  218.39
4096                  396.56
8192                  666.92
16384                 666.74
32768                 670.61
65536                 675.14
131072                675.59
262144                678.79
524288                677.74
1048576               678.54
2097152               679.57
4194304              3914.08
• with UCX_TLS=rc,cuda_copy,gdr_copy,cuda_ipc
$ mpirun -np 2 --npernode 1 --mca pml ucx -x UCX_TLS=rc,cuda_copy,gdr_copy,cuda_ipc osu_bw H D
# OSU MPI-CUDA Bandwidth Test v5.6.3
# Send Buffer on HOST (H) and Receive Buffer on DEVICE (D)
# Size      Bandwidth (MB/s)
1                       2.16
2                       4.63
4                       9.24
8                      18.23
16                     37.00
32                     74.02
64                    144.21
128                   281.29
256                   553.87
512                  1041.00
1024                 1691.47
2048                 2614.88
4096                 3485.37
8192                 3602.36
16384                3776.20
32768                3832.10
65536                3859.69
131072               3877.29
262144               3885.45
524288               3889.09
1048576              3891.69
2097152              3892.50
4194304              3914.18

There is an ugly side though, and that's compatibility with the GDRCopy kernel module (gdrdrv.ko). I tested a GPU node with v1.3 of the module installed and it was completely ignored by UCX, which did not report gdr_copy as an available transport at all. So GDRCopy v1 is not compatible with v2.

I have checked the compatibility between the kernel module from v2.0 and the library from v2.1 (this EC) and fortunately they are compatible. So the current situation is not terrible; it seems that breakage is limited to major version changes. I updated the comment about the kernel modules with this requirement.
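
For reference, the gdrdrv version present on a node can be checked with standard module tools, which makes it easy to spot this kind of major-version mismatch before running anything:

$ modinfo gdrdrv | grep -i ^version    # version of the installed kernel module
$ lsmod | grep gdrdrv                  # check whether it is currently loaded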

There is also the option of having GDRCopy installed in the system, which guarantees compatibility with its kernel module. However, in that case there is a similar issue at the level of the CUDA library, as GDRCopy needs CUDA and a system-wide library would be built against some specific version of it. What do you think will cause fewer problems: having the GDRCopy library in EB, or in the host system?

@bartoldeman

I dug a little more and saw that only the executables of GDRCopy need CUDA; make config lib does not need CUDA. You can see that too with ldd libgdrapi.so: there is no CUDA lib in there, it's standalone.
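
A quick sketch of that check (the location of the built libgdrapi.so may differ between GDRCopy versions):

$ make config lib                                        # no CUDA needed for this
$ ldd libgdrapi.so | grep -i cuda || echo "no CUDA libraries linked"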

And the executables (copybw and co) are not needed for UCX I think.

On the other hand, CUDA already requires new-ish CUDA driver modules anyway, so also requiring new-ish kernel modules for GDRCopy doesn't seem an odd requirement.

@bartoldeman bartoldeman merged commit 243ee89 into ComputeCanada:cuda-11.0.2-suffix Sep 30, 2020
@lexming lexming deleted the cuda-gdrcopy branch September 30, 2020 20:09