Add GDRCopy to UCX with CUDA #15
easybuild/easyconfigs/g/GDRCopy/GDRCopy-2.1-GCCcore-9.3.0-CUDA-11.0.2.eb
bartoldeman left a comment:
In the end, I believe the osdependencies should just be comments describing runtime requirements, and GDRCopy should only build its libraries and binaries, not the kernel module.
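A minimal sketch of what that could look like in an easyconfig (easyconfigs use Python syntax). This is not the file under review: the Makefile targets `lib`, `exes`, `lib_install` and `exes_install`, the `PREFIX`/`CUDA` make variables, the source URL and the installed file layout are all assumptions to verify against the GDRCopy 2.1 sources.

```python
# Hypothetical GDRCopy easyconfig sketch (not the easyconfig from this PR).
# Assumed: GDRCopy 2.1 Makefile targets lib/exes/lib_install/exes_install
# and make variables PREFIX and CUDA; verify against the 2.1 sources.
easyblock = 'ConfigureMake'

name = 'GDRCopy'
version = '2.1'
versionsuffix = '-CUDA-%(cudaver)s'

homepage = 'https://github.com/NVIDIA/gdrcopy'
description = "A low-latency GPU memory copy library based on NVIDIA GPUDirect RDMA technology."

toolchain = {'name': 'GCCcore', 'version': '9.3.0'}

source_urls = ['https://github.com/NVIDIA/gdrcopy/archive/']
sources = ['v%(version)s.tar.gz']

# CUDAcore taken from the system toolchain (the trailing True selects it).
dependencies = [('CUDAcore', '11.0.2', '', True)]

# Runtime requirement, documented here instead of enforced via osdependencies:
# the gdrdrv kernel module must be installed and loaded on the host.

skipsteps = ['configure']  # assumed: there is no configure script to run

# Build only the user-space library and the executables, not the driver.
local_makevars = 'PREFIX=%(installdir)s CUDA=$EBROOTCUDACORE'
buildopts = 'CC="$CC" %s lib exes' % local_makevars
install_cmd = 'make'  # replaces the default 'make install'
installopts = '%s lib_install exes_install' % local_makevars

sanity_check_paths = {
    'files': ['bin/copybw', 'lib/libgdrapi.so'],  # assumed install layout
    'dirs': ['include'],
}

moduleclass = 'lib'
```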
boegel left a comment:
GDRCopy now builds fine on a system that doesn't have the mentioned kernel modules 👍
I ran some benchmarks with OSU-mb (the OSU Micro-Benchmarks), and the improvements on our systems are massive: host-to-device bandwidth increases from 670 MB/s to 3800 MB/s with GDRCopy.
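For scale, the ratio between the two bandwidth figures above works out as follows. This is plain arithmetic on the reported numbers, not a new measurement.

```python
# Worked arithmetic on the OSU-mb host-to-device bandwidth figures quoted above.
baseline_mb_s = 670.0   # without GDRCopy
gdrcopy_mb_s = 3800.0   # with GDRCopy

speedup = gdrcopy_mb_s / baseline_mb_s
gain_pct = (speedup - 1.0) * 100.0

print(f"speedup: {speedup:.2f}x, gain: {gain_pct:.0f}%")  # speedup: 5.67x, gain: 467%
```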
There is an ugly side, though, and that is compatibility with the GDRCopy kernel module. I have checked the compatibility between the kernel module from v2.0 and the library from v2.1 (this EC), and fortunately they are compatible. So the current situation is not terrible, and it seems that breakage is limited to major version changes. I updated the comment about the kernel modules with this requirement.

There is also the option of having GDRCopy installed on the host system, which guarantees compatibility with its kernel module. However, in that case there is a similar issue at the level of the CUDA library: GDRCopy needs CUDA, and the system's GDRCopy library will be built against some specific CUDA version. What do you think will cause fewer problems: having the GDRCopy library in EB or on the host system?
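The observation above (a v2.0 kernel module works with the v2.1 library, and breakage is expected only across major versions) can be written down as a simple check. This is a heuristic drawn from this thread, not a compatibility policy documented by GDRCopy; the function names are hypothetical, and `gdrdrv` and `libgdrapi` are used as the names of the kernel module and the library.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split a 'major.minor[.patch]' string into (major, minor) integers."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def likely_compatible(gdrdrv_version: str, libgdrapi_version: str) -> bool:
    """Heuristic from this thread: treat the gdrdrv kernel module and the
    libgdrapi library as compatible when their major versions match."""
    return parse_version(gdrdrv_version)[0] == parse_version(libgdrapi_version)[0]


if __name__ == "__main__":
    print(likely_compatible("2.0", "2.1"))  # True: the combination checked in this thread
    print(likely_compatible("1.3", "2.1"))  # False: major versions differ
```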
I dug a little more and saw that only the GDRCopy executables need CUDA, and I think the executables (copybw and co.) are not needed for UCX. On the other hand, CUDA already requires new-ish CUDA driver modules anyway, so also requiring new-ish kernel modules for GDRCopy doesn't seem an odd requirement.
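One way to check that claim on a given installation is to look at the dynamic dependencies with `ldd`. A minimal sketch, assuming Python 3.7+ and `ldd` on the PATH; the script name and file paths in the usage line are hypothetical, and libraries loaded at run time with `dlopen` would not show up this way.

```python
import re
import subprocess
import sys
from typing import List

# Matches the CUDA driver (libcuda) and runtime (libcudart) shared libraries.
CUDA_LIB = re.compile(r"lib(cuda|cudart)\.so")


def cuda_libs_in_ldd(ldd_output: str) -> List[str]:
    """Return the CUDA libraries named in the text printed by `ldd`."""
    found = []
    for line in ldd_output.splitlines():
        fields = line.split()
        # The first field is the library name (or a path for the dynamic loader).
        if fields and CUDA_LIB.match(fields[0]):
            found.append(fields[0])
    return found


def cuda_libs_of(path: str) -> List[str]:
    """Run `ldd` on a binary or shared library and report its CUDA dependencies."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return cuda_libs_in_ldd(result.stdout)


if __name__ == "__main__":
    # Hypothetical usage:
    #   python check_cuda_deps.py <installdir>/lib/libgdrapi.so <installdir>/bin/copybw
    for path in sys.argv[1:]:
        libs = cuda_libs_of(path)
        print(f"{path}: {', '.join(libs) if libs else 'no CUDA libraries'}")
```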
Suggested changes for PR easybuilders#11295