## Background information
### What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
- v4.1.1
- master nightly tarball from Nov. 4th
### Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
From release tarballs; only the `--prefix=...` configure parameter was used.
### Please describe the system on which you are running
- Operating system/version: based on RHEL 7
- Computer hardware: Intel
- Network type: Mellanox (ConnectX-5) with UCX 1.9.0
## Details of the problem
I normally disable the tcp btl in order to have jobs fail with an obvious error if there is a problem with the high-speed interconnect's communication library; that way I know when I am not getting optimal performance (see the sketch after the error output below for a persistent way to set this up). I have found, however, that disabling the tcp btl causes problems with the rdma osc when trying to run IMB-RMA (the RMA suite of the Intel MPI Benchmarks). When I do, a message like this occurs:
```
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.

Process 1 ([[9281,1],4]) is on host: ko002
Process 2 ([[9281,1],0]) is on host: ko001
BTLs attempted: self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
```
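As an aside, for anyone reproducing the setup: a minimal sketch of how the tcp btl can be excluded persistently rather than per-command, assuming the standard per-user MCA parameter file location (the system-wide equivalent is `$prefix/etc/openmpi-mca-params.conf`):

```shell
# Persistently exclude the tcp btl so jobs fail loudly instead of
# silently falling back from the high-speed interconnect.
mkdir -p "$HOME/.openmpi"
cat >> "$HOME/.openmpi/mca-params.conf" <<'EOF'
btl = ^tcp
EOF
```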
This happens in both 4.1.1 and the Nov. 4th nightly tarball of master. If I enable the tcp btl, or disable the rdma osc, IMB-RMA runs just fine. I can also force the use of the ucx osc, and that gets IMB-RMA to work as well.
I am currently running IMB-RMA as follows:
```shell
# Fails
$> mpirun --map-by ppr:1:node --mca btl '^tcp' ./IMB-RMA

# Works
$> mpirun --map-by ppr:1:node --mca btl '^tcp' --mca osc '^rdma' ./IMB-RMA
$> mpirun --map-by ppr:1:node --mca btl '^tcp' --mca osc ucx ./IMB-RMA
```
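Presumably, forcing the suspect component directly would isolate it; I would expect an invocation like the following sketch to fail the same way, even though the ucx osc normally wins selection (untested beyond the cases above):

```shell
# Force the rdma osc explicitly to confirm its initialization is what fails
$> mpirun --map-by ppr:1:node --mca btl '^tcp' --mca osc rdma ./IMB-RMA
```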
In all cases, the ucx osc component is chosen, at least according to the output when I use `--mca osc_base_verbose 100`. So it seems that the initialization of the rdma osc component has issues when the tcp btl is missing. Is this expected?
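For reference, that check was done with an invocation along these lines (the exact verbosity value shouldn't matter; any high level will do):

```shell
# Failing case with osc component selection verbosity turned up;
# the output showed the ucx osc being selected.
$> mpirun --map-by ppr:1:node --mca btl '^tcp' \
      --mca osc_base_verbose 100 ./IMB-RMA
```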
Thanks,
David