
Disabling tcp btl causes issues for rdma osc when running IMB-RMA #9630

Open
@dshrader

Description


Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

4.1.1
Master nightly tarball from Nov. 4th

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

from release tarballs; only the --prefix=... configure parameter was used

Please describe the system on which you are running

  • Operating system/version: based on RHEL 7
  • Computer hardware: Intel
  • Network type: Mellanox (ConnectX-5) with UCX 1.9.0

Details of the problem

I normally disable the tcp btl so that jobs fail with an obvious error if there is a problem with the high-speed interconnect's communication library; that way I know when optimal performance isn't being achieved. I have found, however, that disabling the tcp btl causes problems with the rdma osc component when trying to run IMB-RMA (the Intel MPI Benchmarks' RMA suite). When I do so, a message like this appears:

--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[9281,1],4]) is on host: ko002
  Process 2 ([[9281,1],0]) is on host: ko001
  BTLs attempted: self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------

This happens in both 4.1.1 and the nightly tarball from Nov 4th for master. If I enable the tcp btl, or disable the rdma osc, IMB-RMA runs just fine. I can also force the use of the ucx osc, and that seems to get IMB-RMA to work as well.

I am currently running IMB-RMA as follows:

# Fails
$> mpirun --map-by ppr:1:node --mca btl '^tcp' ./IMB-RMA
# Works
$> mpirun --map-by ppr:1:node --mca btl '^tcp' --mca osc '^rdma' ./IMB-RMA
$> mpirun --map-by ppr:1:node --mca btl '^tcp' --mca osc ucx ./IMB-RMA
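
The same exclusion can also be set persistently instead of on every mpirun command line, via Open MPI's MCA parameters file. A minimal sketch (the path assumes a default single-prefix installation):

```
# $prefix/etc/openmpi-mca-params.conf
# Exclude the tcp btl so jobs fail loudly if the fast interconnect is unusable
btl = ^tcp
```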

In all cases the ucx osc component is ultimately chosen, at least according to the output when I run with --mca osc_base_verbose 100. So it appears that the rdma osc component fails during initialization when the tcp btl is unavailable, even though it is not the component that ends up being used. Is this expected?
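
For anyone reproducing this, the available osc components and the rdma component's parameters can be inspected with ompi_info (this assumes the Open MPI installation in question is first on PATH):

```shell
# List the osc components this Open MPI installation provides
ompi_info | grep "MCA osc"

# Show the rdma osc component's MCA parameters, including its priority
ompi_info --param osc rdma --level 9
```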

Thanks,
David
