
MPI inter-node issues with Intel MPI v2019 on Mellanox IB #10314

@lexming

Description


I tested a simple inter-node job between two nodes over our InfiniBand network with updates 5, 6 and 7 of Intel MPI v2019, and each release behaved very differently. All tests were carried out with iccifort/2020.1.217 as the base of the toolchain.

Characteristics of the testing system

  • CPU: 2x Intel(R) Xeon(R) Gold 6126
  • Adapter: Mellanox Technologies MT27700 Family [ConnectX-4]
  • Operating System: CentOS 7.7
  • Related system libraries: UCX v1.5.1, OFED v4.7-3.2.9
  • ICC: v2020.1 (from EasyBuild)
  • Resource manager: Torque

Steps to reproduce:

  1. Start a job on two nodes
  2. Load impi
  3. mpicc ${EBROOTIMPI}/test/test.c -o test (a minimal equivalent of test.c is sketched after this list)
  4. mpirun ./test
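For reference, the test program is essentially an MPI "hello world". A minimal sketch along these lines (not necessarily the exact test.c shipped by Intel, but it produces the same kind of output as the runs below):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);   /* libfabric/UCX initialization happens here */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);

    printf("Hello world: rank %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}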

Intel MPI v2019 update 5: works out of the box

$ module load impi/2019.5.281-iccifort-2020.1.217
$ fi_info --version
fi_info: 1.7.2a
libfabric: 1.7.2a
libfabric api: 1.7
$ fi_info | grep provider
provider: verbs;ofi_rxm
provider: verbs;ofi_rxd
provider: verbs
provider: verbs
provider: verbs
$ mpirun ./test
Hello world: rank 0 of 2 running on node357.hydra.os
Hello world: rank 1 of 2 running on node356.hydra.os

Intel MPI v2019 update 6: does NOT work out of the box, but can be fixed

$ module load impi/2019.6.166-iccifort-2020.1.217
$ fi_info --version
fi_info: 1.9.0a1
libfabric: 1.9.0a1-impi
libfabric api: 1.8
$ fi_info | grep provider
provider: mlx
provider: mlx;ofi_rxm
$ mpirun ./test
[1585832682.960816] [node357:302190:0]         select.c:406  UCX  ERROR no active messages transport to <no debug data>: self/self - Destination is unreachable, rdmacm/sockaddr - no am bcopy, mm/sysv - Destination is unreachable, mm/posix - Destination is unreachable, cma/cma - no am bcopy
Abort(1091471) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(703)........: 
MPID_Init(958)...............: 
MPIDI_OFI_mpi_init_hook(1382): OFI get address vector map failed
  • Solution 1: use the verbs or tcp libfabric providers instead of mlx (a programmatic variant of this override is sketched after the commands below)
$ module load impi/2019.6.166-iccifort-2020.1.217
$ FI_PROVIDER=verbs,tcp mpirun ./test
Hello world: rank 0 of 2 running on node357.hydra.os
Hello world: rank 1 of 2 running on node356.hydra.os
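The same override can also be applied from inside the application for quick testing. A minimal sketch (my own illustration, not part of the Intel test program) that pins libfabric to the verbs/tcp providers before MPI_Init selects one:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    /* Must be set before MPI_Init, which is where Intel MPI picks the
       libfabric provider; equivalent to FI_PROVIDER=verbs,tcp mpirun ./test */
    setenv("FI_PROVIDER", "verbs,tcp", 1);

    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("MPI_Init succeeded with FI_PROVIDER=%s\n", getenv("FI_PROVIDER"));
    MPI_Finalize();
    return 0;
}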
  • Solution 2: use an external, newer UCX (v1.7.0) so that the bundled mlx provider works (a run-time check of the loaded UCX is sketched after the commands below)
$ module load impi/2019.6.166-iccifort-2020.1.217
$ module load UCX/1.7.0-GCCcore-9.3.0
$ ucx_info
# UCT version=1.7.0 revision 
# configured with: --prefix=/user/brussel/101/vsc10122/.local/easybuild/software/UCX/1.7.0-GCCcore-9.3.0 --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --enable-optimizations --enable-cma --enable-mt --with-verbs --without-java --disable-doxygen-doc
$ FI_PROVIDER=mlx mpirun ./test
Hello world: rank 0 of 2 running on node357.hydra.os
Hello world: rank 1 of 2 running on node356.hydra.os
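To double-check which libucp the process actually resolves at run time (i.e. the module's UCX 1.7.0 rather than the system UCX 1.5.1), the UCP API can report its own version. A small sketch (the file name check_ucx.c is just an example), compiled with something like mpicc check_ucx.c -lucp -o check_ucx:

#include <stdio.h>
#include <ucp/api/ucp.h>

int main(void)
{
    unsigned major, minor, release;

    /* Version of the libucp library picked up by the dynamic loader */
    ucp_get_version(&major, &minor, &release);
    printf("UCX runtime version: %u.%u.%u (%s)\n",
           major, minor, release, ucp_get_version_string());
    return 0;
}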
  • Solution 3: use an external libfabric v1.9.1, since upstream libfabric dropped the mlx provider in version 1.9.0 (a run-time check of the resolved libfabric is sketched after the commands below)
$ module load impi/2019.6.166-iccifort-2020.1.217
$ module load libfabric/1.9.1-GCCcore-9.3.0
$ export FI_PROVIDER_PATH=
$ fi_info --version
fi_info: 1.9.1
libfabric: 1.9.1
libfabric api: 1.9
$ mpirun ./test
Hello world: rank 0 of 2 running on node357.hydra.os
Hello world: rank 1 of 2 running on node356.hydra.os
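In the same spirit, one can verify that the external libfabric 1.9.1 is the one the application resolves (instead of the libfabric bundled with impi) by querying the run-time version from libfabric itself. A sketch (check_fi.c is an example name), compiled with something like mpicc check_fi.c -lfabric -o check_fi:

#include <stdio.h>
#include <rdma/fabric.h>

int main(void)
{
    /* API version of the libfabric library resolved by the dynamic loader,
       e.g. 1.9 when the external libfabric 1.9.1 is picked up */
    uint32_t ver = fi_version();
    printf("libfabric API version: %u.%u\n", FI_MAJOR(ver), FI_MINOR(ver));
    return 0;
}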

Intel MPI v2019 update 7: does NOT work at all

$ module load impi/2019.7.217-iccifort-2020.1.217
$ fi_info --version
fi_info: 1.10.0a1
libfabric: 1.10.0a1-impi
libfabric api: 1.9
$ fi_info | grep provider
provider: verbs;ofi_rxm
[...]
provider: tcp;ofi_rxm
[...]
provider: verbs
[...]
provider: tcp
[...]
provider: sockets
[...]
$ I_MPI_DEBUG=4 I_MPI_HYDRA_DEBUG=on FI_LOG_LEVEL=debug mpirun ./test
[[email protected]] Launch arguments: /user/brussel/101/vsc10122/.local/easybuild/software/impi/2019.7.217-iccifort-2020.1.217/intel64/bin//hydra_bstrap_proxy --upstream-host node357.hydra.brussel.vsc --upstream-port 40969 --pgid 0 --launcher ssh --launcher-number 0 --base-path /user/brussel/101/vsc10122/.local/easybuild/software/impi/2019.7.217-iccifort-2020.1.217/intel64/bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /user/brussel/101/vsc10122/.local/easybuild/software/impi/2019.7.217-iccifort-2020.1.217/intel64/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9 
[[email protected]] Launch arguments: /usr/bin/ssh -q -x node356.hydra.brussel.vsc /user/brussel/101/vsc10122/.local/easybuild/software/impi/2019.7.217-iccifort-2020.1.217/intel64/bin//hydra_bstrap_proxy --upstream-host node357.hydra.brussel.vsc --upstream-port 40969 --pgid 0 --launcher ssh --launcher-number 0 --base-path /user/brussel/101/vsc10122/.local/easybuild/software/impi/2019.7.217-iccifort-2020.1.217/intel64/bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --proxy-id 1 --node-id 1 --subtree-size 1 /user/brussel/101/vsc10122/.local/easybuild/software/impi/2019.7.217-iccifort-2020.1.217/intel64/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9 
[proxy:0:[email protected]] Warning - oversubscription detected: 1 processes will be placed on 0 cores
[proxy:0:[email protected]] pmi cmd from fd 4: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:[email protected]] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:[email protected]] pmi cmd from fd 4: cmd=get_maxes
[proxy:0:[email protected]] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:[email protected]] pmi cmd from fd 4: cmd=get_appnum
[proxy:0:[email protected]] PMI response: cmd=appnum appnum=0
[proxy:0:[email protected]] pmi cmd from fd 4: cmd=get_my_kvsname
[proxy:0:[email protected]] PMI response: cmd=my_kvsname kvsname=kvs_309778_0
[proxy:0:[email protected]] pmi cmd from fd 4: cmd=get kvsname=kvs_309778_0 key=PMI_process_mapping
[proxy:0:[email protected]] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:[email protected]] pmi cmd from fd 4: cmd=barrier_in

(the execution does not abort with an error; it just hangs at this point)

The system log of the node shows the following entry:

traps: hydra_pmi_proxy[549] trap divide error ip:4436ed sp:7ffed012ef50 error:0 in hydra_pmi_proxy[400000+ab000]

This error with Intel MPI v2019.7 happens well before libfabric is initialized, so it does not depend on the libfabric provider or on the UCX version; it happens every time.

Update
