
IMB test suite fails when using vader #4260

Closed
@nmorey

Description


With Open MPI 2.1.2 and the Intel MPI Benchmarks suite (https://software.intel.com/sites/default/files/managed/76/6c/IMB_2017_Update2.tgz) on x86 systems (multiple SUSE versions), IMB-MPI1 segfaults during the Sendrecv benchmark when run with the vader BTL:

mpirun -np 2 --mca btl vader,self /usr/lib/mpi/gcc/openmpi2/tests/IMB/IMB-MPI1
[snip...]
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv 
# #processes = 2 
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.60         0.60         0.60         0.00
            1         1000         0.59         0.59         0.59         3.38
            2         1000         0.55         0.55         0.55         7.31
            4         1000         0.62         0.62         0.62        12.81
            8         1000         0.64         0.64         0.64        24.94
           16         1000         0.76         0.76         0.76        41.94
           32         1000         0.56         0.56         0.56       114.88
           64         1000         0.64         0.64         0.64       200.60
          128         1000         0.65         0.65         0.65       396.01
          256         1000         1.10         1.10         1.10       463.57
          512         1000         1.49         1.50         1.49       684.71
         1024         1000         1.82         1.82         1.82      1122.39
         2048         1000         2.07         2.07         2.07      1979.64
         4096         1000         2.63         2.63         2.63      3113.39
         8192         1000         2.74         2.74         2.74      5986.21
        16384         1000         4.42         4.42         4.42      7410.65
[portia:25305] *** Process received signal ***
[portia:25305] Signal: Segmentation fault (11)
[portia:25305] Signal code: Address not mapped (1)
[portia:25305] Failing at address: 0x56dc0730
[portia:25305] [ 0] linux-gate.so.1(__kernel_rt_sigreturn+0x0)[0xf77bdf70]
[portia:25305] [ 1] /usr/lib/mpi/gcc/openmpi2/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x133)[0xf5882e33]
[portia:25305] [ 2] /usr/lib/mpi/gcc/openmpi2/lib/openmpi/mca_btl_vader.so(+0x4251)[0xf5883251]
[portia:25305] [ 3] /usr/lib/mpi/gcc/openmpi2/lib/libopen-pal.so.20(opal_progress+0x70)[0xf7377720]
[portia:25305] [ 4] /usr/lib/mpi/gcc/openmpi2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x925)[0xf54571b5]
[portia:25305] [ 5] /usr/lib/mpi/gcc/openmpi2/lib/libmpi.so.20(MPI_Sendrecv+0x299)[0xf77279e9]
[portia:25305] [ 6] /usr/lib/mpi/gcc/openmpi2/tests/IMB/IMB-MPI1(+0xbee9)[0x5663bee9]
[portia:25305] [ 7] /usr/lib/mpi/gcc/openmpi2/tests/IMB/IMB-MPI1(+0x65c8)[0x566365c8]
[portia:25305] [ 8] /usr/lib/mpi/gcc/openmpi2/tests/IMB/IMB-MPI1(+0x1f02)[0x56631f02]
[portia:25305] [ 9] /lib/libc.so.6(__libc_start_main+0xf3)[0xf7511743]
[portia:25305] [10] /usr/lib/mpi/gcc/openmpi2/tests/IMB/IMB-MPI1(+0x1971)[0x56631971]
[portia:25305] *** End of error message ***

The same benchmark runs fine with the sm BTL instead: mpirun -np 2 --mca btl sm,self /usr/lib/mpi/gcc/openmpi2/tests/IMB/IMB-MPI1

I tried to debug the segfault with gdb, but without success so far.
