segmentation fault with numpy on POWER9 (only) when using FlexiBLAS #17

@boegel

I'm seeing a segmentation fault when running the numpy 1.20.3 tests with FlexiBLAS 3.0.4 on top of OpenBLAS 0.3.15, but not when numpy is linked to OpenBLAS 0.3.15 directly, which suggests FlexiBLAS is somehow involved in the crash...

I'm not seeing this problem on Intel (Haswell, Skylake X), AMD (Rome), or Arm (AWS Graviton2); it only shows up on POWER9.

Here's a partial backtrace I obtained when running the numpy tests via gdb:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff4887530 in dnrm2_k () from /home/centos/EasyBuild/software/OpenBLAS/0.3.15-GCC-10.3.0/lib/../lib64/libopenblas.so.0
Missing separate debuginfos, use: yum debuginfo-install libxcrypt-4.1.1-4.el8.ppc64le
(gdb) bt
#0  0x00007ffff4887530 in dnrm2_k () from /home/centos/EasyBuild/software/OpenBLAS/0.3.15-GCC-10.3.0/lib/../lib64/libopenblas.so.0
#1  0x00007ffff453d788 in dnrm2_ () from /home/centos/EasyBuild/software/OpenBLAS/0.3.15-GCC-10.3.0/lib/../lib64/libopenblas.so.0
#2  0x00007ffff62cfd9c in dnrm2_ () from /home/centos/EasyBuild/software/FlexiBLAS/3.0.4-GCC-10.3.0/lib64/libflexiblas.so.3
#3  0x00007ffff4d7816c in dgeev_ () from /home/centos/EasyBuild/software/OpenBLAS/0.3.15-GCC-10.3.0/lib/../lib64/libopenblas.so.0
#4  0x00007ffff639e8e4 in dgeev_ () from /home/centos/EasyBuild/software/FlexiBLAS/3.0.4-GCC-10.3.0/lib64/libflexiblas.so.3
#5  0x00007fff7364b334 in call_dgeev (params=0x7ffffffe63b0) at numpy/linalg/umath_linalg.c.src:2292
#6  DOUBLE_eig_wrapper (JOBVL=JOBVL@entry=78 'N', JOBVR=JOBVR@entry=86 'V', args=0x7fff50dad120, dimensions=<optimized out>, steps=<optimized out>) at numpy/linalg/umath_linalg.c.src:2292
#7  0x00007fff7364c02c in DOUBLE_eig (args=<optimized out>, dimensions=<optimized out>, steps=<optimized out>, __NPY_UNUSED_TAGGEDfunc=<optimized out>) at numpy/linalg/umath_linalg.c.src:2336
#8  0x00007ffff6a5d294 in PyUFunc_GeneralizedFunction (op=0x7ffffffe8200, kwds=0x0, args=0x7fff50dad0f0, ufunc=0x0) at numpy/core/src/umath/ufunc_object.c:2986
#9  PyUFunc_GenericFunction_int (ufunc=<optimized out>, ufunc@entry=0x7fff736c1130, args=args@entry=0x7fff50f88820, kwds=kwds@entry=0x7fff50e79c00, op=op@entry=0x7ffffffe8200)
    at numpy/core/src/umath/ufunc_object.c:3119
#10 0x00007ffff6a5f740 in ufunc_generic_call (ufunc=0x7fff736c1130, args=0x7fff50f88820, kwds=0x7fff50e79c00) at numpy/core/src/umath/ufunc_object.c:4747
...
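The backtrace goes through DOUBLE_eig with JOBVL='N' and JOBVR='V', which is the path taken by numpy.linalg.eig on a float64 matrix. A minimal sketch that exercises the same path outside the test suite could look like the following. The matrix size, the seed, and the helper name eig_residual are arbitrary choices for illustration, and it is not confirmed that this particular input triggers the crash.

```python
import numpy as np


def eig_residual(n=50, seed=0):
    """Run numpy.linalg.eig on a random n x n float64 matrix and return
    the largest absolute entry of A @ V - V * w."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    # eig computes eigenvalues and right eigenvectors only, which matches
    # JOBVL='N', JOBVR='V' in frame #6 of the backtrace.
    w, v = np.linalg.eig(a)
    # Column j of v is the eigenvector for w[j], so v * w scales each column.
    return float(np.abs(a @ v - v * w).max())


if __name__ == "__main__":
    print("max residual:", eig_residual())
```

If the process survives, a residual close to machine precision indicates that dgeev returned a sane result with the active BLAS/LAPACK backend.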

This only happens when numpy is linked with FlexiBLAS:

$ ldd $(python -c "import numpy; print(numpy.core._multiarray_umath.__file__)") | grep blas
	libflexiblas.so.3 => /home/centos/EasyBuild/software/FlexiBLAS/3.0.4-GCC-10.3.0/lib64/libflexiblas.so.3 (0x0000200000570000)
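Note that ldd only lists libflexiblas.so.3 here, while the backtrace also shows frames from libopenblas.so.0, since FlexiBLAS loads its backend at run time. A rough sketch for listing the BLAS libraries that are actually mapped into the Python process, assuming a Linux system with /proc available (the helper names blas_libraries and loaded_blas_libraries are made up for this example):

```python
import os


def blas_libraries(maps_text):
    """Return the sorted, de-duplicated paths of mapped files whose file
    name contains 'blas', given text in /proc/<pid>/maps format."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, and optionally a path.
        fields = line.split(None, 5)
        if len(fields) == 6 and "blas" in os.path.basename(fields[5]).lower():
            paths.add(fields[5].strip())
    return sorted(paths)


def loaded_blas_libraries():
    """Import numpy, then report the BLAS libraries mapped into this process."""
    import numpy  # noqa: F401  (imported only so its BLAS libraries get loaded)
    with open("/proc/self/maps") as fh:
        return blas_libraries(fh.read())
```

Calling loaded_blas_libraries() in the failing environment should show both the FlexiBLAS library and whichever backend it picked, which makes it easy to confirm that the same libopenblas.so.0 is in use as in the direct-link case.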

Any ideas on what may be causing this segmentation fault?

I tried raising the stack size limit with ulimit -s unlimited (the default is 8192 on that system), but that made no difference.

After running export FLEXIBLAS=netlib to make FlexiBLAS use the fallback netlib backend, the segmentation fault no longer occurs...
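To compare backends side by side, something along these lines could be used. It is only a sketch: the FLEXIBLAS variable and the netlib backend name come from the observation above, while the helper names and the one-line eig snippet are assumptions made for this example, and the snippet is not known to reproduce the crash.

```python
import os
import subprocess
import sys


def run_with_backend(backend, code):
    """Run `code` in a fresh Python process with FLEXIBLAS set to `backend`
    (or unset when backend is None) and return the process return code."""
    env = dict(os.environ)
    if backend is None:
        env.pop("FLEXIBLAS", None)
    else:
        env["FLEXIBLAS"] = backend
    result = subprocess.run([sys.executable, "-c", code], env=env)
    return result.returncode


def describe(returncode):
    """On POSIX, subprocess reports death by signal N as return code -N."""
    if returncode < 0:
        return "killed by signal %d" % -returncode
    return "exit code %d" % returncode


if __name__ == "__main__":
    # Hypothetical stand-in for the failing test; replace as needed.
    snippet = "import numpy as np; np.linalg.eig(np.random.rand(50, 50))"
    for backend in (None, "netlib"):
        label = backend if backend is not None else "(default backend)"
        print(label, "->", describe(run_with_backend(backend, snippet)))
```

Running each case in its own process keeps a crash in one backend from taking down the comparison, and a result of "killed by signal 11" corresponds to SIGSEGV on Linux.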
