
Segfault in sgemm_kernel on old x86_64 #2047

Closed
@Celelibi


The single-precision GEMM kernel consistently crashes on an old x86_64 CPU, but only with sufficiently large matrices, such as 1000x1000.

Here is the simplest program I could write that shows it.

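#include <stdlib.h>
#include <cblas.h>

#define SIZE 1000

int main(void) {
	/* Heap-allocate and zero the matrices: two 1000x1000 float arrays (~8 MB)
	 * would sit right at the default stack limit, and A must be initialized. */
	float *A = calloc((size_t)SIZE * SIZE, sizeof *A);
	float *C = calloc((size_t)SIZE * SIZE, sizeof *C);
	if (A == NULL || C == NULL)
		return 1;
	cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, SIZE, SIZE, SIZE,
	            1.0f, A, SIZE, A, SIZE, 0.0f, C, SIZE);
	free(A);
	free(C);
	return 0;
}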

Oddly enough, it does not crash when running with a single thread:

$ gcc -o sgemm_test sgemm_test.c -lopenblas
$ ./sgemm_test
zsh: segmentation fault  ./sgemm_test
$ OPENBLAS_NUM_THREADS=1 ./sgemm_test
$ OPENBLAS_NUM_THREADS=2 ./sgemm_test
zsh: segmentation fault  OPENBLAS_NUM_THREADS=2 ./sgemm_test

And here is the /proc/cpuinfo output:

$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 104
model name      : AMD Athlon(tm) 64 X2 Dual-Core Processor TK-55
stepping        : 1
cpu MHz         : 1800.000
cache size      : 256 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl cpuid extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall lbrv
bugs            : apic_c1e fxsave_leak sysret_ss_attrs null_seg swapgs_fence amd_e400 spectre_v1 spectre_v2
bogomips        : 3591.07
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 104
model name      : AMD Athlon(tm) 64 X2 Dual-Core Processor TK-55
stepping        : 1
cpu MHz         : 1800.000
cache size      : 256 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl cpuid extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall lbrv
bugs            : apic_c1e fxsave_leak sysret_ss_attrs null_seg swapgs_fence amd_e400 spectre_v1 spectre_v2
bogomips        : 3591.07
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

Here is my understanding of the bug.
The sgemm_kernel_8x4_sse kernel allocates additional stack space at one point:
https://github.com/xianyi/OpenBLAS/blob/4741ce803bd13acb4ff0ff1cf57f7a64cf7ef77c/kernel/x86_64/gemm_kernel_8x4_sse.S#L385-L387
It reserves 128 bytes plus LOCAL_BUFFER_SIZE: the 128 bytes hold some local variables, while LOCAL_BUFFER_SIZE is just enough space for a buffer that a loop fills later on:
https://github.com/xianyi/OpenBLAS/blob/4741ce803bd13acb4ff0ff1cf57f7a64cf7ef77c/kernel/x86_64/gemm_kernel_8x4_sse.S#L419-L478
However, the buffer used by that loop is defined to start 256 bytes above the stack pointer, not 128:
https://github.com/xianyi/OpenBLAS/blob/4741ce803bd13acb4ff0ff1cf57f7a64cf7ef77c/kernel/x86_64/gemm_kernel_8x4_sse.S#L84

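As a result, the loop writes 128 bytes (256 + LOCAL_BUFFER_SIZE minus 128 + LOCAL_BUFFER_SIZE) past the end of the reserved space and overwrites the previous stack frame, including the saved registers. This makes the program crash later on.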

The fix should be straightforward: allocate $256 + LOCAL_BUFFER_SIZE bytes on the stack instead. I tried it, and it works.

I suspect this bug has been hiding there for nearly a decade. I also think gemm_kernel_4x8_nano.S has the very same bug, but I cannot test it.
