Segfault in sgemm_kernel on old x86_64

The single-precision gemm kernel consistently crashes on an old x86_64 CPU, but only with large enough matrices, such as 1000×1000.

Here is the simplest program I could write that shows it.
```C
#include <stdlib.h>
#include <cblas.h>

#define SIZE 1000

int main(void) {
	float A[SIZE * SIZE];
	float C[SIZE * SIZE];
	cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, SIZE, SIZE, SIZE, 1, A, SIZE, A, SIZE, 0, C, SIZE);
	return 0;
}
```

Weirdly enough, it doesn't crash when limited to a single thread.
```
$ gcc -o sgemm_test sgemm_test.c -lopenblas
$ ./sgemm_test
zsh: segmentation fault  ./sgemm_test
$ OPENBLAS_NUM_THREADS=1 ./sgemm_test
$ OPENBLAS_NUM_THREADS=2 ./sgemm_test
zsh: segmentation fault  OPENBLAS_NUM_THREADS=2 ./sgemm_test
```

And here is the cpuinfo.
```
$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 104
model name      : AMD Athlon(tm) 64 X2 Dual-Core Processor TK-55
stepping        : 1
cpu MHz         : 1800.000
cache size      : 256 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl cpuid extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall lbrv
bugs            : apic_c1e fxsave_leak sysret_ss_attrs null_seg swapgs_fence amd_e400 spectre_v1 spectre_v2
bogomips        : 3591.07
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 104
model name      : AMD Athlon(tm) 64 X2 Dual-Core Processor TK-55
stepping        : 1
cpu MHz         : 1800.000
cache size      : 256 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl cpuid extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall lbrv
bugs            : apic_c1e fxsave_leak sysret_ss_attrs null_seg swapgs_fence amd_e400 spectre_v1 spectre_v2
bogomips        : 3591.07
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps
```

Here is my understanding of the bug.
The kernel `sgemm_kernel_8x4_sse` allocates additional stack space at some point, 128 bytes plus `LOCAL_BUFFER_SIZE`.
https://github.com/xianyi/OpenBLAS/blob/4741ce803bd13acb4ff0ff1cf57f7a64cf7ef77c/kernel/x86_64/gemm_kernel_8x4_sse.S#L385-L387
The 128 bytes hold a few variables, while `LOCAL_BUFFER_SIZE` is just enough space for the buffer that a loop fills later on.
https://github.com/xianyi/OpenBLAS/blob/4741ce803bd13acb4ff0ff1cf57f7a64cf7ef77c/kernel/x86_64/gemm_kernel_8x4_sse.S#L419-L478
But the buffer used in that loop is defined to start 256 bytes into the stack, not 128.
https://github.com/xianyi/OpenBLAS/blob/4741ce803bd13acb4ff0ff1cf57f7a64cf7ef77c/kernel/x86_64/gemm_kernel_8x4_sse.S#L84

As a result, the loop writes 128 bytes past the allocated space and overwrites the old stack frame, including the saved registers. This makes the program crash later on.

The fix should be pretty straightforward: allocate `$256 + LOCAL_BUFFER_SIZE` bytes on the stack. I tried it and it works.

I guess this bug has been hiding there for nearly a decade. I also think the very same bug exists in `gemm_kernel_4x8_nano.S`, but I can't test it.
