Description
Hi,
I'm getting an Illegal Instruction signal when using OpenBLAS via NumPy. Here's the call stack:
#0 0x0000000804921e9c in dcopy_k_CORE2 () from /usr/local/lib/libopenblas.so.0
#1 0x000000082b4d80b6 in cauchy (n=5713, x=..., l=..., u=..., nbd=..., g=...,
iorder=..., iwhere=..., t=..., d=..., xcp=..., m=10, wy=..., ws=..., sy=...,
wt=..., theta=1, col=0, head=1, p=..., c=..., wbp=..., v=..., nseg=0,
iprint=-1, sbgnrm=0.16177334220291192, info=0, epsmch=2.2204460492503131e-16)
at scipy/optimize/lbfgsb_src/lbfgsb.f:1507
#2 0x000000082b4dbb04 in mainlb (n=5713, m=10, x=..., l=..., u=..., nbd=...,
f=0.48867889557385025, g=..., factr=10000000, pgtol=0.0001, ws=..., wy=...,
sy=..., ss=..., wt=..., wn=..., snd=..., z=..., r=..., d=..., t=..., xp=...,
wa=..., index=..., iwhere=..., indx2=..., task=..., iprint=-1, csave=...,
lsave=..., isave=..., dsave=..., maxls=20, _task=60, _csave=60)
at scipy/optimize/lbfgsb_src/lbfgsb.f:669
#3 0x000000082b4dd70b in setulb (n=5713, m=10, x=..., l=..., u=..., nbd=...,
f=0.48867889557385025, g=..., factr=10000000, pgtol=0.0001, wa=..., iwa=...,
task=..., iprint=-1, csave=..., lsave=..., isave=..., dsave=..., maxls=20,
_task=60, _csave=60) at scipy/optimize/lbfgsb_src/lbfgsb.f:273
#4 0x000000082b4c98c2 in f2py_rout.lbfgsb_setulb ()
from /home/scott/remote-execution/lib/python3.7/site-packages/scipy/optimize/_lbfgsb.so
#5 0x0000000800364242 in _PyObject_FastCallKeywords ()
from /usr/local/lib/libpython3.7m.so.1.0
#6 0x0000000800424ddb in ?? () from /usr/local/lib/libpython3.7m.so.1.0
#7 0x000000080042204c in _PyEval_EvalFrameDefault ()
from /usr/local/lib/libpython3.7m.so.1.0
...
I've reproduced this with OpenBLAS 0.3.15 and a couple of older builds. I've tried various TARGET settings and can't find one that works.
The CPU I'm running on is shown below. It's old. Is that the problem?
eax in eax ebx ecx edx
00000000 0000000a 756e6547 6c65746e 49656e69
00000001 000006fb 01040800 0000e3bd bfebfbff
00000002 05b0b101 005657f0 00000000 2cb43049
00000003 00000000 00000000 00000000 00000000
00000004 0c000121 01c0003f 0000003f 00000001
00000005 00000040 00000040 00000003 00000020
00000006 00000001 00000002 00000001 00000000
00000007 00000000 00000000 00000000 00000000
00000008 00000400 00000000 00000000 00000000
00000009 00000000 00000000 00000000 00000000
0000000a 07280202 00000000 00000000 00000503
80000000 80000008 00000000 00000000 00000000
80000001 00000000 00000000 00000001 20100800
80000002 65746e49 2952286c 726f4320 4d542865
80000003 51203229 20646175 20555043 51202020
80000004 30303636 20402020 30342e32 007a4847
80000005 00000000 00000000 00000000 00000000
80000006 00000000 00000000 10008040 00000000
80000007 00000000 00000000 00000000 00000000
80000008 00003024 00000000 00000000 00000000
Vendor ID: "GenuineIntel"; CPUID level 10
Intel-specific functions:
Version 000006fb:
Type 0 - Original OEM
Family 6 - Pentium Pro
Model 15 - Intel Core2 family processor, 65nm
Stepping 11
Reserved 0
Extended brand string: "Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz"
CLFLUSH instruction cache line size: 8
Initial APIC ID: 1
Hyper threading siblings: 4
Feature flags set 1 (CPUID.01H:EDX): bfebfbff:
FPU Floating Point Unit
VME Virtual 8086 Mode Enhancements
DE Debugging Extensions
PSE Page Size Extensions
TSC Time Stamp Counter
MSR Model Specific Registers
PAE Physical Address Extension
MCE Machine Check Exception
CX8 COMPXCHG8B Instruction
APIC On-chip Advanced Programmable Interrupt Controller present and enabled
SEP Fast System Call
MTRR Memory Type Range Registers
PGE PTE Global Flag
MCA Machine Check Architecture
CMOV Conditional Move and Compare Instructions
FGPAT Page Attribute Table
PSE-36 36-bit Page Size Extension
CLFSH CFLUSH instruction
DS Debug store
ACPI Thermal Monitor and Clock Ctrl
MMX MMX instruction set
FXSR Fast FP/MMX Streaming SIMD Extensions save/restore
SSE Streaming SIMD Extensions instruction set
SSE2 SSE2 extensions
SS Self Snoop
HT Hyper Threading
TM Thermal monitor
31 Pending Break Enable
Feature flags set 2 (CPUID.01H:ECX): 0000e3bd:
SSE3 SSE3 extensions
DTES64 64-bit debug store
MONITOR MONITOR/MWAIT instructions
DS-CPL CPL Qualified Debug Store
VMX Virtual Machine Extensions
EST Enhanced Intel SpeedStep Technology
TM2 Thermal Monitor 2
SSSE3 Supplemental Streaming SIMD Extension 3
CX16 CMPXCHG16B
xTPR Send Task Priority messages
PDCM Perfmon and debug capability
Extended feature flags set 1 (CPUID.80000001H:EDX): 20100800
SYSCALL SYSCALL/SYSRET instructions
XD-bit Execution Disable bit
EM64T Intel Extended Memory 64 Technology
Extended feature flags set 2 (CPUID.80000001H:ECX): 00000001
LAHF LAHF/SAHF available in IA-32e mode
Old-styled TLB and cache info:
b1: Instruction TLB: 2MB Pages (8 entries) or 4MB pages (4 entries), 4-way set associative
b0: Instruction TLB: 4-KB Pages, 4-way set associative, 128 entries
05: Data TLB: 4MB pages, 4-way set assoc, 32 entries
f0: 64-byte prefetching
57: Data TLB: 4KB pages, 4-way set associative, 16 entries
56: Data TLB: 4MB pages, 4-way set associative, 16 entries
49: 3rd-level cache: 4MB, 16-way set associative, 64-byte line size (Intel Xeon MP, Family 0Fh, Model 06h
OR 2nd-level cache: 4MB, 16-way set associative, 64-byte line size
30: 1st-level instruction cache: 32-KB, 8-way set associative, 64-byte line size
b4: Data TLB: 4-KB Pages, 4-way set associative, 256 entries
2c: 1st-level data cache: 32-KB, 8-way set associative, 64-byte line size
Deterministic Cache Parameters:
index=0: eax=0c000121 ebx=01c0003f ecx=0000003f edx=00000001
Data cache, level 1, self initializing
64 sets, 8 ways, 1 partitions, line size 64
full size 32768 bytes
NB this package has up to 4 threads
index=1: eax=0c000122 ebx=01c0003f ecx=0000003f edx=00000001
Instruction cache, level 1, self initializing
64 sets, 8 ways, 1 partitions, line size 64
full size 32768 bytes
index=2: eax=0c004143 ebx=03c0003f ecx=00000fff edx=00000001
Unified cache, level 2, self initializing
4096 sets, 16 ways, 1 partitions, line size 64
full size 4194304 bytes
shared between up to 2 threads
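For anyone cross-checking the dump above, here is a minimal Python sketch (not part of the cpuid tool's output) that decodes CPUID leaf 1 from the raw register values. The bit positions follow Intel's documented layout for leaf 1. The helper names `decode_version` and `simd_features` are made up for illustration.

```python
# Raw CPUID leaf 1 register values, copied from the dump above.
EAX = 0x000006FB
ECX = 0x0000E3BD
EDX = 0xBFEBFBFF

# (register, bit) positions of the SIMD feature flags in CPUID leaf 1.
SIMD_BITS = {
    "SSE": ("edx", 25),
    "SSE2": ("edx", 26),
    "SSE3": ("ecx", 0),
    "SSSE3": ("ecx", 9),
    "SSE4.1": ("ecx", 19),
    "SSE4.2": ("ecx", 20),
    "AVX": ("ecx", 28),
}


def decode_version(eax):
    """Return (family, model, stepping) from CPUID.01H:EAX."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # The extended model applies to families 6 and 15,
    # the extended family only to family 15.
    if family in (6, 15):
        model |= ext_model << 4
    if family == 15:
        family += ext_family
    return family, model, stepping


def simd_features(ecx, edx):
    """Map each SIMD feature name to True/False based on its flag bit."""
    regs = {"ecx": ecx, "edx": edx}
    return {
        name: bool((regs[reg] >> bit) & 1)
        for name, (reg, bit) in SIMD_BITS.items()
    }


if __name__ == "__main__":
    print("family, model, stepping:", decode_version(EAX))
    for name, present in simd_features(ECX, EDX).items():
        print(f"{name:7s} {'yes' if present else 'no'}")
```

On these values it gives family 6, model 15, stepping 11, with SSE through SSSE3 present and SSE4.1, SSE4.2 and AVX absent, which matches the decoded listing above. MOVUPS is a plain SSE instruction, so the flags by themselves don't explain the fault.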
Here's the disassembled code:
Dump of assembler code for function dcopy_k_CORE2:
0x0000000804921e00 <+0>: lea rdx,[rdx*8+0x0]
0x0000000804921e08 <+8>: lea r8,[r8*8+0x0]
0x0000000804921e10 <+16>: cmp rdx,0x8
0x0000000804921e14 <+20>: jne 0x8049221a0 <dcopy_k_CORE2+928>
0x0000000804921e1a <+26>: cmp r8,0x8
0x0000000804921e1e <+30>: jne 0x8049221a0 <dcopy_k_CORE2+928>
0x0000000804921e24 <+36>: test rcx,0x8
0x0000000804921e2b <+43>: je 0x804921e50 <dcopy_k_CORE2+80>
0x0000000804921e31 <+49>: movsd xmm0,QWORD PTR [rsi]
0x0000000804921e35 <+53>: movsd QWORD PTR [rcx],xmm0
0x0000000804921e39 <+57>: add rsi,0x8
0x0000000804921e3d <+61>: add rcx,0x8
0x0000000804921e41 <+65>: dec rdi
0x0000000804921e44 <+68>: jle 0x804921fb0 <dcopy_k_CORE2+432>
0x0000000804921e4a <+74>: nop WORD PTR [rax+rax*1+0x0]
0x0000000804921e50 <+80>: sub rsi,0xffffffffffffff80
0x0000000804921e54 <+84>: sub rcx,0xffffffffffffff80
0x0000000804921e58 <+88>: test rsi,0x8
0x0000000804921e5f <+95>: jne 0x804921fb8 <dcopy_k_CORE2+440>
0x0000000804921e65 <+101>: mov rax,rdi
0x0000000804921e68 <+104>: sar rax,0x4
0x0000000804921e6c <+108>: jle 0x804921f10 <dcopy_k_CORE2+272>
0x0000000804921e72 <+114>: movups xmm0,XMMWORD PTR [rsi-0x80]
0x0000000804921e76 <+118>: movups xmm1,XMMWORD PTR [rsi-0x70]
0x0000000804921e7a <+122>: movups xmm2,XMMWORD PTR [rsi-0x60]
0x0000000804921e7e <+126>: movups xmm3,XMMWORD PTR [rsi-0x50]
0x0000000804921e82 <+130>: movups xmm4,XMMWORD PTR [rsi-0x40]
0x0000000804921e86 <+134>: movups xmm5,XMMWORD PTR [rsi-0x30]
0x0000000804921e8a <+138>: movups xmm6,XMMWORD PTR [rsi-0x20]
0x0000000804921e8e <+142>: movups xmm7,XMMWORD PTR [rsi-0x10]
0x0000000804921e92 <+146>: dec rax
0x0000000804921e95 <+149>: jle 0x804921ee8 <dcopy_k_CORE2+232>
0x0000000804921e97 <+151>: nop
0x0000000804921e98 <+152>: movups XMMWORD PTR [rcx-0x80],xmm0
=> 0x0000000804921e9c <+156>: movups xmm0,XMMWORD PTR [rsi]
In case it matters, NumPy is 1.20.3 and Python is 3.7.10. My repro is not simple, unfortunately, but it is consistent. Happy to provide it if anyone is interested.
Is this a known issue?
Thx!