save_load_state example segfaulting after adding Metal inference #1737

Closed
JohannesGaessler opened this issue Jun 7, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@JohannesGaessler
Collaborator

Expected Behavior

The example saves and loads a state.

Current Behavior

The example crashes with a segmentation fault.

Environment and Context

According to git bisect, the first commit that causes the segmentation fault is master-ecb217d, the one that added Metal inference.

Hardware:

  • Physical (or virtual) hardware you are using, e.g. for Linux:

$ lscpu

  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 7 3700X 8-Core Processor
    CPU family:          23
    Model:               113
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            0
    Frequency boost:     enabled
    CPU(s) scaling MHz:  77%
    CPU max MHz:         4935.9370
    CPU min MHz:         2200.0000
    BogoMIPS:            7202.09
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
                         fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor
                         ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
                         misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3
                         hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt
                         xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv
                         svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif
                         v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    4 MiB (8 instances)
  L3:                    32 MiB (2 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Mitigation; untrained return thunk; SMT enabled with STIBP protection
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
  • Operating System, e.g. for Linux:

$ uname -a
Linux johannes-pc 6.3.0-1-MANJARO #1 SMP PREEMPT_DYNAMIC Mon Apr 3 10:46:56 UTC 2023 x86_64 GNU/Linux

  • SDK version, e.g. for Linux:
Python 3.10.10
GNU Make 4.4.1
g++ (GCC) 12.2.1 20230201

Steps to Reproduce

git checkout master-ecb217d
make clean && make save-load-state
./save-load-state --model path/to/model.bin
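
For anyone repeating the bisect: a rough sketch of a helper for `git bisect run`, built from the commands above. The function names `map_status` and `bisect_step` are made up for illustration and are not part of llama.cpp. The remapping is there because a segfault shows up in the shell as status 139 (128 + SIGSEGV), and `git bisect run` aborts on any status above 127 instead of marking the commit as bad.

```shell
#!/bin/sh
# Hypothetical helper for `git bisect run`; names are illustrative only.

# map_status STATUS: turn a program's exit status into a bisect verdict.
# git bisect run reads 0 as good and 1-127 (except 125) as bad.
map_status() {
    if [ "$1" -eq 0 ]; then
        return 0    # example ran to completion: good
    fi
    return 1        # any failure, including 139 from a segfault: bad
}

# bisect_step MODEL: build the example at the current commit and run it.
bisect_step() {
    # If the build fails, the commit cannot be judged: 125 tells bisect to skip it.
    { make clean && make save-load-state; } >/dev/null 2>&1 || return 125
    ./save-load-state --model "$1"
    map_status $?
}
```

Saved as a script that ends with `bisect_step "$1"`, this could be handed to `git bisect run` after marking master-ecb217d as bad and some earlier known-good commit as good.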

Failure Logs

The GDB output for the segfault:

Thread 1 "save-load-state" received signal SIGSEGV, Segmentation fault.
0x000055555556e5fd in ggml_view_3d (ctx=0x55555569da68 <g_state+200>, a=0x7ffb83bff030, ne0=6656, ne1=6, ne2=60, nb1=13312, nb2=6815744, offset=0)
    at ggml.c:5901
5901        memcpy(offs->data, &offset, 2*sizeof(int32_t));
(gdb) bt
#0  0x000055555556e5fd in ggml_view_3d (ctx=0x55555569da68 <g_state+200>, a=0x7ffb83bff030, ne0=6656, ne1=6, ne2=60, nb1=13312, nb2=6815744, 
    offset=0) at ggml.c:5901
#1  0x000055555559b73e in llama_copy_state_data (ctx=0x5555556b22c0, dst=0x7ffac15c9010 ":\032") at llama.cpp:2751
#2  0x000055555555afa7 in main (argc=3, argv=0x7fffffffd778) at examples/save-load-state/save-load-state.cpp:59
@JohannesGaessler JohannesGaessler added the bug Something isn't working label Jun 7, 2023
@JohannesGaessler
Collaborator Author

Sorry, I forgot to check whether there was already an open issue. The bug has already been reported in #1699.
