Closed
Description
llm_load_tensors: ggml ctx size = 0.24 MiB
llm_load_tensors: offloading 4 repeating layers to GPU
llm_load_tensors: offloaded 4/33 layers to GPU
llm_load_tensors: CPU buffer size = 2918.26 MiB
llm_load_tensors: OpenCL buffer size = 324.19 MiB
......................................................................................
Automatic RoPE Scaling: Using (scale:1.000, base:10000.0).
llama_new_context_with_model: n_ctx = 8288
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 3108.00 MiB
llama_new_context_with_model: KV self size = 3108.00 MiB, K (f16): 1554.00 MiB, V (f16): 1554.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.12 MiB
llama_new_context_with_model: CPU compute buffer size = 570.19 MiB
llama_new_context_with_model: graph nodes = 1286
llama_new_context_with_model: graph splits = 1
Traceback (most recent call last):
File "koboldcpp.py", line 3783, in <module>
File "koboldcpp.py", line 3445, in main
File "koboldcpp.py", line 444, in load_model
OSError: exception: access violation reading 0x000000000510D000
[14532] Failed to execute script 'koboldcpp' due to unhandled exception!
But if I don't use OpenCL, it works.
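The KV cache figures in the log are internally consistent. K (1554.00 MiB) plus V (1554.00 MiB) gives the reported 3108.00 MiB KV self size, and all of it sits in a CPU buffer even though 4 layers are offloaded. The minimal Python sketch below reproduces that arithmetic. It assumes 32 repeating layers, inferred from "offloaded 4/33 layers", and a per-layer K/V width of 3072, which is back-calculated from the reported sizes rather than read from the model. `kv_cache_mib` is a hypothetical helper, not part of koboldcpp.

```python
# Hypothetical sketch: reproduce the f16 KV-cache size reported in the log.
# ASSUMPTIONS (not stated in the log): 32 repeating layers (the log offloads
# 4 repeating layers out of 33 total) and a per-layer K/V width of 3072,
# back-calculated from the reported sizes rather than read from the model.

MIB = 1024 * 1024


def kv_cache_mib(n_layer: int, n_ctx: int, n_embd_kv: int,
                 bytes_per_elem: int = 2) -> tuple[float, float]:
    """Return (K size, V size) in MiB; f16 uses 2 bytes per element."""
    k_bytes = n_layer * n_ctx * n_embd_kv * bytes_per_elem
    v_bytes = k_bytes  # V has the same shape as K in this simple model
    return k_bytes / MIB, v_bytes / MIB


k, v = kv_cache_mib(n_layer=32, n_ctx=8288, n_embd_kv=3072)
print(f"K (f16): {k:.2f} MiB, V (f16): {v:.2f} MiB, total: {k + v:.2f} MiB")
# prints: K (f16): 1554.00 MiB, V (f16): 1554.00 MiB, total: 3108.00 MiB
```

This only checks the memory figures. It does not explain the access violation in `load_model`.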