Misc. bug: llama.swiftui simulator error #12219


Closed

danbev opened this issue Mar 6, 2025 · 1 comment

Comments

@danbev
Collaborator

danbev commented Mar 6, 2025

Name and Version

$ ./build/bin/llama-cli --version
register_backend: registered backend Metal (1 devices)
register_device: registered device Metal (Apple M3)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M3)
version: 4830 (d78b4df0)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin23.6.0

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

Other (Please specify in the next section)

Command line

Problem description & steps to reproduce

When running the llama.swiftui example in the iOS simulator and trying to load a model, the following error occurs:

ggml_metal_init: loaded kernel_ssm_conv_f32                                   0x0 | th_max =    0 | th_width =    0
ggml_metal_init: error: load pipeline error: Error Domain=CompilerError Code=2 "only 14 constant buffers binding are supported in the simulator but 16 were used" UserInfo={NSLocalizedDescription=only 14 constant buffers binding are supported in the simulator but 16 were used}
ggml_backend_metal_device_init: error: failed to allocate context
llama_init_from_model: failed to initialize Metal backend
Could not load context!
Error: The operation couldn’t be completed. (llama_swiftui.LlamaError error 0.)

This seems to be coming from the kernel_ssm_conv_f32 kernel function:

// ref: ggml.c:ggml_compute_forward_ssm_conv_f32
// TODO: optimize
kernel void kernel_ssm_conv_f32(
        device const  void * src0,
        device const  void * src1,
        device       float * dst,
        constant   int64_t & ne00,
        constant   int64_t & ne01,
        constant   int64_t & ne02,
        constant  uint64_t & nb00,
        constant  uint64_t & nb01,
        constant  uint64_t & nb02,
        constant   int64_t & ne10,
        constant   int64_t & ne11,
        constant  uint64_t & nb10,
        constant  uint64_t & nb11,
        constant   int64_t & ne0,
        constant   int64_t & ne1,
        constant   int64_t & ne2,
        constant  uint64_t & nb0,
        constant  uint64_t & nb1,
        constant  uint64_t & nb2,
        uint3 tgpig[[threadgroup_position_in_grid]],
        uint3 tpitg[[thread_position_in_threadgroup]],
        uint3   ntg[[threads_per_threadgroup]]) {

This kernel function has 18 parameters, 16 of which are constant buffers, but the simulator only supports 14 constant buffer bindings. As a workaround, I tried packing the constants of the offending kernels into structs, which allowed the example to run (load a model and perform inference).

There is already an open issue and ongoing work to create these structs. I'm just opening this issue for awareness/tracking.
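For illustration, the struct approach could look roughly like the sketch below. This is a hypothetical example, not the actual llama.cpp implementation; the struct and field names are made up. The idea is that the 16 scalar `constant` arguments collapse into a single `constant` struct reference, so the kernel uses one constant buffer binding instead of 16, well under the simulator's limit of 14:

```metal
// Hypothetical sketch: pack the per-kernel constants into one struct
// so the kernel needs a single constant buffer binding.
typedef struct {
    int64_t  ne00, ne01, ne02;
    uint64_t nb00, nb01, nb02;
    int64_t  ne10, ne11;
    uint64_t nb10, nb11;
    int64_t  ne0, ne1, ne2;
    uint64_t nb0, nb1, nb2;
} ssm_conv_args; // illustrative name

kernel void kernel_ssm_conv_f32(
        device const  void * src0,
        device const  void * src1,
        device       float * dst,
        constant ssm_conv_args & args,   // one binding instead of 16
        uint3 tgpig[[threadgroup_position_in_grid]],
        uint3 tpitg[[thread_position_in_threadgroup]],
        uint3   ntg[[threads_per_threadgroup]]) {
    // ... body unchanged except constants are read as args.ne00,
    // args.nb01, etc.
}
```

On the host side the struct would be filled out once per dispatch and passed with a single `setBytes(_:length:index:)` call on the compute command encoder, rather than one call per scalar.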

First Bad Commit

No response

Relevant log output

@danbev
Collaborator Author

danbev commented Mar 7, 2025

This has been resolved as part of #12194

@danbev danbev closed this as completed Mar 7, 2025