Prerequisites

- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
This issue is to track work to support IBM's Granite 4 model architecture (`GraniteMoEHybrid` in `transformers`). The model uses a number of components that are not yet supported in llama.cpp, but which are being worked on independently, so I'm raising this issue to triangulate the different work streams that will be needed to support the model.

Necessary Components

- `jamba` by @compilade: llama : support Jamba hybrid Transformer-Mamba models #7531
- `bamba`: Bamba architecture #10810 (there is also an out-of-date `bamba` branch: https://github.com/gabe-l-hart/llama.cpp/tree/BambaArchitectureRefactor)
- `GraniteMoEShared` layers: Model: Granite MoE shared #13269
- `mamba2` in non-CPU backends
  - Some of the `metal` backend needs look like they're addressed already in llama : initial Mamba-2 support #9126, but for me that still doesn't work on my M3 (assertion error about non-contiguous data); see the contiguity sketch after this list.
- Support for NoPE positional encoding instead of RoPE
  - I haven't fully investigated what is required for this, so it may already work as-is, but I'm putting this here as a placeholder in case further work is needed; see the NoPE sketch after this list.
- End-to-end `GraniteMoEHybrid` support tying all of the other pieces together
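For anyone picking up the `metal` work: the assertion above is the kind a backend kernel raises when it is handed a strided view (e.g. a transpose or slice) but only implements the contiguous row-major case. Below is a minimal, self-contained C++ sketch of that failure mode, not actual ggml/Metal code; `View2D` and `make_contiguous` are hypothetical names of mine, with `make_contiguous` playing roughly the role that `ggml_cont` plays in a ggml graph.

```cpp
// Plain C++ sketch (not ggml/Metal) of a contiguity assertion and workaround.
#include <cassert>
#include <cstdio>
#include <vector>

struct View2D {
    const float * data;
    int rows, cols;
    int row_stride, col_stride; // strides in elements, not bytes

    float at(int r, int c) const { return data[r * row_stride + c * col_stride]; }
    bool is_contiguous() const { return col_stride == 1 && row_stride == cols; }
};

// Stand-in for a backend kernel that only handles contiguous row-major input.
float sum_contiguous(const View2D & v) {
    assert(v.is_contiguous() && "kernel requires contiguous data");
    float s = 0.0f;
    for (int i = 0; i < v.rows * v.cols; ++i) s += v.data[i];
    return s;
}

// Stand-in for ggml_cont: materialize any view into a fresh contiguous buffer.
std::vector<float> make_contiguous(const View2D & v) {
    std::vector<float> out;
    out.reserve((size_t) v.rows * (size_t) v.cols);
    for (int r = 0; r < v.rows; ++r)
        for (int c = 0; c < v.cols; ++c)
            out.push_back(v.at(r, c));
    return out;
}

int main() {
    const std::vector<float> buf = {1, 2, 3, 4, 5, 6}; // 2x3 matrix, row-major
    // Transposed 3x2 view over the same buffer: strides no longer match shape.
    View2D t = {buf.data(), 3, 2, /*row_stride=*/1, /*col_stride=*/3};

    // sum_contiguous(t) would trip the assertion here, as on the metal path.
    std::vector<float> tc = make_contiguous(t);
    View2D t2 = {tc.data(), 3, 2, /*row_stride=*/2, /*col_stride=*/1};
    std::printf("sum = %g\n", sum_contiguous(t2));
    return 0;
}
```

The real fix is presumably one of the same two options: teach the kernel to walk arbitrary strides, or insert a contiguity copy before it, which trades extra memory traffic for simplicity.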
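For reference on the NoPE item: NoPE ("no positional encoding") just means the rotary rotation is never applied to the query/key vectors for those layers, so in llama.cpp terms it presumably amounts to skipping the RoPE op in the graph for NoPE layers. Here is a small standalone C++ sketch (not llama.cpp code) contrasting the two; `apply_rope` is my own plain reimplementation of the standard rotary formula, included only for illustration.

```cpp
// Standalone sketch: RoPE rotates each channel pair by a position-dependent
// angle; NoPE is the identity, so no positional signal enters attention.
#include <cmath>
#include <cstdio>
#include <vector>

// Apply rotary position embedding in place. x.size() must be even.
void apply_rope(std::vector<float> & x, int pos, float base = 10000.0f) {
    const int dim = (int) x.size();
    for (int i = 0; i < dim; i += 2) {
        const float theta = pos * std::pow(base, -(float) i / dim);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = x[i];
        const float x1 = x[i + 1];
        x[i]     = x0 * c - x1 * s;
        x[i + 1] = x0 * s + x1 * c;
    }
}

int main() {
    std::vector<float> q = {1.0f, 0.0f, 1.0f, 0.0f};

    std::vector<float> q_rope = q;
    apply_rope(q_rope, /*pos=*/3);          // RoPE: result depends on position

    const std::vector<float> & q_nope = q;  // NoPE: identity, position ignored

    for (size_t i = 0; i < q.size(); ++i) {
        std::printf("ch%zu  rope=% .4f  nope=% .4f\n", i, q_rope[i], q_nope[i]);
    }
    return 0;
}
```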
Motivation
I lead IBM's efforts to ensure that Granite models work everywhere, and llama.cpp is a critical part of "everywhere!"
Possible Implementation
No response