Prerequisites

- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
This issue is to track work to support IBM's Granite 4 model architecture (`GraniteMoEHybrid` in `transformers`). The model uses a number of components that are not yet supported in llama.cpp, but which are being worked on independently, so I'm raising this issue to triangulate the different work streams that will be needed to support the model.

Necessary Components

- `jamba` by @compilade: llama : support Jamba hybrid Transformer-Mamba models #7531
- `bamba`: Bamba architecture #10810 (there is also an out-of-date `bamba` branch: https://github.com/gabe-l-hart/llama.cpp/tree/BambaArchitectureRefactor)
- `GraniteMoEShared` layers: Model: Granite MoE shared #13269
- `mamba2` in non-CPU backends
  - Some of the `metal` backend needs look like they're addressed already in llama : initial Mamba-2 support #9126, but for me that still doesn't work on my M3 (assertion error about non-contiguous data); see the contiguity sketch after this list.
- Support for NoPE positional encoding instead of RoPE
  - I haven't fully investigated what is required for this, so it may already work as-is, but I'm putting this here as a placeholder in case further work is needed; see the NoPE sketch after this list.
- End-to-end `GraniteMoEHybrid` support tying all of the other pieces together
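For anyone picking up the `metal` work: the assertion above is the kind a backend kernel raises when it is handed a strided view (e.g. a transpose or slice) but only implements the contiguous row-major case. Below is a minimal, self-contained C++ sketch of that failure mode, not actual ggml/Metal code; `View2D` and `make_contiguous` are hypothetical names of mine, with `make_contiguous` playing roughly the role that `ggml_cont` plays in a ggml graph.

```cpp
// Plain C++ sketch (not ggml/Metal) of a contiguity assertion and workaround.
#include <cassert>
#include <cstdio>
#include <vector>

struct View2D {
    const float * data;
    int rows, cols;
    int row_stride, col_stride; // strides in elements, not bytes

    float at(int r, int c) const { return data[r * row_stride + c * col_stride]; }
    bool is_contiguous() const { return col_stride == 1 && row_stride == cols; }
};

// Stand-in for a backend kernel that only handles contiguous row-major input.
float sum_contiguous(const View2D & v) {
    assert(v.is_contiguous() && "kernel requires contiguous data");
    float s = 0.0f;
    for (int i = 0; i < v.rows * v.cols; ++i) s += v.data[i];
    return s;
}

// Stand-in for ggml_cont: materialize any view into a fresh contiguous buffer.
std::vector<float> make_contiguous(const View2D & v) {
    std::vector<float> out;
    out.reserve((size_t) v.rows * (size_t) v.cols);
    for (int r = 0; r < v.rows; ++r)
        for (int c = 0; c < v.cols; ++c)
            out.push_back(v.at(r, c));
    return out;
}

int main() {
    const std::vector<float> buf = {1, 2, 3, 4, 5, 6}; // 2x3 matrix, row-major
    // Transposed 3x2 view over the same buffer: strides no longer match shape.
    View2D t = {buf.data(), 3, 2, /*row_stride=*/1, /*col_stride=*/3};

    // sum_contiguous(t) would trip the assertion here, as on the metal path.
    std::vector<float> tc = make_contiguous(t);
    View2D t2 = {tc.data(), 3, 2, /*row_stride=*/2, /*col_stride=*/1};
    std::printf("sum = %g\n", sum_contiguous(t2));
    return 0;
}
```

The real fix is presumably one of the same two options: teach the kernel to walk arbitrary strides, or insert a contiguity copy before it, which trades extra memory traffic for simplicity.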
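For reference on the NoPE item: NoPE ("no positional encoding") just means the rotary rotation is never applied to the query/key vectors for those layers, so in llama.cpp terms it presumably amounts to skipping the RoPE op in the graph for NoPE layers. Here is a small standalone C++ sketch (not llama.cpp code) contrasting the two; `apply_rope` is my own plain reimplementation of the standard rotary formula, included only for illustration.

```cpp
// Standalone sketch: RoPE rotates each channel pair by a position-dependent
// angle; NoPE is the identity, so no positional signal enters attention.
#include <cmath>
#include <cstdio>
#include <vector>

// Apply rotary position embedding in place. x.size() must be even.
void apply_rope(std::vector<float> & x, int pos, float base = 10000.0f) {
    const int dim = (int) x.size();
    for (int i = 0; i < dim; i += 2) {
        const float theta = pos * std::pow(base, -(float) i / dim);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = x[i];
        const float x1 = x[i + 1];
        x[i]     = x0 * c - x1 * s;
        x[i + 1] = x0 * s + x1 * c;
    }
}

int main() {
    std::vector<float> q = {1.0f, 0.0f, 1.0f, 0.0f};

    std::vector<float> q_rope = q;
    apply_rope(q_rope, /*pos=*/3);          // RoPE: result depends on position

    const std::vector<float> & q_nope = q;  // NoPE: identity, position ignored

    for (size_t i = 0; i < q.size(); ++i) {
        std::printf("ch%zu  rope=% .4f  nope=% .4f\n", i, q_rope[i], q_nope[i]);
    }
    return 0;
}
```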
Motivation
I lead IBM's efforts to ensure that Granite models work everywhere, and llama.cpp is a critical part of "everywhere!"
Possible Implementation
No response