convert : improve model arch handling #13122

ngxson · 2025-04-26T09:28:18Z

This improves handling of the case where architectures is defined in both vision_config and text_config. For example:

{
  "architectures": [
    "InternVLChatModel"
  ],
  "text_config": {
    "architectures": [
      "Qwen2ForCausalLM"
    ],
    ...
  },
  "vision_config": {
    "architectures": [
      "InternVisionModel"
    ],
    ...
  },
  ...
}

The arch will be mapped correctly:

ModelType.TEXT --> Qwen2ForCausalLM
ModelType.VISION --> InternVisionModel

If the arch is missing from a sub-config, we simply fall back to the arch in the root-level config. Example:

{
  "architectures": [
    "SmolVLMForConditionalGeneration"
  ],
  "text_config": {
    "architectures": [
      "VLlama3ForCausalLM"
    ],
    ...
  },
  "vision_config": {
    ...
  },
  ...
}

ModelType.TEXT --> VLlama3ForCausalLM
ModelType.VISION --> SmolVLMForConditionalGeneration

The same fallback applies when a sub-config has "architectures": null
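A minimal, self-contained sketch of the resolution rule described above, assuming the parsed config.json is a plain dict. `ModelType` here is a stand-in enum and `resolve_arch` is a hypothetical name; the converter's own `get_model_architecture` is quoted later in this thread. Note that `if sub_archs:` also treats an empty list as missing, a small deviation from an `is not None` check.

```python
from enum import Enum, auto
from typing import Any


class ModelType(Enum):  # stand-in for the converter's enum
    TEXT = auto()
    VISION = auto()


def resolve_arch(hparams: dict[str, Any], model_type: ModelType) -> str:
    """Prefer the sub-config's architecture; fall back to the root-level one."""
    sub_key = "text_config" if model_type == ModelType.TEXT else "vision_config"
    sub_config = hparams.get(sub_key) or {}
    # .get() returns None both when the key is absent and when it is null in JSON
    sub_archs = sub_config.get("architectures")
    if sub_archs:
        return sub_archs[0]
    return hparams["architectures"][0]


smolvlm = {
    "architectures": ["SmolVLMForConditionalGeneration"],
    "text_config": {"architectures": ["VLlama3ForCausalLM"]},
    "vision_config": {"architectures": None},
}
print(resolve_arch(smolvlm, ModelType.TEXT))    # VLlama3ForCausalLM
print(resolve_arch(smolvlm, ModelType.VISION))  # SmolVLMForConditionalGeneration
```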

compilade · 2025-04-26T11:59:31Z

convert_hf_to_gguf.py

@@ -1078,8 +1081,12 @@ def __init__(self, *args, **kwargs):
            raise TypeError("VisionModel must be subclassed with model_arch = gguf.MODEL_ARCH.CLIP_VISION")

        # small hack to correct the number of layers
-        self.tensor_map = gguf.get_tensor_name_map(gguf.MODEL_ARCH.CLIP_VISION, 128)
-        self.n_embd_text = self.find_hparam(["hidden_size", "n_embd"])
+        self.block_count = 512 # vision models are small, this "ought to be enough for anybody"


"small" with up to 512 layers :)
What is the max you've seen in the wild?

I think this number could be taken from the config, since vision_config seems to contain num_hidden_layers, at least for Llama-3.2-11B-Vision. (There's also num_global_layers; I guess the max of the layer counts should be used.)

What model doesn't specify how many layers the vision part has?

I've never seen any model with more than 64 layers, but I'm putting this number here to future-proof it.

We could get this number from the config, but the problem is that many config.json files nowadays omit it, because the transformers library leaves out values that equal the default. For example, this, this and this, where num_hidden_layers cannot be found in vision_config.
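A minimal sketch combining the suggestion above (take the max of the layer counts present in vision_config) with the concern here (those counts are often omitted). `vision_block_count` and `FALLBACK_BLOCK_COUNT` are hypothetical names; 512 is just the hard-coded value from the diff.

```python
from typing import Any

FALLBACK_BLOCK_COUNT = 512  # the hard-coded value from the diff above


def vision_block_count(vision_config: dict[str, Any]) -> int:
    """Largest layer count present in vision_config, else the fallback."""
    # Only keys that are actually present (and are integers) are considered.
    counts = [
        vision_config[k]
        for k in ("num_hidden_layers", "num_global_layers")
        if isinstance(vision_config.get(k), int)
    ]
    return max(counts) if counts else FALLBACK_BLOCK_COUNT
```

When the counts are present, this sizes the tensor map to the real model; when config.json omits them, it degrades to the current behavior.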

The frustrating thing is that this is starting to happen in some text_config too.

One way to fix this could be to use AutoConfig, but this won't work for models not supported by the transformers library. While I'm pretty sure such models are rare nowadays, I can't know for sure whether people are still using them. WDYT?

> One way to fix this could be to use AutoConfig,

That could be the way to go, since convert_hf_to_gguf.py already has hf in the name and mostly already expects HF-like models which are supported by transformers. I don't know how much it would change how hparams is used in set_gguf_parameters, though.

> but this won't work for models not supported by the transformers library. While I'm pretty sure such models are rare nowadays, I can't know for sure whether people are still using them. WDYT?

I guess if this is a problem (e.g. for very new architectures), it should always be possible to temporarily define a PreTrainedConfig and register it with AutoConfig.register.

But! Something that seems important is that AutoConfig uses the model_type field instead of the architectures field, which may change the assumptions in a bunch of places. I'm not sure it would really be compatible with the idea of using sub-architectures like in this PR.

I guess it's probably fine to keep a high block count, but it makes the tensor map dict bigger than it needs to be.

> I don't know how much it would change how hparams is used in set_gguf_parameters, though.

It should not change much: AutoConfig has a to_dict() function which basically returns the same content as config.json, but with all hparams pre-filled.

The simple plan is to change load_hparams from reading the file with open(...) to AutoConfig.from_pretrained(dir_model).to_dict().
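A minimal sketch of that plan with a fallback to the raw file, assuming transformers may or may not be installed. This is an illustration, not the code that landed in the PR; catching a bare `Exception` is a deliberate simplification so that any AutoConfig failure (unknown model_type, missing package, custom code required) falls back to config.json.

```python
import json
from pathlib import Path
from typing import Any


def load_hparams(dir_model: Path) -> dict[str, Any]:
    """Prefer AutoConfig (defaults filled in); fall back to the raw config.json."""
    try:
        from transformers import AutoConfig  # optional dependency in this sketch
        # to_dict() includes hparams that config.json omitted because they equal the defaults
        return AutoConfig.from_pretrained(dir_model).to_dict()
    except Exception:
        # transformers missing, unrecognized model_type, custom code required, etc.
        with open(dir_model / "config.json", encoding="utf-8") as f:
            return json.load(f)
```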

Here is what I mean: 4840d2f

It does work better now: num_hidden_layers is no longer missing. However, for SmolVLM, some hparams like num_attention_heads or hidden_size are still entirely missing from to_dict(), though I don't think that's very important for now. Alternatively, we can read them from the AutoConfig object before calling to_dict().
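A minimal sketch of the "read them from the object" alternative, assuming the missing values are still reachable as attributes of the config object. `fill_missing_from_object` and `FakeVisionConfig` are hypothetical; the stand-in exposes `hidden_size` only as a property so that a `vars()`-based `to_dict()` drops it. It illustrates one way a key can be readable on the object yet absent from the dict, not necessarily why SmolVLM's keys are missing.

```python
from typing import Any, Iterable


def fill_missing_from_object(config_obj: Any, hparams: dict[str, Any],
                             keys: Iterable[str]) -> dict[str, Any]:
    """Copy attributes from the config object for keys that to_dict() left out."""
    filled = dict(hparams)
    for key in keys:
        if key not in filled and hasattr(config_obj, key):
            filled[key] = getattr(config_obj, key)
    return filled


class FakeVisionConfig:
    """Hypothetical stand-in: hidden_size is exposed only as a property,
    so a vars()-based to_dict() does not include it."""

    def __init__(self) -> None:
        self.num_hidden_layers = 12

    @property
    def hidden_size(self) -> int:
        return 768

    def to_dict(self) -> dict[str, Any]:
        return dict(vars(self))


cfg = FakeVisionConfig()
hparams = fill_missing_from_object(cfg, cfg.to_dict(), ["hidden_size", "num_attention_heads"])
print(hparams)  # {'num_hidden_layers': 12, 'hidden_size': 768}
```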

compilade · 2025-04-26T15:38:55Z

convert_hf_to_gguf.py

-                hparams["architectures"] = architectures
-            return hparams
+        try:
+            return AutoConfig.from_pretrained(dir_model, trust_remote_code=True).to_dict()


Suggested change

-            return AutoConfig.from_pretrained(dir_model, trust_remote_code=True).to_dict()
+            return AutoConfig.from_pretrained(dir_model, trust_remote_code=False).to_dict()

I don't think trust_remote_code=True should be the default here.

If a model uses a custom module, then hopefully it also has the relevant information in config.json (and/or we can assume some defaults as usual).

I would prefer to avoid running arbitrary code from the config of the models.

(Thinking about it again, removing that trust might (partially) defeat the purpose of using AutoConfig...)

The reason I left trust_remote_code=True is that we already have it in some places in the code, mostly to load the tokenizer. But on second thought, yeah, this can be a huge security risk: a bad actor could simply trick users into trying their model and embed command execution inside it.

I think the fallback to config.json should work well for now, given that not many models need trust_remote_code.

One thing I'm a bit concerned about, though: should we guard all the other places that use trust_remote_code=True behind a flag, for example --trust-remote-code? This would be good for security, but could be quite a bad UX for people who don't know why it's needed.
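A minimal sketch of the opt-in flag idea using argparse. `--trust-remote-code` is only a proposal in this discussion, and `build_parser` and `hf_load_kwargs` are hypothetical helpers; the point is that every call site reads one flag instead of hard-coding `True`.

```python
import argparse
from typing import Any


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="sketch of an opt-in remote-code flag")
    # Hypothetical flag: off by default, so no code shipped with the model repo
    # is executed unless the user explicitly asks for it.
    parser.add_argument(
        "--trust-remote-code",
        action="store_true",
        help="allow running custom code shipped with the model (security risk)",
    )
    return parser


def hf_load_kwargs(args: argparse.Namespace) -> dict[str, Any]:
    """kwargs for every from_pretrained() call instead of a hard-coded True."""
    return {"trust_remote_code": args.trust_remote_code}
```

A call site would then look like `AutoTokenizer.from_pretrained(dir_model, **hf_load_kwargs(args))`. To soften the UX concern, the error path for models that need custom code could mention the flag.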

ngxson · 2025-04-30T14:56:09Z

Hey @compilade, since I'm quite blocked by this PR, I'll merge it now.

If something does not look right to you, feel free to leave a comment. I'll be happy to make a follow-up PR to fix things. Thanks!

* origin/master:
  sync : ggml
  whisper : add check that target name exists (whisper/3103)
  ggml : suppress Windows compiler warnings (whisper/3075)
  mtmd : add **vision** support for Mistral Small 3.1 (ggml-org#13231)
  arg : remove CURLINFO_EFFECTIVE_METHOD (ggml-org#13228)
  llama-model : fix the reported size class for nomic-embed-text-v2-moe (ggml-org#13223)
  sync : ggml
  ggml : fix ggml_gallocr_ptr type (ggml/1205)
  cuda : fix unused variable compile warning (whisper/0)
  CUDA: batched+noncont MMQ, refactor bs>1 MoE code (ggml-org#13199)
  arg : -hf do not fail if url mismatch (ggml-org#13219)
  fix typo: `n_ctx_pre_seq` -> `n_ctx_per_seq` (ggml-org#13221)
  convert : improve model arch handling (ggml-org#13122)
  llava : remove duplicate include (ggml-org#13207)
  common : add -jf / --json-schema-file flag (ggml-org#12011)

compilade · 2025-05-01T20:19:10Z

@ngxson

It seems like using AutoConfig breaks conversion of Mamba2 models from non-HF repos (e.g. https://huggingface.co/state-spaces/mamba2-370m). The defaults seem to come from Mamba-Codestral-7B, which means the step suggested in #9126, simply adding "architectures": ["Mamba2ForCausalLM"], to config.json, no longer works (at least after updating that branch to the latest master).

Even n_layer from config.json is ignored, since n_layers and num_hidden_layers are checked first, and AutoConfig fills in a default value for them which differs from what the actual model has.
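A small sketch of why a filled-in default wins, using a first-match lookup like the one described above. This standalone `find_hparam` is a hypothetical re-creation, not the converter's method; the key order comes from the comment, and the layer counts are made-up numbers.

```python
from typing import Any, Sequence


def find_hparam(hparams: dict[str, Any], keys: Sequence[str]) -> Any:
    """Return the value of the first key in `keys` that is present (first match wins)."""
    key = next((k for k in keys if k in hparams), None)
    if key is None:
        raise KeyError(f"could not find any of: {list(keys)}")
    return hparams[key]


# Illustrative numbers only: the raw config.json stores the real depth as n_layer,
# while the AutoConfig-merged dict also carries a filled-in num_hidden_layers default.
raw = {"n_layer": 48}
merged = {**raw, "num_hidden_layers": 64}

layer_keys = ["n_layers", "num_hidden_layers", "n_layer"]
print(find_hparam(raw, layer_keys))     # 48
print(find_hparam(merged, layer_keys))  # 64: the default shadows the real value
```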

ngxson · 2025-05-01T21:12:18Z

convert_hf_to_gguf.py

+def get_model_architecture(dir_model: Path, model_type: ModelType, hparams: Any = None) -> str:
+    hparams = ModelBase.load_hparams(dir_model) if hparams is None else hparams
+    text_config = hparams.get("text_config", {})
+    vision_config = hparams.get("vision_config", {})
+    arch = hparams["architectures"][0]
+    # if "architectures" is found in the sub-config, use that instead
+    if model_type == ModelType.TEXT and text_config.get("architectures") is not None:
+        arch = text_config["architectures"][0]
+    elif model_type == ModelType.VISION and vision_config.get("architectures") is not None:
+        arch = vision_config["architectures"][0]
+    return arch


@compilade In this case, maybe we can patch get_model_architecture to introduce an exception for Mamba? (so that you no longer need to manually add the "architectures" to config.json)

Maybe something like

if "ssm_cfg" in hparams and hparams.get("ssm_cfg").get("layer") == "Mamba":
    return "MambaForCausalLM"

And since "architectures" is not present in config.json, AutoConfig will throw an error, which will trigger the old method (reading config.json directly).

@ngxson I've tried this, and for some reason it doesn't work; it seems like AutoConfig doesn't fail when there is no architectures field. It still uses wrong default values for what Mamba2 needs.

The error I'm getting suggests hparams.n_groups defaults to 8, and hparams.intermediate_size defaults to 8192, which are the values for Mamba-Codestral-7B-v0.1, not the Mamba2-370m model I'm actually converting.

It's as if AutoConfig can still detect it's Mamba2, but doesn't use the correct values from the config.

Here's what I've tried

diff --git a/convert_hf_to_gguf.py b/convert_hf_to_gguf.py
index 532cc879d..4a3713e4d 100755
--- a/convert_hf_to_gguf.py
+++ b/convert_hf_to_gguf.py
@@ -5968,12 +5968,21 @@ def get_model_architecture(dir_model: Path, model_type: ModelType, hparams: Any
     hparams = ModelBase.load_hparams(dir_model) if hparams is None else hparams
     text_config = hparams.get("text_config", {})
     vision_config = hparams.get("vision_config", {})
-    arch = hparams["architectures"][0]
+    arch = None
+    if (arches := hparams.get("architectures")) is not None and len(arches) > 0:
+        arch = arches[0]
+    elif "ssm_cfg" in hparams:
+        # TODO: more general extra mappings
+        ssm_mapping = {"Mamba": "MambaForCausalLM", "Mamba2": "Mamba2ForCausalLM"}
+        arch = ssm_mapping.get(hparams["ssm_cfg"].get("layer", "Mamba"), None)
+
     # if "architectures" is found in the sub-config, use that instead
     if model_type == ModelType.TEXT and text_config.get("architectures") is not None:
         arch = text_config["architectures"][0]
     elif model_type == ModelType.VISION and vision_config.get("architectures") is not None:
         arch = vision_config["architectures"][0]
+    if arch is None:
+        raise ValueError("Failed to detect model architecture")
     return arch

And then converting https://huggingface.co/state-spaces/mamba2-370m.

(when using #9126)

> It's as if AutoConfig can still detect it's Mamba2, but doesn't use the correct values from the config.

Hmm this is quite magic tbh, I have no idea how AutoConfig works under the hood.
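A possible explanation, based on my reading of transformers' `AutoConfig.from_pretrained` (worth verifying against `configuration_auto.py` for the installed version): when config.json has no `model_type`, it falls back to matching registered model-type names as substrings of the path, longest first, and then builds that config class from the dict. A directory named `mamba2-370m` would then resolve to the Mamba2 config class, whose own defaults fill every key that the state-spaces config.json does not use (it stores names like `n_layer`). The toy sketch below mimics only the matching step; `KNOWN_MODEL_TYPES` is a hypothetical subset, not the real registry.

```python
from typing import Optional

# Hypothetical subset of registered model types, for illustration only.
KNOWN_MODEL_TYPES = ["bert", "roberta", "mamba", "mamba2", "llama"]


def guess_model_type(path: str) -> Optional[str]:
    """Return the longest known model type that appears as a substring of `path`."""
    # Longest names first, so "mamba2" wins over "mamba" and "roberta" over "bert".
    for pattern in sorted(KNOWN_MODEL_TYPES, key=len, reverse=True):
        if pattern in path:
            return pattern
    return None


print(guess_model_type("/models/state-spaces/mamba2-370m"))  # mamba2
print(guess_model_type("/models/my-ssm"))                    # None
```

If this is the mechanism, it would also explain why the `ssm_cfg` patch had no effect: AutoConfig never raises for that directory, so the config.json fallback is never reached.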

Another solution, though: we can add an argument to load_hparams, say use_raw_config: bool. Then, in the __init__, you can re-assign self.hparams = load_hparams(..., use_raw_config=True).
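A minimal sketch of that proposal, assuming load_hparams is a static method on the base class. `ModelBaseSketch` and `Mamba2ModelSketch` are hypothetical stand-ins that model only hparams loading, not the converter's classes.

```python
import json
from pathlib import Path
from typing import Any


class ModelBaseSketch:
    """Hypothetical stand-in for the converter's base class (hparams loading only)."""

    def __init__(self, dir_model: Path) -> None:
        self.dir_model = dir_model
        self.hparams = self.load_hparams(dir_model)

    @staticmethod
    def load_hparams(dir_model: Path, use_raw_config: bool = False) -> dict[str, Any]:
        if not use_raw_config:
            try:
                from transformers import AutoConfig  # optional in this sketch
                return AutoConfig.from_pretrained(dir_model).to_dict()
            except Exception:
                pass  # fall through to reading config.json verbatim
        with open(dir_model / "config.json", encoding="utf-8") as f:
            return json.load(f)


class Mamba2ModelSketch(ModelBaseSketch):
    def __init__(self, dir_model: Path) -> None:
        super().__init__(dir_model)
        # Re-read config.json as-is so filled-in defaults cannot shadow keys like n_layer.
        self.hparams = self.load_hparams(dir_model, use_raw_config=True)
```

As written, the subclass still loads the config twice. A class-level attribute read by the base `__init__` would avoid that, at the cost of a slightly less explicit override.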

convert : improve model arch handling

e8b00ed

ngxson requested a review from compilade April 26, 2025 09:28

github-actions bot added the python python script changes label Apr 26, 2025

compilade reviewed Apr 26, 2025

use AutoConfig

4840d2f

compilade reviewed Apr 26, 2025

rm trust_remote_code

d11dccb

ngxson requested a review from compilade April 27, 2025 13:26

Merge branch 'master' into xsn/convert_improve_arch_handling

c3dbfb6

ngxson commented Apr 30, 2025

convert_hf_to_gguf.py (resolved)

ngxson and others added 3 commits April 30, 2025 11:50

Update convert_hf_to_gguf.py

e5c5fd7

fix self.block_count for vision

1a0485d

fix NomicBertModel

a21e755

ggerganov approved these changes Apr 30, 2025

ngxson merged commit 3e168be into ggml-org:master Apr 30, 2025
5 checks passed

ngxson commented May 1, 2025
