
Add GGUF support for MiniMax-M2.1 model#44526

Draft
JoursBleu wants to merge 1 commit into huggingface:main from JoursBleu:feat/gguf-minimax-m2

Conversation


@JoursBleu JoursBleu commented Mar 8, 2026

What does this PR do?

Add GGUF loading support for MiniMax-M2.1 (456B MoE) model.

MiniMax-M2.1 is a large Mixture-of-Experts model with 456B total parameters (45.9B active), 256 experts, and 8 experts per token. This PR enables loading its GGUF-quantized checkpoints (e.g. unsloth/MiniMax-M2.1-GGUF) via from_pretrained(..., gguf_file=...).

Changes

src/transformers/integrations/ggml.py

  • Add "minimax_m2" entry to GGUF_CONFIG_MAPPING with model-specific config fields (including MoE fields: expert_count, expert_used_count, expert_feed_forward_length, expert_gating_func).
  • Convert expert_gating_func integer (from GGUF metadata) to scoring_func string ({0: "none", 1: "softmax", 2: "sigmoid"}).
  • Register GGUFQwen2Converter for minimax_m2 in GGUF_TO_FAST_CONVERTERS (tokenizer is compatible with Qwen2).
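The config-mapping changes above can be sketched as follows. This is an illustrative sketch, not the merged code: the GGUF key names are the ones quoted in this PR, the conversion table is the one stated above, and the HF config attribute names on the right-hand side are assumptions.

```python
# Hypothetical sketch of the "minimax_m2" GGUF config mapping described above.
# The right-hand-side HF config attribute names are assumptions.
MINIMAX_M2_CONFIG_MAPPING = {
    "expert_count": "num_local_experts",
    "expert_used_count": "num_experts_per_tok",
    "expert_feed_forward_length": "intermediate_size",
    # "expert_gating_func" needs a value conversion, not just a rename; see below.
}

# Integer enum from GGUF metadata -> HF scoring_func string (table from this PR).
GGUF_EXPERT_GATING_FUNC = {0: "none", 1: "softmax", 2: "sigmoid"}

def scoring_func_from_gguf(expert_gating_func: int) -> str:
    """Convert the GGUF expert_gating_func integer to a scoring_func string."""
    return GGUF_EXPERT_GATING_FUNC[expert_gating_func]
```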

src/transformers/models/minimax_m2/modeling_minimax_m2.py

  • Add _checkpoint_conversion_mapping = {"block_sparse_moe": "mlp"} to MiniMaxM2PreTrainedModel. This lets vLLM's gguf_loader revert_hf_rename convert dummy-model parameter names (mlp.*) back to checkpoint-compatible names (block_sparse_moe.*), bridging the naming gap between the transformers-native model and the HF safetensors checkpoint without modifying vLLM model code.
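A minimal sketch of how such a mapping can be applied in reverse to recover checkpoint names. The helper below is an assumption for illustration, not the actual vLLM or transformers implementation; only the mapping itself comes from this PR.

```python
# Mapping from this PR: checkpoint name prefix -> transformers model name.
_checkpoint_conversion_mapping = {"block_sparse_moe": "mlp"}

def revert_hf_rename(param_name: str) -> str:
    """Illustrative helper (hypothetical): rename dummy-model parameter
    names (mlp.*) back to checkpoint-compatible names (block_sparse_moe.*)."""
    for ckpt_key, model_key in _checkpoint_conversion_mapping.items():
        param_name = param_name.replace(model_key, ckpt_key)
    return param_name
```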

src/transformers/modeling_gguf_pytorch_utils.py

  • Add MiniMaxM2TensorProcessor class following the TensorProcessor API introduced in Qwen2/3 MoE + GGUF model support (restored) #42854 (same pattern as Qwen2MoeTensorProcessor):
    • preprocess_name(): strips per-expert indices from HF weight names so that multiple experts can map to one fused GGUF tensor.
    • perform_fallback_tensor_mapping(): maps merged gate_up_proj to both ffn_gate_exps and ffn_up_exps GGUF tensors; maps e_score_correction_bias to exp_probs_b.bias.
    • process(): matches GGUF MoE expert tensors and merges gate+up into gate_up_proj [num_experts, 2*intermediate, hidden].
    • _set_moe_expert_tensor(): merges gate and up weights into the fused gate_up_proj tensor, passes down weights directly.
  • Register processor in TENSOR_PROCESSORS, add model type and architecture mappings.
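Two of the ideas above can be sketched compactly: stripping per-expert indices so many HF names map to one fused GGUF tensor, and fusing per-expert gate and up weights into a gate_up_proj of shape [num_experts, 2*intermediate, hidden]. This is an illustrative sketch under assumed tensor layouts (numpy stands in for torch), not the merged MiniMaxM2TensorProcessor.

```python
import re
import numpy as np

def preprocess_name(name: str) -> str:
    """Strip the per-expert index so multiple expert weights map to one
    fused GGUF tensor: 'experts.17.gate_proj' -> 'experts.gate_proj'."""
    return re.sub(r"experts\.\d+\.", "experts.", name)

def fuse_gate_up(gate: np.ndarray, up: np.ndarray) -> np.ndarray:
    """Merge gate and up expert weights, each [num_experts, intermediate,
    hidden], into gate_up_proj [num_experts, 2 * intermediate, hidden].
    The gate-first ordering here is an assumption for illustration."""
    return np.concatenate([gate, up], axis=1)
```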

Testing

Due to the model size (456B parameters, 227GB for Q8_0 GGUF), no CI-compatible unit tests are included. This is consistent with other large MoE models (e.g., Qwen3-30B-A3B in #42854).

Verified end-to-end on 8×AMD W7900D (48GB each) via vLLM serving the Q8_0 GGUF checkpoint:

  • GSM8K 8-shot: 91.5% (official BF16 baseline: 92.0%)
  • MMLU 5-shot: 85.66% (official BF16 baseline: 86.2%)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@SunMarc @MekkCyber @ArthurZucker

JoursBleu force-pushed the feat/gguf-minimax-m2 branch from 17fcd62 to 455751c on March 13, 2026 at 07:58
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: minimax_m2
