Add support for MiniMax's MiniMax-Text-01 #35831
Conversation
Hi @geetu040, this looks quite good! You can ping us whenever it's ready for review. Also, code quality issues in the CI can be fixed with |
This reverts commit d8d3c40.
Eager to see this! 🔥 Feel free to ping @Cyrilvallez and me for a review! |
cc @Cyrilvallez can you have a look! |
Do you want me to commit directly to help get this merged? |
I tested the MiniMax model using this code, with the prompt "Hello! How are you?"
The output is garbled, for example:
However, if I switch to:
then the output is correct, such as:
It seems like something is wrong when using MiniMaxForCausalLM.from_pretrained. |
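(For context, a minimal sketch of this kind of generation check is shown below; it is illustrative only, not necessarily the exact script used above, and MODEL_PATH is a placeholder.)

```python
# Hedged sketch of a minimal generation check.
# MODEL_PATH is a placeholder; MiniMaxForCausalLM comes from this PR's branch.
from transformers import AutoTokenizer, MiniMaxForCausalLM

MODEL_PATH = "MiniMaxAI/MiniMax-Text-01"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = MiniMaxForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Hello! How are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```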
@qscqesze Thanks for testing this out! This could be happening because of a minor difference in how the config attributes are named. Can you try doing this with the config?

from transformers import AutoConfig
# MiniMaxConfig and MiniMaxForCausalLM come from this PR's branch

config = AutoConfig.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
)
config = config.to_dict()

# map the original checkpoint's attribute names to the names used in this PR
config['full_attn_alpha_factor'] = config['layernorm_full_attention_alpha']
config['full_attn_beta_factor'] = config['layernorm_full_attention_beta']
config['linear_attn_alpha_factor'] = config['layernorm_linear_attention_alpha']
config['linear_attn_beta_factor'] = config['layernorm_linear_attention_beta']
config['mlp_alpha_factor'] = config['layernorm_mlp_alpha']
config['mlp_beta_factor'] = config['layernorm_mlp_beta']
config = MiniMaxConfig(**config)

quantized_model = MiniMaxForCausalLM.from_pretrained(
    MODEL_PATH,
    config=config,
    torch_dtype="bfloat16",
    device_map="auto",
    quantization_config=quantization_config,  # defined elsewhere in your script
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
)

Also, don't use the code from the latest commit of this branch; I am fixing some bugs here. Please use the code from the previous commit 9c5397d instead. |
Thanks for your code—everything works fine on my side now. |
@ArthurZucker please review the PR; if everything looks fine, we can move on to handling the checkpoints on Hugging Face. |
Hello. I'm curious—why do we need to change attn_type_list to layer_type_list? I checked the config file at config.json, and it actually doesn't contain a layer_type_list configuration. When I use the latest code, I get the following error:
However, if I modify it like this:
It works fine. |
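(The exact modification referenced above isn't shown; a hedged sketch of one possible remapping, assuming the key names from this discussion and a placeholder MODEL_PATH, is given below.)

```python
# Hedged sketch: map the original config key to the name used in this branch.
# MODEL_PATH is a placeholder; MiniMaxConfig comes from this PR's branch.
from transformers import AutoConfig

MODEL_PATH = "MiniMaxAI/MiniMax-Text-01"

config_dict = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True).to_dict()
if "attn_type_list" in config_dict and "layer_type_list" not in config_dict:
    config_dict["layer_type_list"] = config_dict.pop("attn_type_list")
config = MiniMaxConfig(**config_dict)
```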
@qscqesze yes, some of the config attributes are named differently here in transformers, as suggested at #35831 (comment) and #35831 (comment). We renamed these for better clarity and to avoid confusion. Are you fine with the changes, or would you prefer the original names, or can you suggest better ones? CC: @ArthurZucker @Cyrilvallez (could we rename them back?) |
I understand the reasoning behind renaming these attributes for better clarity, and I appreciate the effort to make things more consistent. However, one drawback is that users who download the config files directly from Hugging Face might encounter issues if they’re unaware of the renaming—since the current setup requires them to manually remap the keys to make the model work. This could easily lead to confusion, especially for users who aren’t closely following the codebase changes. I’d love to hear what other reviewers think about this as well. |
@qscqesze We actually handle this by creating another repository on huggingface, in this case something like MiniMaxAI/MiniMax-Text-01-hf. |
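(As a rough illustration, assuming the converted checkpoint is published under that name, loading it would need no manual remapping; this is a sketch, not the actual upload.)

```python
# Hedged sketch: load the converted "-hf" checkpoint directly,
# without any manual config-key remapping.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-Text-01-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)
```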
@qscqesze when you use the converted MiniMaxAI/MiniMax-Text-01-hf repository, you won't face this issue. The reason we need to rename is to adhere to the standard:
|
My last comment, then let's merge! cc @LysandreJik to merge once this is fixed
@ArthurZucker @geetu040 Thank you both! |
Hi @LysandreJik, This PR is ready for merge! These checkpoints for integration tests should also be moved from |
Hey @geetu040, I mirrored your tiny checkpoint here: https://huggingface.co/hf-internal-testing/MiniMax-tiny/tree/main I'm merging the PR! Thanks for your contribution 🤗 |
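(A hedged sketch of how such a tiny mirrored checkpoint is typically exercised in a quick smoke test is shown below; the PR's actual integration tests may differ.)

```python
# Hedged sketch: quick smoke test against the mirrored tiny checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "hf-internal-testing/MiniMax-tiny"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```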
* end-to-end architecture
* lightning-attn: refactor, clean, optimize
* put minimax_text_01 in other files
* use latest __init__ standards and auto-generate modular
* support attention_mask for lightning-attn
* Revert "use latest __init__ standards and auto-generate modular"
  This reverts commit d8d3c40.
* fix modular conversion
* pass both attention masks instead of tuple
* formatting
* Updated Dynamic Cache
* created MiniMaxText01Cache
* fix hardcoded slope_rate
* update attn_type_list in config
* fix lightning when use_cache=False
* copy tests from mixtral
* (checkpoint) all tests pass for normal attention
* fix all unittests
* fix import sorting
* fix consistency and formatting tests
* fix config
* update tests, since changes in main
* fix seq_len error
* create dummy docs
* fix checkpoint
* add checkpoint in config docstring
* run modular_conversion
* update docs
* fix checkpoint path and update tests
* fix ruff
* remove repeated expected_slice
* update docs
* rename "minimax-text-01" to "minimax"
* inherit config from mixtral
* remove from docs in other languages
* undo files that should be untouched
* move minimax to end in conversation docs
* use MiniMaxForCausalLM as it is
* ruff fixes
* run modular
* fix docstring example in causallm
* refactor attention loop and decay factors
* refactor config in modular
* run modular
* refactor cache
* rename static_cache to linear_cache
* make positional embeddings necessary
* remove unnecessary layernorms declarations
* fix import in tests
* refactor attention in next tokens
* remove outdated code
* formatting and modular
* update tests
* rename layernorm alpha/beta factors
* register decay factors as buffers
* remove unused declarations of decay factors
* update config for alpha/beta factors
* run modular
* remove head_dim in tests
* remove minimax from fx.py
* remove stuff that is not really needed
* update __init__
* update qkv torch.split
  Co-authored-by: Cyril Vallez <[email protected]>
* fix qkv torch.split
* quality fixes
* remove mistakenly added dummy
* purge unused ModelTester code
* fix-copies
* run fix-copies
* fix head_dim
* write cache formatting tests
* remove postnorm
* avoid contiguous in attention current states
* update expected_slice
* add generation test for integration
* fix dtype in generation test
* update authors
* update with changes in main
* update graident checkpointing and minor fixes
* fix mutable attn_type_list
* rename: attn_type -> layer_type
* update for layer_types
* update integration tests
* update checkpoint
* clean overview in docs

---------

Co-authored-by: Shakib-IO <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
|
| [MiniMaxAI/MiniMax-Text-01-hf](https://huggingface.co/MiniMaxAI/MiniMax-Text-01-hf) |
When will this repo become available?
I've also opened a PR on the hf model repo that might be helpful:
https://huggingface.co/MiniMaxAI/MiniMax-Text-01/discussions/39
@hadipash you should now be able to see the repo at MiniMaxAI/MiniMax-Text-01-hf
What does this PR do?
Fixes #35710
This PR adds MiniMaxAI's MiniMax-Text-01 model to Hugging Face Transformers.
Relevant Links
CC: @MiniMax-AI-Dev
Who can review?
@ArthurZucker, @Rocketknight1
Change Log
- Tokenizer: GPT2Tokenizer
- Config: MixtralConfig with a few additional parameters
- Architecture: MiniMax-Text-01 matches and uses most of the Mixtral architecture, with a few changes:
  - hidden_states can be used as residual connections, before or after layernorm is applied
  - layer_idx handling in transformers

To summarize the above, the main area of review is the LightningAttention implementation.

TODOs
- LightningAttention with the .generate() method
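For a quick mental model of the lightning attention under review, here is a minimal, illustrative sketch of a decayed linear-attention recurrence. It is a simplification, not the PR's actual kernel; the per-head slope_rate decay follows the naming in the change log above, and all shapes are assumptions.

```python
import torch

def lightning_attention_sketch(q, k, v, slope_rate):
    # q, k: (batch, heads, seq, head_dim); v: (batch, heads, seq, v_dim)
    # slope_rate: (heads,) per-head decay slopes
    b, h, n, d = q.shape
    kv = q.new_zeros(b, h, d, v.shape[-1])            # running key-value state
    decay = torch.exp(-slope_rate).view(1, h, 1, 1)   # per-head decay factor
    outputs = []
    for t in range(n):
        # decay the state, then add the current key/value outer product
        kv = decay * kv + k[:, :, t, :, None] * v[:, :, t, None, :]
        # read the state with the current query
        outputs.append((q[:, :, t, None, :] @ kv).squeeze(2))
    return torch.stack(outputs, dim=2)                # (batch, heads, seq, v_dim)

# tiny usage example
q = torch.randn(1, 2, 8, 16)
k = torch.randn(1, 2, 8, 16)
v = torch.randn(1, 2, 8, 16)
out = lightning_attention_sketch(q, k, v, torch.full((2,), 0.1))
print(out.shape)  # torch.Size([1, 2, 8, 16])
```

Because the running state kv is carried across positions instead of recomputing attention over the whole prefix, caching for .generate() (the TODO above) works differently from the standard attention KV cache.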