convert : improve model arch handling #13122
@compilade In this case, maybe we can patch `get_model_architecture` to introduce an exception for Mamba? (so that you no longer need to manually add the `"architectures"` to config.json)

Maybe something like

And since `"architectures"` is not present in config.json, `AutoConfig` will throw an error, which will trigger the old method.
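The kind of exception described in this comment could look roughly like the sketch below. It is an illustration only, not the snippet from the comment. The simplified one-argument signature, the `ssm_cfg` key with its `layer` entry, and the `...ForCausalLM` naming pattern are assumptions about how a non-HF Mamba config.json might be recognized.

```python
from __future__ import annotations

from typing import Any


def get_model_architecture(hparams: dict[str, Any]) -> str:
    """Pick an architecture name, with a special case for Mamba-style configs.

    Hypothetical, simplified signature for illustration only.
    """
    arches = hparams.get("architectures")
    if arches:
        # Normal case: config.json names the architecture explicitly.
        return arches[0]
    if "ssm_cfg" in hparams:
        # Assumption: original Mamba/Mamba2 checkpoints carry an "ssm_cfg"
        # dict whose optional "layer" entry names the block type.
        return hparams["ssm_cfg"].get("layer", "Mamba") + "ForCausalLM"
    raise ValueError("cannot determine model architecture from config.json")
```

With this shape, a config that lacks `"architectures"` but has `ssm_cfg` resolves to a Mamba architecture name without editing config.json by hand.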
@ngxson I've tried this, and for some reason it doesn't work; it seems like `AutoConfig` doesn't fail when there is no `architectures` field. It still uses the wrong default values for what Mamba2 needs.

The error I'm getting suggests `hparams.n_groups` defaults to 8, and `hparams.intermediate_size` defaults to 8192, which are the values for Mamba-Codestral-7B-v0.1, not the Mamba2-370m model I'm actually converting.

It's as if `AutoConfig` can still detect it's Mamba2, but doesn't use the correct values from the config.

Here's what I've tried

And then converting https://huggingface.co/state-spaces/mamba2-370m.

(when using #9126)
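One way to see which values come from library defaults rather than from config.json is to compare the raw file contents with the loaded hyperparameters. The helper below is a hypothetical debugging sketch, not part of the conversion script; `diff_hparams` and `ABSENT` are names invented here, and the raw dict in the usage note is made up for illustration.

```python
from __future__ import annotations

from typing import Any

# Marker for keys that the raw config.json does not contain at all.
ABSENT = "<absent>"


def diff_hparams(raw: dict[str, Any], loaded: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Map each key whose loaded value differs from the raw one to (raw, loaded)."""
    diff: dict[str, tuple[Any, Any]] = {}
    for key, loaded_value in loaded.items():
        raw_value = raw.get(key, ABSENT)
        if raw_value != loaded_value:
            diff[key] = (raw_value, loaded_value)
    return diff
```

For example, `diff_hparams({"d_model": 1024}, {"d_model": 1024, "n_groups": 8, "intermediate_size": 8192})` reports `n_groups` and `intermediate_size` as absent from the raw config, which is the pattern the error above points to.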
Hmm, this is quite magic tbh, I have no idea how `AutoConfig` works under the hood.

Another solution though: we can add an argument to `load_hparams`, let's say `use_raw_config: bool`. Then in the `__init__`, you can rewrite it as `self.hparams = load_hparams(..., use_raw_config=True)`
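A minimal sketch of what such a flag might look like follows. The standalone function, its `dir_model` parameter, and the fallback-on-error behavior are assumptions for illustration; only the `use_raw_config` name comes from the comment. `AutoConfig.from_pretrained(...).to_dict()` is the standard transformers call for getting a config as a dict.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_hparams(dir_model: Path, use_raw_config: bool = False) -> dict[str, Any]:
    """Return the model hyperparameters as a plain dict."""
    if not use_raw_config:
        try:
            # Imported lazily so the raw path works without transformers.
            from transformers import AutoConfig

            return AutoConfig.from_pretrained(dir_model).to_dict()
        except Exception as e:
            print(f"AutoConfig failed ({e}); falling back to raw config.json")
    # Raw path: trust config.json exactly as written, with no library defaults.
    with open(dir_model / "config.json", "r", encoding="utf-8") as f:
        return json.load(f)
```

A model class that needs the untouched file, such as a Mamba2 converter, would then pass `use_raw_config=True`, so that keys missing from config.json stay missing instead of being filled with defaults from another model.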