[training migration] Migrate mamba builder #4550
Merged
33 commits
- e48de2a copy base classes (maanug-nv)
- f561889 copy vocab utils (maanug-nv)
- b03072d copy distributed wrapping fns (maanug-nv)
- ea9f60a copy mamba cfg and builder (maanug-nv)
- e7dc746 fix import cycle (maanug-nv)
- e564315 copy base cfg+builder unit tests (maanug-nv)
- b08c7e6 copy dist utils unit tests (maanug-nv)
- ba1d3aa copy mamba cfg+builder unit tests (maanug-nv)
- 448a7ee update copyright year (maanug-nv)
- 0a25a81 rename to hybrid (maanug-nv)
- 0a3da47 move to avoid import cycle (maanug-nv)
- 63c0798 match inference spec in build_model with hybrid builders (maanug-nv)
- eedc90e add helper to build mamba cfg from args (maanug-nv)
- 19f903c add model cfg to container in pretrain_hybrid (maanug-nv)
- 3d2c86b refactor to include torch fsdp config (maanug-nv)
- a33d292 refactor bucket size assertions (maanug-nv)
- 11b54b3 mirror last 2 commits in dist utils (maanug-nv)
- 21cd884 mirror ddp param layout refactor (#3812) in dist utils (maanug-nv)
- 042b98e update tests (maanug-nv)
- 8edc29f formatting (maanug-nv)
- 996a9b9 fix import (maanug-nv)
- d85c831 re-enable serializable checks (maanug-nv)
- 025e391 fix headers (maanug-nv)
- bfff4cc update docstring (maanug-nv)
- 8c8d707 formatting (maanug-nv)
- ecdb6ee handle default spec in builder (maanug-nv)
- 2ddd5a1 defer test to future PR (maanug-nv)
- 9b72a53 remove generation config (maanug-nv)
- ce22fb1 abstractify (maanug-nv)
- 0ffb267 update tests (maanug-nv)
- a7a8402 docstring cleanup (maanug-nv)
- b7c6cc4 Merge branch 'main' into migrate-mamba-builder (maanug-nv)
- 43bb38b sync with legacy code removal (maanug-nv)
@@ -0,0 +1,24 @@ (new file)

```python
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

from megatron.training.models.base import ModelBuilder, ModelConfig, Serializable, compose_hooks
from megatron.training.models.dist_utils import (
    build_virtual_pipeline_stages,
    unimodal_build_distributed_models,
)
from megatron.training.models.hybrid import HybridModelBuilder, HybridModelConfig

MambaModelConfig = HybridModelConfig
MambaModelBuilder = HybridModelBuilder

__all__ = [
    "ModelBuilder",
    "ModelConfig",
    "Serializable",
    "compose_hooks",
    "build_virtual_pipeline_stages",
    "unimodal_build_distributed_models",
    "HybridModelConfig",
    "HybridModelBuilder",
    "MambaModelConfig",
    "MambaModelBuilder",
]
```
Bug: This field is typed as `HybridModelConfig` with no default, but `pretrain_cfg_container_from_args` defaults `model_cfg=None`, and five existing callers (`pretrain_gpt.py`, `pretrain_bert.py`, `pretrain_t5.py`, `pretrain_vlm.py`, `train_rl.py`) call it without a `model_cfg`, so `None` is passed here. This will break type checkers and cause a runtime `AttributeError` if any downstream code accesses attributes on `cfg.model`.
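One possible shape for the fix, sketched with hypothetical names (`PretrainConfigContainer` and `require_model_cfg` are illustrative, not the PR's actual classes or helpers): declare the field `Optional` with a `None` default so the type matches what the five callers pass, and have code that needs the model config check for `None` explicitly instead of hitting an `AttributeError` later.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HybridModelConfig:  # hypothetical stand-in for the real config class
    num_layers: int = 2


@dataclass
class PretrainConfigContainer:  # hypothetical container, not the PR's class
    # Optional with a None default matches callers that omit model_cfg.
    model: Optional[HybridModelConfig] = None


def require_model_cfg(cfg: PretrainConfigContainer) -> HybridModelConfig:
    """Return cfg.model, raising a clear error if it was never set."""
    if cfg.model is None:
        raise ValueError("cfg.model is None; pass model_cfg when building the container")
    return cfg.model


legacy = PretrainConfigContainer()  # like the callers that pass no model_cfg
hybrid = PretrainConfigContainer(model=HybridModelConfig(num_layers=8))
print(require_model_cfg(hybrid).num_layers)  # 8
```

The alternative is to keep the field non-optional and have the other pretrain scripts construct a model config too; the `Optional` route is the smaller change but pushes the `None` check onto every consumer of `cfg.model`.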