[training migration] Migrate mamba builder by maanug-nv · Pull Request #4550 · NVIDIA/Megatron-LM

maanug-nv · 2026-04-30T08:05:31Z

What does this PR do?

This PR migrates the ModelConfig+ModelBuilder system built in Megatron-Bridge (NVIDIA-NeMo/Megatron-Bridge#2241), including the config and builder for the Mamba/Hybrid model.
See the original PR for more design details.

This should replace the existing hybrid_builders.py method of initializing MCore Hybrid models.

#4656 (which is still WIP) demonstrates how this abstraction will be used and integrated into the setup/initialization phase of the training code.
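
As a rough illustration of the config-plus-builder pattern described above, here is a minimal sketch. `ToyModelConfig`, `ToyBuilder`, `ToyModel`, and `build_from_config` are hypothetical stand-ins, not classes from this PR. Only the method names `get_builder_cls` and `build_model` come from this thread, and their real signatures in Megatron-LM may differ.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    """Hypothetical stand-in for a real model object."""

    num_layers: int
    hidden_size: int


class ToyBuilder:
    """Hypothetical builder: turns a config into a model."""

    def __init__(self, config: "ToyModelConfig") -> None:
        self.config = config

    def build_model(self) -> ToyModel:
        # A real builder would construct the network here; this one only copies sizes.
        return ToyModel(self.config.num_layers, self.config.hidden_size)


@dataclass
class ToyModelConfig:
    """Hypothetical config that knows which builder class consumes it."""

    num_layers: int = 2
    hidden_size: int = 8

    def get_builder_cls(self) -> type:
        return ToyBuilder


def build_from_config(config: ToyModelConfig) -> ToyModel:
    # Look up the builder class from the config, instantiate it, then build.
    builder_cls = config.get_builder_cls()
    return builder_cls(config).build_model()
```

The point of the indirection is that calling code only needs the config object: the config selects its own builder, so setup code does not have to branch on model type.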

Signed-off-by: Maanu Grover <maanug@nvidia.com>

copy-pr-bot · 2026-04-30T08:05:36Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

maanug-nv · 2026-04-30T08:08:45Z

Remaining tasks for this PR before ready for review:

Rename Mamba->Hybrid
Ensure no gaps between hybrid_builders.hybrid_builder() and MambaBuilder.build_model()
Ensure no gaps between unimodal_build_distributed_models() and get_model()
Integrate MambaConfig+MambaBuilder into pretrain_hybrid.py and replace hybrid_builder()
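
One possible way to check for gaps between an old and a new construction path is to compare what each one produces. The sketch below assumes each path can be reduced to a mapping from parameter name to shape (for a PyTorch module, such a mapping could be derived from `named_parameters()`). `diff_param_shapes` and the sample names are hypothetical and not part of this PR.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_param_shapes(old: Dict[str, Shape], new: Dict[str, Shape]) -> List[str]:
    """Return human-readable differences between two name -> shape mappings."""
    problems: List[str] = []
    # Parameters the old path creates but the new path does not.
    for name in sorted(old.keys() - new.keys()):
        problems.append(f"missing in new: {name}")
    # Parameters only the new path creates.
    for name in sorted(new.keys() - old.keys()):
        problems.append(f"unexpected in new: {name}")
    # Parameters present in both paths but with different shapes.
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            problems.append(f"shape mismatch for {name}: {old[name]} vs {new[name]}")
    return problems
```

An empty result means the two paths agree on parameter names and shapes. It does not prove they agree on initialization, wrapping, or parallelism settings, which would need separate checks.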

Signed-off-by: Maanu Grover <maanug@nvidia.com>

maanug-nv · 2026-04-30T23:16:33Z

Will follow up this PR with a migration of setup() and initialize_megatron() so that they actually call the builder's build function, like so:

builder_cls = model_config.get_builder_cls()
builder = builder_cls(model_config)
return builder.build_distributed_models(
    pg_collection=pg_collection,
    ddp_config=cfg.ddp,
    overlap_param_gather_with_optimizer_step=cfg.optimizer.overlap_param_gather_with_optimizer_step,
    use_megatron_fsdp=cfg.dist.use_megatron_fsdp,
    use_torch_fsdp2=cfg.dist.use_torch_fsdp2,
    data_parallel_random_init=cfg.rng.data_parallel_random_init,
)

Signed-off-by: Maanu Grover <maanug@nvidia.com>

svcnvidia-nemo-ci · 2026-05-13T01:10:06Z