[training migration] Migrate mamba builder #4550

Merged
maanug-nv merged 33 commits into NVIDIA:main from maanug-nv:migrate-mamba-builder on May 13, 2026

maanug-nv commented Apr 30, 2026

Contributor

What does this PR do?

This PR migrates the ModelConfig+ModelBuilder system built in Megatron-Bridge (NVIDIA-NeMo/Megatron-Bridge#2241), including the config and builder for the Mamba/Hybrid model.
See the original PR for more design details.

This should replace the existing hybrid_builders.py method of initializing MCore Hybrid models.

#4656 (which is still WIP) demonstrates how this abstraction will be used and integrated into the setup/initialization phase of the training code.

⚠️ For major changes (either in lines of code or in its impact), please make sure to first share a design doc with the team. If you're unsure what's the best way to do so, contact the @mcore-oncall.

Issue tracking

For PRs from open-source community contributors:

  • New features: a linked issue is required. Please open a feature request and reference it here before submitting the PR.
  • Small updates (bug fixes, minor improvements): a linked issue is recommended and will accelerate the PR review process.

Linked issue:

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see the typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"

  1. When your PR is ready, click Ready for Review.
  2. An oncall reviewer is auto-assigned and expert reviewers are notified based on your changes.
    • Some PRs may jump straight to step 2. This is determined by .github/CODEOWNERS.

⚠️ Only mark as ready once merge-conflicts are resolved and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

Step 2: Final Review

For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.

For PRs outside megatron/core, this step is skipped.

Step 3: Approved

Once all required reviewers have approved, the Approved label is applied automatically.

Merge

Any member of mcore-engineers will be able to merge your PR.

For MRs into the `dev` branch:

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Signed-off-by: Maanu Grover <maanug@nvidia.com>

copy-pr-bot Bot commented Apr 30, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

maanug-nv commented Apr 30, 2026

Contributor Author

Remaining tasks for this PR before ready for review:

  • Rename Mamba->Hybrid
  • Ensure no gaps between hybrid_builders.hybrid_builder() and MambaBuilder.build_model()
  • Ensure no gaps between unimodal_build_distributed_models() and get_model()
  • Integrate MambaConfig+MambaBuilder into pretrain_hybrid.py and replace hybrid_builder()
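
One way to approach the "no gaps" items above is to compare the parameter names and shapes produced by the old and new construction paths. The following is only a rough sketch of that idea: `diff_param_shapes` and the example shape dictionaries are hypothetical and not part of this PR. In practice the dictionaries would be collected from the models returned by `hybrid_builder()` and `MambaBuilder.build_model()`.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_param_shapes(old: Dict[str, Shape], new: Dict[str, Shape]) -> List[str]:
    """Report parameters that differ between two name -> shape mappings.

    Hypothetical helper: `old` stands for the legacy build path, `new` for the builder.
    """
    problems: List[str] = []
    # Parameters that the old path creates but the new one does not.
    for name in sorted(old.keys() - new.keys()):
        problems.append(f"missing in new: {name}")
    # Parameters that only the new path creates.
    for name in sorted(new.keys() - old.keys()):
        problems.append(f"unexpected in new: {name}")
    # Parameters present in both, but with different shapes.
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            problems.append(f"shape mismatch: {name} {old[name]} != {new[name]}")
    return problems


# Made-up shapes for illustration only.
old_shapes = {"embedding.weight": (32, 8), "decoder.norm.weight": (8,)}
new_shapes = {"embedding.weight": (32, 8), "decoder.norm.weight": (16,), "output.bias": (32,)}
report = diff_param_shapes(old_shapes, new_shapes)
```

An empty `report` would suggest the two paths agree on parameter layout. It would not cover differences in initialization, process groups, or wrapping.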

Signed-off-by: Maanu Grover <maanug@nvidia.com>

maanug-nv commented Apr 30, 2026

Contributor Author

Will follow up on this PR with a migration of setup() and initialize_megatron() so that they actually call the builder's build function, like so:

builder_cls = model_config.get_builder_cls()
builder = builder_cls(model_config)
return builder.build_distributed_models(
    pg_collection=pg_collection,
    ddp_config=cfg.ddp,
    overlap_param_gather_with_optimizer_step=cfg.optimizer.overlap_param_gather_with_optimizer_step,
    use_megatron_fsdp=cfg.dist.use_megatron_fsdp,
    use_torch_fsdp2=cfg.dist.use_torch_fsdp2,
    data_parallel_random_init=cfg.rng.data_parallel_random_init,
)
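
For readers unfamiliar with the pattern, here is a dependency-free sketch of the config-to-builder dispatch used in the snippet above. `ToyConfig`, `ToyBuilder`, and `build_from_config` are hypothetical stand-ins, not the classes added by this PR. Only the method names `get_builder_cls()`, `build_model()`, and `build_distributed_models()` are taken from this conversation, and their real signatures may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Type


class ToyBuilder:
    """Hypothetical stand-in for a model builder; not the Megatron-LM class."""

    def __init__(self, config: "ToyConfig") -> None:
        self.config = config

    def build_model(self) -> Dict[str, Any]:
        # A real builder would construct a torch.nn.Module here;
        # a plain dict keeps this sketch free of external dependencies.
        return {
            "num_layers": self.config.num_layers,
            "hidden_size": self.config.hidden_size,
        }

    def build_distributed_models(self, **wrap_kwargs: Any) -> List[Dict[str, Any]]:
        # A real builder would wrap the model for distributed training;
        # here the wrapping options are only recorded.
        model = self.build_model()
        model["wrap_kwargs"] = dict(wrap_kwargs)
        return [model]


@dataclass
class ToyConfig:
    """Hypothetical model config that knows which builder class constructs it."""

    num_layers: int
    hidden_size: int

    def get_builder_cls(self) -> Type[ToyBuilder]:
        return ToyBuilder


def build_from_config(model_config: ToyConfig, **wrap_kwargs: Any) -> List[Dict[str, Any]]:
    # Same three steps as the snippet above: look up the builder class,
    # instantiate it with the config, then build.
    builder_cls = model_config.get_builder_cls()
    builder = builder_cls(model_config)
    return builder.build_distributed_models(**wrap_kwargs)


models = build_from_config(ToyConfig(num_layers=2, hidden_size=8), use_torch_fsdp2=False)
```

Because the config selects its own builder class, the calling code does not need to branch on the model type.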

Signed-off-by: Maanu Grover <maanug@nvidia.com>
maanug-nv changed the title from "Migrate mamba builder" to "[training migration] Migrate mamba builder" on May 5, 2026
maanug-nv marked this pull request as ready for review on May 6, 2026 21:22
maanug-nv requested review from a team as code owners on May 6, 2026 21:22
svcnvidia-nemo-ci requested a review from a team on May 6, 2026 21:22
svcnvidia-nemo-ci added the "Approved" label (All necessary approvals have been made) on May 12, 2026
maanug-nv enabled auto-merge on May 13, 2026 01:09
maanug-nv added this pull request to the merge queue on May 13, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/25771852280

@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/25776760893

@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/25782611173

github-merge-queue (bot) removed this pull request from the merge queue due to failed status checks on May 13, 2026
Signed-off-by: Maanu Grover <maanug@nvidia.com>
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/25829772062

@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/25831006101

Merged via the queue into NVIDIA:main with commit 0dc36df on May 13, 2026
74 of 75 checks passed
maanug-nv deleted the migrate-mamba-builder branch on May 13, 2026 23:48
cspades pushed a commit to cspades/Megatron-LM that referenced this pull request May 14, 2026
Signed-off-by: Maanu Grover <maanug@nvidia.com>
janEbert pushed a commit to janEbert/Megatron-LM that referenced this pull request Jun 2, 2026
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Labels

Approved (All necessary approvals have been made), complexity: high

4 participants