To support large-scale multi-node MoE training, we will integrate the Megatron training backend.
Tasks:
- Initial Megatron Support for GRPO on TP + PP for dense models: [trainer] Initial Megatron TP + PP Support #223
- Sequence packing + context parallel support: [trainer][megatron] Sequence packing + Context Parallel for Megatron #274
- Checkpointing support: [Megatron] Add checkpointing support #298
- EP support for MoE models: [trainer][megatron] Enable expert model and expert tensor parallel for MoE models #285
- Migration to Megatron-Bridge: [megatron] upgrade from mbridge -> Megatron-Bridge (breaking change) #453
- LoRA support via Megatron-Bridge
- Unify the gradient checkpointing flag
- Add a guide for debugging and working around OOM errors
- Enable Megatron FSDP
- Disaggregated Training
- PPO support (just needs a Critic and testing)
- Add Megatron optimizer/scheduler pass-through config options
- Virtual Pipeline Parallel Support
- Dynamic batch sizing
- Efficiency optimizations (optimized kernels, bucketed weight updates): [megatron] Added non cuda ipc wt sync to megatron workers #635
- Support FlashRL + Megatron integration
- Documentation
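
For background on the parallelism dimensions referenced above (TP, PP, CP, EP), here is a minimal sketch of how Megatron-Core's parallel state is typically initialized in recent versions. The helper name and the sizes are illustrative assumptions, not this repo's actual configuration or defaults:

```python
# Illustrative only: typical Megatron-Core parallel-state initialization.
# Assumes Megatron-Core is installed and the process was launched with
# torchrun (so torch.distributed env vars are set). Sizes are hypothetical.
import torch
import torch.distributed as dist
from megatron.core import parallel_state


def init_megatron_parallelism(tp: int = 2, pp: int = 2, cp: int = 1, ep: int = 1):
    """Set up TP / PP / CP / EP process groups for one training job.

    world_size must be divisible by tp * pp * cp; the remaining factor
    becomes the data-parallel size.
    """
    if not dist.is_initialized():
        dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    parallel_state.initialize_model_parallel(
        tensor_model_parallel_size=tp,    # TP: shard each layer's weights
        pipeline_model_parallel_size=pp,  # PP: split layers into pipeline stages
        context_parallel_size=cp,         # CP: shard the sequence dimension
        expert_model_parallel_size=ep,    # EP: shard MoE experts across ranks
    )


if __name__ == "__main__":
    init_megatron_parallelism()
```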