ci: restore perf test torchrun logs #4951
Merged
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Fast merging to resolve internal testing issue. This script is only used on internal tests.
santhnm2
pushed a commit
to santhnm2/Megatron-LM
that referenced
this pull request
May 26, 2026
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
shanmugamr1992
added a commit
to shanmugamr1992/Megatron-LM
that referenced
this pull request
May 26, 2026
These three torchrun args were added by NVIDIA#4951 on main but lost when perf-fix branched off perf-tests (which predates NVIDIA#4951). The merge of main into perf-fix did not pick them up cleanly. Restoring so the file matches main exactly — the PR no longer touches run_perf_test.sh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Victarry
added a commit
to yanring/Megatron-LM
that referenced
this pull request
May 27, 2026
* origin/main: (50 commits)
  Drain predecessor reduce-scatter at dispatch time (NVIDIA#4940)
  ci: Add allow_failure flag to gpt and moe recipes that are failing in nightlies (NVIDIA#4905)
  fix(tests): initialize num_microbatches calculator in vision cudagraph tests (NVIDIA#4986)
  test: re-enable test_pp2_create_cudagraphs_first_stage on TE 2.15+ (NVIDIA#4985)
  ci: Add support for MBridge job gating based on PR labels (NVIDIA#4926)
  test(ci): re-enable 8experts2parallel_multi_dist_optimizer_instances_1node (NVIDIA#4984)
  test: re-enable paged stashing MoE tests (NVIDIA#4978)
  Fix elastification unwrap_model import (NVIDIA#4972)
  Avoid offsetting functional test master port (NVIDIA#4973)
  test: enable NVTE_CUTEDSL_FUSED_GROUPED_MLP via pytest fixture (NVIDIA#4931)
  chore(beep boop 🤖): Bump (main) (2026-05-25)
  test(release): add release goldens for deepseekv3/nemotron3 and set tp2pp2 exit-interval (NVIDIA#4932)
  Fix `get_batch` return order to ignore BlendedDataset provenance fields (NVIDIA#4952)
  ci: restore perf test torchrun logs (NVIDIA#4951)
  Various training utils (NVIDIA#4872)
  ci: Update training script paths in BERT and T5 (NVIDIA#4939)
  [MXFP8/FP4-param-gather] Post processing after forced param AG in eval (NVIDIA#4562)
  Fix mxfp8 param gather numerical issue when DP overlap is off (NVIDIA#4800)
  Add TEFusedDenseMLP for Dense+Grouped GEMM fusion on SM100+ (NVIDIA#4318) (NVIDIA#4786)
  Fix paged stashing test submodules lookup (NVIDIA#4925)
  ...

# Conflicts:
# megatron/training/training.py
janEbert
pushed a commit
to janEbert/Megatron-LM
that referenced
this pull request
Jun 2, 2026
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Summary
Restore torchrun logs under {assets_dir}/logs/1 beside {assets_dir}/perf_results so launch_jet_workload.py can find std*.log assets.
Fix the gpt_16b_perf retry loop introduced when PR Perf tests #4917 removed the torchrun log arguments.
Test Plan
bash -n tests/performance_tests/shell_test_utils/run_perf_test.sh
git diff --check -- tests/performance_tests/shell_test_utils/run_perf_test.sh
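For readers unfamiliar with the layout the Summary describes, the following is a minimal sketch and not the contents of run_perf_test.sh. It assumes the restored arguments are torchrun's standard logging options (--log-dir, --redirects, --tee); the values shown, the temporary ASSETS_DIR, the nested run/attempt_0/0 directory names, and the find_std_logs helper are all hypothetical. It prints the command instead of launching torchrun, then simulates per-rank log files to show how a collector could locate std*.log under {assets_dir}/logs.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the test's assets directory.
ASSETS_DIR="$(mktemp -d)"
LOG_DIR="${ASSETS_DIR}/logs/1"
mkdir -p "${LOG_DIR}" "${ASSETS_DIR}/perf_results"

# torchrun logging options; the values are illustrative assumptions,
# not copied from run_perf_test.sh.
TORCHRUN_LOG_ARGS="--log-dir ${LOG_DIR} --redirects 3 --tee 3"

# Print the command rather than run it, so the sketch needs no GPUs or torchrun.
echo "torchrun ${TORCHRUN_LOG_ARGS} <training script>"

# Simulate the per-rank files torchrun would write below --log-dir.
# The nested directory names are placeholders, not a guaranteed layout.
mkdir -p "${LOG_DIR}/run/attempt_0/0"
echo "iteration 1" > "${LOG_DIR}/run/attempt_0/0/stdout.log"
echo "warning" > "${LOG_DIR}/run/attempt_0/0/stderr.log"

# Hypothetical helper: list every std*.log below the logs directory.
find_std_logs() {
    find "$1/logs" -type f -name 'std*.log' | sort
}

FOUND_LOGS="$(find_std_logs "${ASSETS_DIR}")"
printf '%s\n' "${FOUND_LOGS}"
```

If the log arguments are omitted, nothing is written under logs/ and the same lookup returns no files, which is consistent with the missing std*.log assets the Summary mentions.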