[perf] feat: simplify precision_debugger config behavior and docs #5986
tardis-key merged 2 commits into verl-project:main
Code Review
This pull request simplifies the configuration of the msprobe Precision Debugger by centralizing step filtering and output path management. Redundant fields such as data_dir and tool-specific steps have been deprecated or removed in favor of global_profiler.save_path and global_profiler.steps. Additionally, rank filtering for msprobe is now delegated to its internal config.json, ensuring the verl-side rank gate remains open when the tool is enabled. I have no feedback to provide.
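The gating described in this review can be sketched roughly as follows. This is only an illustration, not the code in the PR: the function names `should_dump` and `rank_gate_open` are hypothetical, and treating an unset or empty `global_profiler.steps` as "no step selected" is an assumption.

```python
from typing import Optional, Sequence


def should_dump(tool_enabled: bool, current_step: int,
                global_steps: Optional[Sequence[int]]) -> bool:
    """Hypothetical step gate: only global_profiler.steps selects steps."""
    if not tool_enabled:
        return False
    # Assumption: an unset or empty step list selects no step at all.
    if not global_steps:
        return False
    return current_step in global_steps


def rank_gate_open(tool_enabled: bool, rank: int) -> bool:
    """Hypothetical rank gate: stays open for every rank when the tool is on.

    Rank filtering is left to msprobe's own config.json, so the rank
    argument is deliberately not inspected here.
    """
    return tool_enabled
```

Keeping the rank check trivial on the verl side means a rank list only has to be maintained in one place, the msprobe config.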
@tardis-key I have simplified the usage, please review.
Thank you for your quick response!
- Use global_profiler.steps as the single step gate for precision_debugger
- Default dump root to global_profiler.save_path (data_dir overrides when set)
- Mark precision_debugger.steps as deprecated/ignored in config and runtime
- Update precision_debugger docs with common config.json samples and minimal CLI usage

Co-authored-by: OpenAI Codex <codex@openai.com>

- Update _generated_* trainer config snapshots to match current source configs
- Include precision_debugger tool_config propagation in generated files
- Remove stale precision_debugger enable/data_dir generated fields

Co-authored-by: OpenAI Codex <codex@openai.com>
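The dump-root default and the deprecated `steps` field from the first commit message could be handled along these lines. This is a minimal sketch under stated assumptions: `resolve_dump_root` and `check_deprecated_steps` are hypothetical helper names, and the warning text is illustrative rather than taken from the PR.

```python
import warnings
from typing import Optional, Sequence


def resolve_dump_root(save_path: str, data_dir: Optional[str] = None) -> str:
    """Return data_dir when it is set, otherwise fall back to save_path."""
    return data_dir if data_dir else save_path


def check_deprecated_steps(tool_steps: Optional[Sequence[int]]) -> None:
    """Warn when the ignored precision_debugger.steps field is still set."""
    if tool_steps is not None:
        # The value is not used anywhere; the warning only points users
        # at the global setting that now controls step selection.
        warnings.warn(
            "precision_debugger.steps is deprecated and ignored; "
            "use global_profiler.steps instead",
            DeprecationWarning,
            stacklevel=2,
        )
```

Warning instead of raising keeps older configs loadable while making it visible that the field no longer has any effect.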
…rl-project#5986)

Co-authored-by: TAJh <taojiaheng1@huawei.com>

Summary
This PR aligns and simplifies PrecisionDebugger integration and documentation.
Changes
- Align PrecisionDebugger profiling behavior with global profiler controls.
- Simplify precision_debugger config behavior and usage guidance.
- Improve PrecisionDebugger docs with practical msprobe `config.json` examples (`statistics` and `tensor`) and simple CLI enablement examples.
Why this is not duplicate work
- Checked existing open PRs for this head/base and did not find an existing open PR from `Tjh-UKN:main` to `verl-project/verl:main`.
Tests run
- `python -m py_compile verl/utils/profiler/config.py verl/utils/profiler/profile.py verl/utils/profiler/precision_debugger_profile.py`
- Result: pass

fix #5985
Test Result
tree /data01/tjh/verl/outputs/precision_debug_SIMP/step_1/
/data01/tjh/verl/outputs/precision_debug_SIMP/step_1/
├── actor_compute_log_prob
│ └── step0
│ ├── rank0
│ │ └── dump.json
│ └── rank1
│ └── dump.json
├── actor_update
│ └── step0
│ ├── rank0
│ │ └── dump.json
│ └── rank1
│ └── dump.json
└── ref_compute_log_prob
└── step0
├── rank0
│ └── dump.json
└── rank1
└── dump.json
12 directories, 6 files
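The layout above, `<stage>/step<N>/rank<M>/dump.json`, is regular enough to index with a short script. The helper below is a hypothetical sketch (`index_dumps` is not part of verl or msprobe) and assumes every dump follows exactly that directory pattern.

```python
import re
from pathlib import Path

_STEP = re.compile(r"step(\d+)")
_RANK = re.compile(r"rank(\d+)")


def index_dumps(root):
    """Map (stage, step, rank) to the dump.json path found under root."""
    index = {}
    for path in sorted(Path(root).glob("*/step*/rank*/dump.json")):
        rank_dir = path.parent
        step_dir = rank_dir.parent
        stage_dir = step_dir.parent
        step = _STEP.fullmatch(step_dir.name)
        rank = _RANK.fullmatch(rank_dir.name)
        # Skip directories that merely start with "step" or "rank".
        if step and rank:
            key = (stage_dir.name, int(step.group(1)), int(rank.group(1)))
            index[key] = path
    return index
```

Called on a directory shaped like the tree above, it yields one entry per stage and rank, which makes it easy to pair up the same rank across `actor_compute_log_prob`, `actor_update`, and `ref_compute_log_prob`.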