[perf] fix: modify the NPU profiler default configuration by tardis-key · Pull Request #4475 · verl-project/verl

tardis-key · 2025-12-10T07:39:05Z

What does this PR do?

Profiling reinforcement learning workloads generates a large volume of data, which makes the profiler hard to use. Based on the optimization experience of @mengchengTang, this PR changes the recommended default parameters.
Refer to https://www.hiascend.com/document/detail/zh/Pytorch/720/apiref/torchnpuCustomsapi/context/torch_npu-profiler-_ExperimentalConfig.md for the detailed interface specification.

The test results of tests/special_npu/run_qwen2_5_05b_grpo.sh are as follows:

- Before modification + analysis=True: 12.8 GB
- Before modification + analysis=False: 3.25 GB
- After modification + analysis=True: 3.48 GB
- After modification + analysis=False: 1.92 GB
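
To make the savings easier to compare, the reported sizes can be converted into relative reductions. This is a minimal sketch that uses only the four figures above; the dictionary layout and the helper name `reduction_pct` are illustrative and not part of verl.

```python
# Profiling data sizes in GB, copied from the test results above.
sizes_gb = {
    True: {"before": 12.8, "after": 3.48},   # analysis=True
    False: {"before": 3.25, "after": 1.92},  # analysis=False
}


def reduction_pct(before: float, after: float) -> float:
    """Return the relative size reduction in percent."""
    return (before - after) / before * 100.0


for analysis, s in sizes_gb.items():
    pct = reduction_pct(s["before"], s["after"])
    print(f"analysis={analysis}: {s['before']} GB -> {s['after']} GB ({pct:.1f}% smaller)")
```

With these numbers, the new defaults shrink the output by roughly 73% when analysis is enabled and by roughly 41% when it is disabled.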

Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the Contribute Guide.
- [x] Apply pre-commit checks: `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update the documentation.
- [ ] Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

gemini-code-assist

Code Review

This pull request aims to reduce the data volume from the NPU profiler by changing the default profiling level from level1 to level0. The changes are consistently applied across various configuration files, examples, and the default configuration object. Additionally, the profiler configuration is updated to exclude communication data and switch to a database export format, which should further improve performance. I have one critical suggestion to improve the robustness of a runtime dependency check.

verl/utils/profiler/mstx_profile.py

CLAassistant · 2025-12-16T07:57:22Z

All committers have signed the CLA.


tardis-key requested review from FightingZhen, PeterSH6, eric-haibin-lin, ji-huazhong, tongyx361 and vermouth1992 as code owners December 10, 2025 07:39

gemini-code-assist bot reviewed Dec 10, 2025

verl/utils/profiler/mstx_profile.py (outdated, resolved)

tardis-key changed the title ~~[perf] fix: modify the NPU profiler configuration to reduce data volume.~~ [perf] fix: modify the NPU profiler default configuration Dec 10, 2025

tardis-key force-pushed the profiler_simplification branch from 248e80b to a79a55e Compare December 10, 2025 07:56

tardis-key changed the title ~~[perf] fix: modify the NPU profiler default configuration~~ [WIP][perf] fix: modify the NPU profiler default configuration Dec 10, 2025

tardis-key marked this pull request as draft December 10, 2025 07:58

tardis-key marked this pull request as ready for review December 10, 2025 08:07

tardis-key changed the title ~~[WIP][perf] fix: modify the NPU profiler default configuration~~ [perf] fix: modify the NPU profiler default configuration Dec 10, 2025

tardis-key mentioned this pull request Dec 10, 2025

[RFC] Profiling system in async mode #4207 (closed)

FightingZhen reviewed Dec 12, 2025

verl/utils/profiler/mstx_profile.py (outdated, resolved)

mengchengTang reviewed Dec 12, 2025

verl/utils/profiler/mstx_profile.py (outdated, resolved)

verl/utils/profiler/mstx_profile.py (outdated, resolved)

tardis-key force-pushed the profiler_simplification branch from 8204110 to dafa529 Compare December 12, 2025 07:07

tardis-key requested review from FightingZhen and mengchengTang December 12, 2025 07:26

mengchengTang approved these changes Dec 12, 2025


FightingZhen reviewed Dec 13, 2025

verl/utils/profiler/mstx_profile.py (outdated, resolved)

verl/utils/profiler/mstx_profile.py (outdated, resolved)

tardis-key force-pushed the profiler_simplification branch 2 times, most recently from 9cfefff to 6a659fe Compare December 15, 2025 09:04

tardis-key requested review from ISEEKYAN, ZihengJiang, chenhaiq, wuxibin89 and zhaochenyang20 as code owners December 16, 2025 07:57

tardis-key force-pushed the profiler_simplification branch from 6996701 to 9b101b8 Compare December 16, 2025 08:13

tardis-key and others added 3 commits December 18, 2025 20:09

- 841c452: modify the profiler's default configuration to reduce data volume. Co-authored-by: Shangwei-Li <lishangwei@mail.ustc.edu.cn>
- 8d3707e: change default level to level0, update doc
- c7b898a: revision based on review comments
  1. Check the torch_npu version instead of sig.parameters for better readability and troubleshooting
  2. Delete aic_metrics since it is not necessary for level0
  3. Recommend 'module' instead of 'stack'
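
The first revision item contrasts two ways of guarding an optional profiler argument: probing the callable's signature, or comparing the installed torch_npu version against a minimum. The sketch below illustrates both with stand-ins so that it runs without torch_npu. `make_config`, `new_option`, and the minimum version `(2, 1, 0)` are hypothetical placeholders, not the names or values used in this PR.

```python
import inspect
import re

# Hypothetical minimum release that supports the optional argument.
MIN_VERSION = (2, 1, 0)


def make_config(profiler_level="level0", new_option=None):
    """Stand-in for a config constructor that gained `new_option` in a later release."""
    return {"profiler_level": profiler_level, "new_option": new_option}


def supports_by_signature(func, name):
    """Signature probing: works, but says nothing about which release is required."""
    return name in inspect.signature(func).parameters


def parse_version(version):
    """Keep the leading numeric components, e.g. '2.1.0.post3' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as '2.1' so tuple comparison behaves as expected.
    return parts + (0,) * (3 - len(parts))


def supports_by_version(installed, minimum=MIN_VERSION):
    """Version check: the requirement is explicit and easy to report."""
    return parse_version(installed) >= minimum


def require_version(installed, minimum=MIN_VERSION):
    """Raise an error that names the required release when the check fails."""
    if not supports_by_version(installed, minimum):
        needed = ".".join(str(p) for p in minimum)
        raise RuntimeError(f"installed version {installed} is too old; need >= {needed}")


print(supports_by_signature(make_config, "new_option"))
print(supports_by_version("2.5.1rc1"))
```

The version check produces an error message that names the required release, which is one plausible reading of the "readability and troubleshooting" motivation given in the commit message.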

tardis-key force-pushed the profiler_simplification branch from 9b101b8 to c7b898a Compare December 18, 2025 12:10

tardis-key requested a review from FightingZhen December 19, 2025 02:43

FightingZhen approved these changes Dec 19, 2025


FightingZhen merged commit 71a6eb6 into verl-project:main Dec 19, 2025
70 checks passed

tardis-key deleted the profiler_simplification branch December 31, 2025 02:15
