Commit 93f1831
[reward] feat: add compute_score timing metrics to agent loop (#5971)
### What does this PR do?
Add `compute_score` timing metric to `AgentLoopMetrics` in the agent
loop to track the time spent on reward score computation
(`_compute_score`). This helps identify reward computation bottlenecks
during training.
**Changes:**
1. Added `compute_score: float = 0.0` field to `AgentLoopMetrics`
2. Instrumented `_compute_score()` with `simple_timer` to measure reward
computation time per sample
3. Added `agent_loop/compute_score/min|max|mean` and
`agent_loop/slowest/compute_score` to `_performance_metrics` aggregation
This follows the same pattern as the existing `generate_sequences` and
`tool_calls` timing metrics.
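The instrumentation pattern described above can be sketched as follows. This is a minimal, self-contained stand-in: verl's actual `simple_timer` lives in the repo, and the helper below only illustrates the accumulate-elapsed-time-into-a-dict pattern the PR reuses.

```python
# Hypothetical stand-in for verl's simple_timer: a context manager that
# accumulates wall-clock seconds under a key in a metrics dict.
import time
from contextlib import contextmanager


@contextmanager
def simple_timer(name: str, timing: dict):
    start = time.monotonic()
    try:
        yield
    finally:
        # Accumulate so repeated calls within one sample sum up.
        timing[name] = timing.get(name, 0.0) + time.monotonic() - start


# Usage mirroring the wrapped _compute_score() call (the sleep is a
# stand-in for the reward computation).
metrics: dict = {}
with simple_timer("compute_score", metrics):
    time.sleep(0.01)
print(sorted(metrics))  # ['compute_score']
```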
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+compute_score+metrics
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
### Test
The change is backward-compatible:
- `AgentLoopMetrics.compute_score` defaults to `0.0`, so existing agent
loops that do not use async reward will report `0.0` without breaking.
- When `reward_loop_worker_handles` is not `None`, `_compute_score`
measures the full reward computation call and writes the elapsed time
into `output.metrics.compute_score`.
- The `_performance_metrics` method safely aggregates `compute_score`
from all samples, consistent with how `generate_sequences` and
`tool_calls` are handled.
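The aggregation step can be sketched like this. The function name and the treatment of `slowest` are illustrative, not verl's exact code: here `slowest` is simplified to the maximum, whereas `_performance_metrics` reports the metrics of the slowest sample.

```python
# Illustrative min/max/mean aggregation over per-sample timings,
# emitting the metric keys shown in the training logs.
def performance_metrics(compute_score_times: list[float]) -> dict:
    n = len(compute_score_times)
    return {
        "agent_loop/compute_score/min": min(compute_score_times),
        "agent_loop/compute_score/max": max(compute_score_times),
        "agent_loop/compute_score/mean": sum(compute_score_times) / n,
        # Simplification: the real code reports the slowest sample's value.
        "agent_loop/slowest/compute_score": max(compute_score_times),
    }


out = performance_metrics([0.12, 2.34, 0.78])
print(out["agent_loop/compute_score/mean"])  # 1.08
```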
### API and Usage Example
No API changes. The new metric is automatically reported in the training
logs alongside existing metrics:
```
agent_loop/compute_score/min: 0.12
agent_loop/compute_score/max: 2.34
agent_loop/compute_score/mean: 0.78
agent_loop/slowest/compute_score: 2.34
```
### Design & Code Changes
- `verl/experimental/agent_loop/agent_loop.py`:
- `AgentLoopMetrics`: added `compute_score` field
- `AgentLoopWorker._compute_score()`: wrapped reward computation with
`simple_timer`
- `AgentLoopManager._performance_metrics()`: added min/max/mean/slowest
aggregation for `compute_score`
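The backward-compatible field addition can be sketched as a defaulted field on a metrics model. This uses a plain dataclass for illustration; the actual `AgentLoopMetrics` definition in verl may differ.

```python
# Sketch: adding compute_score with a 0.0 default means agent loops that
# never measure it still produce a valid metrics object.
from dataclasses import dataclass


@dataclass
class AgentLoopMetrics:
    generate_sequences: float = 0.0
    tool_calls: float = 0.0
    compute_score: float = 0.0  # new field added by this PR


m = AgentLoopMetrics()
print(m.compute_score)  # 0.0
```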
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: The timing metric
follows the exact same pattern as existing `generate_sequences` and
`tool_calls` metrics. Testing requires GPU + reward model setup which is
covered by existing integration tests in
`tests/experimental/reward_loop/`.
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
- [ ] If your PR is related to the `recipe` submodule, please also
update the reference to the submodule commit via `git submodule update
--remote` or `cd recipe && git pull origin main`.
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

1 parent fba0939 · commit 93f1831
1 file changed: +35, −25 lines