Commit fba0939
[trainer] fix: return NaN for empty tensors in compute_data_metrics (#5899)
### What does this PR do?
Fixes #5894
When all samples are aborted or `response_mask` is all `False`,
`compute_data_metrics` crashes with:
```
RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
```
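The failure can be reproduced in isolation. The following is a minimal sketch, assuming only that boolean masking leaves a zero-element tensor; the variable names are illustrative and are not taken from `metric_utils.py`:

```python
import torch

# Illustrative data: two sequence scores, with every sample masked out.
scores = torch.tensor([0.5, 1.0])
keep = torch.tensor([False, False])
empty = scores[keep]  # shape (0,), so numel() == 0

# A full reduction with max() has no identity element, so PyTorch raises.
try:
    torch.max(empty)
    raised = False
except RuntimeError:
    raised = True

# mean() does not raise on an empty float tensor; it yields NaN instead.
mean_is_nan = bool(torch.isnan(torch.mean(empty)))
```

Note that only the `max()` / `min()` reductions raise here; `mean()` on an empty float tensor already evaluates to NaN.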
This PR replaces the crash with a logged warning and NaN metric values, so
training can continue in edge cases such as:
- **System debugging**: early training runs may produce invalid
responses; crashing blocks iteration
- **Unique sampling strategies**: some strategies may occasionally
produce batches where all responses are invalid
Note: `compute_data_metrics` is purely a logging/monitoring function;
its return values do not feed into loss computation. When all responses
are invalid, the loss and its gradient are 0 (via `masked_sum /
(num_tokens + 1e-8) = 0`), so no parameter updates occur, which is
the expected behavior.
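The zero-gradient claim can be checked numerically. This is a sketch under stated assumptions: `token_loss` stands in for whatever per-token loss the trainer computes, and the shapes and values are made up for illustration.

```python
import torch

# Hypothetical per-token losses for 2 responses of 3 tokens each.
token_loss = torch.tensor(
    [[0.3, 0.7, 0.1], [0.9, 0.2, 0.4]], requires_grad=True
)
response_mask = torch.zeros(2, 3)  # every token is invalid

# Masked mean with the epsilon-guarded denominator quoted in the note above.
masked_sum = (token_loss * response_mask).sum()
num_tokens = response_mask.sum()
loss = masked_sum / (num_tokens + 1e-8)  # 0 / 1e-8 == 0
loss.backward()

loss_value = loss.item()
grad_is_zero = bool(torch.all(token_loss.grad == 0))
```

Because the gradient with respect to every per-token loss is zero, the chain rule gives zero gradients for any parameters upstream of it.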
### Checklist Before Starting
- [x] Search for similar PRs:
  - https://github.com/volcengine/verl/pulls?q=is%3Apr+compute_data_metrics
  - Found related PR #5860 which fixes a similar issue in
    `calculate_debug_metrics`
### Test
No new tests were added; this is a defensive bug fix for an edge case.
### API and Usage Example
No API changes. The function signature remains the same:
```python
from verl.trainer.ppo.metric_utils import compute_data_metrics

# Normal case - works as before
metrics = compute_data_metrics(batch)
# Returns: {'critic/score/mean': 0.5, 'critic/rewards/max': 1.0, ...}

# Edge case - now handled gracefully instead of crashing
metrics = compute_data_metrics(batch_with_all_aborted)
# Logs warning, returns: {'critic/score/mean': nan, 'critic/rewards/max': nan, ...}
# Training continues normally (loss gradient = 0, no parameter updates)
```
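Code that consumes these metrics should test for NaN with `math.isnan`, since `nan == nan` is `False`. The following sketch separates NaN placeholders from finite values before logging; `split_nan_metrics` and the `example/other_metric` key are hypothetical and not part of verl.

```python
import math


def split_nan_metrics(metrics):
    """Split a metrics dict into finite entries and the names of NaN entries.

    Hypothetical helper for illustration only.
    """
    finite = {k: v for k, v in metrics.items() if not math.isnan(v)}
    missing = sorted(k for k, v in metrics.items() if math.isnan(v))
    return finite, missing


# Metric names follow the example above; the last key is made up.
metrics = {
    "critic/score/mean": float("nan"),
    "critic/rewards/max": float("nan"),
    "example/other_metric": 12.0,
}
finite, missing = split_nan_metrics(metrics)
```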
### Design & Code Changes
**File changed:** `verl/trainer/ppo/metric_utils.py`
**Root Cause:**
`torch.max()` / `torch.min()` raise `RuntimeError` on empty tensors,
which occur when:
- All samples are aborted (empty `non_aborted_sequence_score/reward`)
- `response_mask` is all False (empty
`valid_adv/valid_returns/valid_values`)
**Solution:** Add `numel() > 0` checks with NaN fallback (same pattern
as PR #5860):
```python
if non_aborted_sequence_score.numel() > 0:
    score_mean = torch.mean(non_aborted_sequence_score).detach().item()
    score_max = torch.max(non_aborted_sequence_score).detach().item()
    score_min = torch.min(non_aborted_sequence_score).detach().item()
else:
    logger.warning("All samples are aborted, returning default score metrics")
    score_mean = score_max = score_min = float("nan")
```
Applied to:
- `non_aborted_sequence_score/reward` → score/reward metrics
- `valid_adv/valid_returns` → advantage/return metrics
- `valid_values` → critic/values metrics (when use_critic=True)
- `non_aborted_response_length` → response length metrics
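Since the same guard is applied to several tensors, the pattern can be expressed as one reusable function. This is a sketch only: `safe_mean_max_min` is a hypothetical helper, and the PR itself may well inline the check at each site, as in the snippet above.

```python
import logging

import torch

logger = logging.getLogger(__name__)


def safe_mean_max_min(values: torch.Tensor, what: str) -> tuple[float, float, float]:
    """Return (mean, max, min) of `values`, or three NaNs if it is empty.

    Hypothetical helper, not part of verl.
    """
    if values.numel() > 0:
        return (
            torch.mean(values).detach().item(),
            torch.max(values).detach().item(),
            torch.min(values).detach().item(),
        )
    logger.warning("%s is empty, returning NaN metrics", what)
    nan = float("nan")
    return nan, nan, nan


# Non-empty input behaves as before; empty input yields NaNs instead of raising.
full_stats = safe_mean_max_min(torch.tensor([0.0, 0.5, 1.0]), "score")
empty_stats = safe_mean_max_min(torch.empty(0), "score")
```

The two calls at the end also suggest the shape a regression test for the empty case could take.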
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md)
- [x] Apply pre-commit checks (ruff check & format passed locally)
- [x] Add / Update documentation (N/A - internal bug fix)
- [x] Add unit or end-to-end test(s) (N/A - edge case, no tests needed)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Boundless <ruihang_wu@163.com>

1 parent 43d1c6f commit fba0939
1 file changed, +70 -30 lines changed