Problem
Single Anthropic account sustains many concurrent sonnet workers but only ~3-4 concurrent opus before triggering 429 rate limits. Currently no per-model concurrency cap exists in dispatch logic. When fill_floor + cascade-elevation both push opus-tier dispatches, all simultaneous opus workers hit 429 within seconds and become 20-min zombies (see GH#21570 follow-up).
Evidence: 3 opus-4-6 workers killed at the same minute with rate_limit (headless-runtime-metrics.jsonl ts=1777397345-1777397359). Sonnet workers in the same pool succeeded.
How
EDIT: .agents/scripts/pulse-dispatch-engine.sh — function _dff_compute_max_parallel (~line 712-727). Add a per-model concurrency check:
# Count in-flight workers by model from active worker process list
local opus_inflight
opus_inflight=$(pgrep -f 'opencode.*-m anthropic/claude-opus' | wc -l | tr -d ' ')
local opus_cap=${AIDEVOPS_OPUS_CONCURRENCY_CAP:-4}
if (( opus_inflight >= opus_cap )); then
# Skip opus-tier candidates this cycle; sonnet candidates still dispatch
return $opus_cap
fi
EDIT: .agents/scripts/pulse-dispatch-core.sh near line 1119 — when a candidate's resolved model is opus AND opus_inflight >= cap, defer with reason="opus_concurrency_cap" instead of dispatching.
NEW: per-model cap config in .agents/configs/dispatch-model-caps.conf:
opus-4-6=4
opus-4-7=4
sonnet-4-6=24
Reference pattern
shared-gh-wrappers.sh rate-limit threshold logic is conceptually similar: probe budget before action, defer when below threshold.
Verification
# Spawn 6 simulated opus workers (sleep loops with matching cmdline)
for i in 1 2 3 4 5 6; do (exec -a 'opencode -m anthropic/claude-opus-4-6' sleep 600 &); done
# Run pulse cycle
pulse-wrapper.sh --canary
# Log should show 'opus_concurrency_cap' deferrals for issues that would have used opus
Acceptance
- Concurrent opus workers never exceed the cap (default 4)
- Sonnet dispatch unaffected
- Cap is overridable via
AIDEVOPS_OPUS_CONCURRENCY_CAP env
- Deferred candidates retry next cycle (not NMR'd)
Filed from interactive root-cause session (2026-04-28). Composes with the early-429-detection fix.
aidevops.sh v3.13.7 plugin for OpenCode v1.14.29 with claude-opus-4-7 spent 1h 12m and 19,339 tokens on this with the user in an interactive session.
Problem
Single Anthropic account sustains many concurrent sonnet workers but only ~3-4 concurrent opus before triggering 429 rate limits. Currently no per-model concurrency cap exists in dispatch logic. When fill_floor + cascade-elevation both push opus-tier dispatches, all simultaneous opus workers hit 429 within seconds and become 20-min zombies (see GH#21570 follow-up).
Evidence: 3 opus-4-6 workers killed at the same minute with rate_limit (
headless-runtime-metrics.jsonlts=1777397345-1777397359). Sonnet workers in the same pool succeeded.How
EDIT:
.agents/scripts/pulse-dispatch-engine.sh— function_dff_compute_max_parallel(~line 712-727). Add a per-model concurrency check:EDIT:
.agents/scripts/pulse-dispatch-core.shnear line 1119 — when a candidate's resolved model is opus AND opus_inflight >= cap, defer with reason="opus_concurrency_cap" instead of dispatching.NEW: per-model cap config in
.agents/configs/dispatch-model-caps.conf:Reference pattern
shared-gh-wrappers.shrate-limit threshold logic is conceptually similar: probe budget before action, defer when below threshold.Verification
Acceptance
AIDEVOPS_OPUS_CONCURRENCY_CAPenvFiled from interactive root-cause session (2026-04-28). Composes with the early-429-detection fix.
aidevops.sh v3.13.7 plugin for OpenCode v1.14.29 with claude-opus-4-7 spent 1h 12m and 19,339 tokens on this with the user in an interactive session.