fix: harden AutoRL-Bench RL evaluators #1402

Merged
couragec merged 3 commits into main from fix/autorl-bench-rl-bugfixes
Apr 28, 2026

couragec commented Apr 28, 2026

Summary

This PR hardens the AutoRL-Bench RL evaluators by backporting a small set
of targeted fixes:

  • Fix ALFWorld vLLM evaluation crashes from overlong ReAct history by
    truncating prompts to fit the configured context window.
  • Ensure ALFWorld restores stdout and cleans up the environment / vLLM
    backend even when evaluation fails.
  • Propagate vLLM-safe environment variables to OpenCompass subprocesses
    and set enforce_eager=True in the OpenCompass template.
  • Prevent failed baseline evaluations from silently becoming cached or
    returned as valid 0.0 scores.
  • Improve timeout cleanup by killing the process session/tree, while
    keeping kill_process_group as a compatibility alias.
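
As an illustration of the first item, here is a minimal sketch of trimming a ReAct history so the prompt stays within a token budget. It is not the code from this PR: the function name `truncate_history`, its parameters, the whitespace-based token counter, and the choice to drop the oldest steps first are all assumptions made for the example.

```python
from typing import Callable, List


def truncate_history(
    header: str,
    steps: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> str:
    """Return the header plus as many of the most recent steps as fit.

    Hypothetical helper: `count_tokens` defaults to a whitespace split,
    standing in for the model's real tokenizer.
    """
    # The header (task description, instructions) is always kept.
    budget = max_tokens - count_tokens(header)
    kept: List[str] = []
    # Walk backwards so the newest steps claim the budget first.
    for step in reversed(steps):
        cost = count_tokens(step)
        if cost > budget:
            break  # this step and everything older is dropped
        kept.append(step)
        budget -= cost
    kept.reverse()  # restore chronological order
    return "\n".join([header, *kept])


if __name__ == "__main__":
    history = ["Act: go to desk 1", "Obs: you see a mug", "Act: take mug"]
    print(truncate_history("Task: put a mug on the desk", history, max_tokens=16))
```

In a real evaluator the counter would be the model's tokenizer, and the budget would likely be the configured context window minus the tokens reserved for generation. Counting each step separately is only approximate, since joining pieces can change the token count slightly.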

Validation

  • python -m py_compile rdagent/scenarios/rl/autorl_bench/benchmarks/alfworld/eval.py rdagent/scenarios/rl/autorl_bench/core/evaluator.py rdagent/scenarios/rl/autorl_bench/core/utils.py rdagent/scenarios/rl/autorl_bench/core/opencompass.py rdagent/scenarios/rl/autorl_bench/core/__init__.py rdagent/scenarios/rl/autorl_bench/test/test_fixes.py
  • ruff check --no-fix --select F,E9 rdagent/scenarios/rl/autorl_bench/benchmarks/alfworld/eval.py rdagent/scenarios/rl/autorl_bench/core/evaluator.py rdagent/scenarios/rl/autorl_bench/core/utils.py rdagent/scenarios/rl/autorl_bench/core/opencompass.py rdagent/scenarios/rl/autorl_bench/core/__init__.py rdagent/scenarios/rl/autorl_bench/test/test_fixes.py
  • /tmp/rdagent-test-venv/bin/python -m rdagent.scenarios.rl.autorl_bench.test.test_fixes

Result: 24 passed, 0 failed.


📚 Documentation preview 📚: https://RDAgent--1402.org.readthedocs.build/en/1402/

couragec merged commit 0ca1609 into main Apr 28, 2026
9 checks passed
couragec deleted the fix/autorl-bench-rl-bugfixes branch April 28, 2026 11:08