
Turn-based compaction, markovian training support, BabyAI eval #1

Open
Emilianopp wants to merge 7 commits into main from feature/balrog-babyai

Conversation

@Emilianopp
Collaborator

Summary

  • Turn-based compaction (n_protect_turns): fires after each complete assistant response rather than reactively mid-generation. Uses the most recent assistant response's K-vectors as importance queries (lookahead grounded in what the model actually attended to). Markovian mode is supported. KV-budget compaction is disabled when turn-based is active so the two modes don't interfere.
  • Markovian training pipeline: CompactionEnv upgraded to MultiTurnEnv, compaction_mode field added to trainer config with auto-sync from env args, segmented_forward gains a markovian hard-delete branch, orchestrator skips env_client injection for CompactionEnv.
  • BabyAI multi-turn eval (eval_balrog_babyai.py): evaluates baseline, compaction, and markovian modes on MiniGrid tasks with persistent KV sessions. Also adds eval_aime_rsa.py for AIME benchmarking.

Breaking changes

  • configs/compaction/qwen3_4b_serve_tp1.toml: model updated from Qwen/Qwen3-4B to Qwen/Qwen3-4B-Instruct-2507, max_model_len = 32768 added.
  • eval_balrog_babyai.py default --model updated to match.

Everything else is backwards compatible — all new parameters default to preserving existing behaviour (n_protect_turns=-1, compaction_mode="attention_matching").
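As a sketch, the backwards-compatible defaults named above would look like this in a compaction config (the field names and default values are from this PR; the surrounding file layout is illustrative):

```toml
# Defaults that preserve pre-PR behaviour (layout is an assumption)
n_protect_turns = -1                    # -1 keeps the reactive KV-budget trigger
compaction_mode = "attention_matching"  # "markovian" hard-deletes instead of compressing
```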

Bug fixes

  • Guard against compact_end <= prompt_len in turn-based compaction (empty region crashes topk)
  • Off-by-one in asst_start: EOS is counted in turn_asst_lens but not yet written to KV
  • Warning logged when compact_window is set alongside n_protect_turns (ignored in turn-based mode)
  • Session ID now includes mode prefix to prevent collisions when running multiple eval scripts concurrently

Test plan

  • Smoke test turn-based compaction: eval_balrog_babyai.py --mode compaction --use-sessions --n-protect-turns 1 --n 1
  • Smoke test markovian turn-based: eval_balrog_babyai.py --mode markovian --use-sessions --n-protect-turns 1 --n 1
  • Verify n_protect_turns=-1 (default) still uses KV-budget path unchanged
  • Verify compact_window warning fires when set with n_protect_turns >= 0

🤖 Generated with Claude Code

Emilianopp and others added 7 commits March 20, 2026 15:28
CompactionEnv is upgraded from SingleTurnEnv to MultiTurnEnv to support
multi-turn environments (e.g. BabyAI). A new compaction_mode field is
added to TrainerConfig and auto-synced from env args via a model validator,
so the trainer mirrors whatever mode the inference server uses. The
segmented_forward in compaction.py gains a markovian branch that hard-deletes
the compaction window (empty C1/C2) instead of running Attention Matching.
The orchestrator skips env_client injection for CompactionEnv, which runs
in-process to call /compact_generate on the local inference server.
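The auto-sync described above can be sketched as follows. The real TrainerConfig is described as a pydantic model with a model validator; this minimal dataclass sketch (field names from the commit message, everything else an assumption) shows the same post-construction sync:

```python
from dataclasses import dataclass, field


@dataclass
class TrainerConfig:
    # Hypothetical minimal shape; the real TrainerConfig is a pydantic model
    # whose model validator performs this same sync after construction.
    env_args: dict = field(default_factory=dict)
    compaction_mode: str = "attention_matching"

    def __post_init__(self):
        # Auto-sync: mirror whatever mode the env args (and thus the
        # inference server) specify, so trainer and server never drift apart.
        mode = self.env_args.get("compaction_mode")
        if mode is not None:
            self.compaction_mode = mode


cfg = TrainerConfig(env_args={"compaction_mode": "markovian"})
print(cfg.compaction_mode)  # markovian
```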

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces n_protect_turns to the session API as an alternative compaction
trigger: instead of firing reactively when the KV fills, compaction fires
after each complete assistant response, leaving the last n_protect_turns
turns untouched.

Key design choices:
- query_vecs param added to compact_kv_range/compact_kv: when provided,
  uses real key vectors as importance probes instead of random Gaussian
  queries. Turn-based compaction passes the most recent assistant response's
  K-vectors as a lookahead signal grounded in what the model attended to.
- KV-budget compaction (pre-decode + in-decode) is disabled when
  n_protect_turns >= 0, so the two modes do not interfere.
- Markovian mode is supported: when compaction_mode="markovian", the
  turn-based path hard-deletes rather than compresses.

Bug fixes included:
- Guard against compact_end <= prompt_len (empty region crashes topk)
- Off-by-one in asst_start: EOS is counted in turn_asst_lens but not
  yet written to KV, so asst_len_in_kv = turn_asst_lens[-1] - 1
- Warning logged when compact_window is set alongside n_protect_turns,
  since compact_window is ignored in turn-based mode

Default n_protect_turns=-1 preserves existing KV-budget behaviour exactly.
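The query_vecs idea above can be sketched in a few lines: score each compactable KV position by how strongly the probe vectors (the last assistant response's K-vectors, rather than random Gaussians) attend to it, and keep the top-k positions. This is an illustrative reduction, not the actual compact_kv_range implementation; the function name and shapes are hypothetical:

```python
def select_keep_indices(keys, query_vecs, keep):
    """Rank compactable KV positions by attention from probe queries.

    keys:       key vectors of the compactable region (list of float lists)
    query_vecs: probe vectors, e.g. the last assistant response's K-vectors
    keep:       number of positions to retain
    """
    if not keys or keep <= 0:
        # Guard mirroring the PR's compact_end <= prompt_len fix:
        # a topk over an empty region would crash.
        return []

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Importance of position t = max attention logit from any probe query.
    scores = [max(dot(q, k) for q in query_vecs) for k in keys]
    top = sorted(range(len(keys)), key=lambda t: scores[t], reverse=True)[:keep]
    return sorted(top)  # restore temporal order for the surviving positions


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
probes = [[1.0, 0.0]]
print(select_keep_indices(keys, probes, keep=2))  # [0, 1]
```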

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
eval_balrog_babyai.py evaluates models on BabyAI (MiniGrid) grid-world
tasks in three modes: baseline (standard vLLM), compaction (KV budget or
turn-based), and markovian. Supports persistent KV sessions (--use-sessions)
to avoid re-prefilling full history each turn, and --n-protect-turns for
turn-based compaction.

eval_aime_rsa.py benchmarks RSA vs baseline on AIME problems.

New configs cover markovian training (qwen3_4b_markovian_*), a 0.6B fast
test variant (qwen3_06b_*), and BabyAI baseline training. The serve config
is updated to Qwen3-4B-Instruct-2507 with max_model_len=32768.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…files

The broad *.toml / *.sbatch / *.sh rules were blocking new source configs
and job scripts from being tracked. Added negation rules for configs/**/*.toml
and scripts/*.{sbatch,sh} so these are committable while local outputs remain
ignored.
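A sketch of the resulting .gitignore shape, reconstructed from the rules named above (the exact pre-existing patterns are an assumption):

```gitignore
# Broad ignores for locally generated outputs
*.toml
*.sbatch
*.sh

# Negations so tracked source configs and job scripts stay committable
!configs/**/*.toml
!scripts/*.sbatch
!scripts/*.sh
```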

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
README.md gains a Turn-Based Compaction section documenting n_protect_turns,
the KV layout, comparison table against KV-budget and markovian modes, and
API/CLI usage examples. The Markovian section and eval result tables are
updated. IMPLEMENTATION.md and CLAUDE.md reflect the new key files, configs,
and launch commands.

eval_rg_mix.py adds RSA mode support and misc eval improvements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…igrid dep

- Add BabyAI Evaluation section: tasks, all CLI modes (baseline, KV-budget,
  turn-based with --n-protect-turns, markovian), key flags table, setup notes
- Add Session API section documenting /compact_session/create, /step, and DELETE
  with full parameters tables including n_protect_turns
- Update Scripts and Configs tables with eval_balrog_babyai.py, viz scripts,
  qwen3_4b_balrog_babyai.toml, and qwen3_4b_baseline_tp1.toml
- Add minigrid>=2.3.0 to project dependencies (needed by eval_balrog_babyai.py)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Emilianopp pushed a commit that referenced this pull request Apr 8, 2026
The #1 performance bottleneck was that each compact_session_step call
processed a single sequence per collective_rpc. With 128 rollouts × 60
turns = ~7680 sequential RPCs across only 4 DP engines. The GPU was
running batch=1 forward passes when it could handle batch=16+.

Changes:
- worker.py: Add compact_session_step_batch() that packs B sessions
  into one batched append-prefill + one batched decode loop. Post-decode
  compaction is handled per-session (cheap vs forward passes).
- routes.py: Add _SessionStepBatcher that transparently accumulates
  concurrent /compact_session/step requests within 50ms windows, groups
  by DP rank, and dispatches one compact_session_step_batch per rank.

No env.py changes needed — the batcher is transparent to HTTP callers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
