
Turn-based compaction, markovian training support, BabyAI eval #1

Open
Emilianopp wants to merge 7 commits into main from feature/balrog-babyai

Conversation

@Emilianopp
Collaborator

Summary

  • Turn-based compaction (n_protect_turns): fires after each complete assistant response rather than reactively mid-generation. Uses the most recent assistant response's K-vectors as importance queries (lookahead grounded in what the model actually attended to). Markovian mode is supported. KV-budget compaction is disabled when turn-based is active so the two modes don't interfere.
  • Markovian training pipeline: CompactionEnv upgraded to MultiTurnEnv, compaction_mode field added to trainer config with auto-sync from env args, segmented_forward gains a markovian hard-delete branch, orchestrator skips env_client injection for CompactionEnv.
  • BabyAI multi-turn eval (eval_balrog_babyai.py): evaluates baseline, compaction, and markovian modes on MiniGrid tasks with persistent KV sessions. Also adds eval_aime_rsa.py for AIME benchmarking.

Breaking changes

  • configs/compaction/qwen3_4b_serve_tp1.toml: model updated from Qwen/Qwen3-4B to Qwen/Qwen3-4B-Instruct-2507, max_model_len = 32768 added.
  • eval_balrog_babyai.py default --model updated to match.

Everything else is backwards compatible — all new parameters default to preserving existing behaviour (n_protect_turns=-1, compaction_mode="attention_matching").
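As a sketch, the backwards-compatible defaults named above would look like this in a compaction config (the field names and default values are from this PR; the surrounding file layout is illustrative):

```toml
# Defaults that preserve pre-PR behaviour (layout is an assumption)
n_protect_turns = -1                    # -1 keeps the reactive KV-budget trigger
compaction_mode = "attention_matching"  # "markovian" hard-deletes instead of compressing
```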

Bug fixes

  • Guard against compact_end <= prompt_len in turn-based compaction (empty region crashes topk)
  • Off-by-one in asst_start: EOS is counted in turn_asst_lens but not yet written to KV
  • Warning logged when compact_window is set alongside n_protect_turns (ignored in turn-based mode)
  • Session ID now includes mode prefix to prevent collisions when running multiple eval scripts concurrently

Test plan

  • Smoke test turn-based compaction: eval_balrog_babyai.py --mode compaction --use-sessions --n-protect-turns 1 --n 1
  • Smoke test markovian turn-based: eval_balrog_babyai.py --mode markovian --use-sessions --n-protect-turns 1 --n 1
  • Verify n_protect_turns=-1 (default) still uses KV-budget path unchanged
  • Verify compact_window warning fires when set with n_protect_turns >= 0

🤖 Generated with Claude Code

Emilianopp and others added 7 commits March 20, 2026 15:28
CompactionEnv is upgraded from SingleTurnEnv to MultiTurnEnv to support
multi-turn environments (e.g. BabyAI). A new compaction_mode field is
added to TrainerConfig and auto-synced from env args via a model validator,
so the trainer mirrors whatever mode the inference server uses. The
segmented_forward in compaction.py gains a markovian branch that hard-deletes
the compaction window (empty C1/C2) instead of running Attention Matching.
The orchestrator skips env_client injection for CompactionEnv, which runs
in-process to call /compact_generate on the local inference server.
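The auto-sync described above can be sketched as follows. The real TrainerConfig is described as a pydantic model with a model validator; this minimal dataclass sketch (field names from the commit message, everything else an assumption) shows the same post-construction sync:

```python
from dataclasses import dataclass, field


@dataclass
class TrainerConfig:
    # Hypothetical minimal shape; the real TrainerConfig is a pydantic model
    # whose model validator performs this same sync after construction.
    env_args: dict = field(default_factory=dict)
    compaction_mode: str = "attention_matching"

    def __post_init__(self):
        # Auto-sync: mirror whatever mode the env args (and thus the
        # inference server) specify, so trainer and server never drift apart.
        mode = self.env_args.get("compaction_mode")
        if mode is not None:
            self.compaction_mode = mode


cfg = TrainerConfig(env_args={"compaction_mode": "markovian"})
print(cfg.compaction_mode)  # markovian
```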

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces n_protect_turns to the session API as an alternative compaction
trigger: instead of firing reactively when the KV fills, compaction fires
after each complete assistant response, leaving the last n_protect_turns
turns untouched.

Key design choices:
- query_vecs param added to compact_kv_range/compact_kv: when provided,
  uses real key vectors as importance probes instead of random Gaussian
  queries. Turn-based compaction passes the most recent assistant response's
  K-vectors as a lookahead signal grounded in what the model attended to.
- KV-budget compaction (pre-decode + in-decode) is disabled when
  n_protect_turns >= 0, so the two modes do not interfere.
- Markovian mode is supported: when compaction_mode="markovian", the
  turn-based path hard-deletes rather than compresses.

Bug fixes included:
- Guard against compact_end <= prompt_len (empty region crashes topk)
- Off-by-one in asst_start: EOS is counted in turn_asst_lens but not
  yet written to KV, so asst_len_in_kv = turn_asst_lens[-1] - 1
- Warning logged when compact_window is set alongside n_protect_turns,
  since compact_window is ignored in turn-based mode

Default n_protect_turns=-1 preserves existing KV-budget behaviour exactly.
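The query_vecs idea above can be sketched in a few lines: score each compactable KV position by how strongly the probe vectors (the last assistant response's K-vectors, rather than random Gaussians) attend to it, and keep the top-k positions. This is an illustrative reduction, not the actual compact_kv_range implementation; the function name and shapes are hypothetical:

```python
def select_keep_indices(keys, query_vecs, keep):
    """Rank compactable KV positions by attention from probe queries.

    keys:       key vectors of the compactable region (list of float lists)
    query_vecs: probe vectors, e.g. the last assistant response's K-vectors
    keep:       number of positions to retain
    """
    if not keys or keep <= 0:
        # Guard mirroring the PR's compact_end <= prompt_len fix:
        # a topk over an empty region would crash.
        return []

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Importance of position t = max attention logit from any probe query.
    scores = [max(dot(q, k) for q in query_vecs) for k in keys]
    top = sorted(range(len(keys)), key=lambda t: scores[t], reverse=True)[:keep]
    return sorted(top)  # restore temporal order for the surviving positions


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
probes = [[1.0, 0.0]]
print(select_keep_indices(keys, probes, keep=2))  # [0, 1]
```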

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
eval_balrog_babyai.py evaluates models on BabyAI (MiniGrid) grid-world
tasks in three modes: baseline (standard vLLM), compaction (KV budget or
turn-based), and markovian. Supports persistent KV sessions (--use-sessions)
to avoid re-prefilling full history each turn, and --n-protect-turns for
turn-based compaction.

eval_aime_rsa.py benchmarks RSA vs baseline on AIME problems.

New configs cover markovian training (qwen3_4b_markovian_*), a 0.6B fast
test variant (qwen3_06b_*), and BabyAI baseline training. The serve config
is updated to Qwen3-4B-Instruct-2507 with max_model_len=32768.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…files

The broad *.toml / *.sbatch / *.sh rules were blocking new source configs
and job scripts from being tracked. Added negation rules for configs/**/*.toml
and scripts/*.{sbatch,sh} so these are committable while local outputs remain
ignored.
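A sketch of the resulting .gitignore shape, reconstructed from the rules named above (the exact pre-existing patterns are an assumption):

```gitignore
# Broad ignores for locally generated outputs
*.toml
*.sbatch
*.sh

# Negations so tracked source configs and job scripts stay committable
!configs/**/*.toml
!scripts/*.sbatch
!scripts/*.sh
```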

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
README.md gains a Turn-Based Compaction section documenting n_protect_turns,
the KV layout, comparison table against KV-budget and markovian modes, and
API/CLI usage examples. The Markovian section and eval result tables are
updated. IMPLEMENTATION.md and CLAUDE.md reflect the new key files, configs,
and launch commands.

eval_rg_mix.py adds RSA mode support and misc eval improvements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…igrid dep

- Add BabyAI Evaluation section: tasks, all CLI modes (baseline, KV-budget,
  turn-based with --n-protect-turns, markovian), key flags table, setup notes
- Add Session API section documenting /compact_session/create, /step, and DELETE
  with full parameters tables including n_protect_turns
- Update Scripts and Configs tables with eval_balrog_babyai.py, viz scripts,
  qwen3_4b_balrog_babyai.toml, and qwen3_4b_baseline_tp1.toml
- Add minigrid>=2.3.0 to project dependencies (needed by eval_balrog_babyai.py)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Emilianopp pushed a commit that referenced this pull request Apr 8, 2026
The #1 performance bottleneck was that each compact_session_step call
processed a single sequence per collective_rpc. With 128 rollouts × 60
turns = ~7680 sequential RPCs across only 4 DP engines. The GPU was
running batch=1 forward passes when it could handle batch=16+.

Changes:
- worker.py: Add compact_session_step_batch() that packs B sessions
  into one batched append-prefill + one batched decode loop. Post-decode
  compaction is handled per-session (cheap vs forward passes).
- routes.py: Add _SessionStepBatcher that transparently accumulates
  concurrent /compact_session/step requests within 50ms windows, groups
  by DP rank, and dispatches one compact_session_step_batch per rank.

No env.py changes needed — the batcher is transparent to HTTP callers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
