Turn-based compaction, markovian training support, BabyAI eval #1
Open
Emilianopp wants to merge 7 commits into main
CompactionEnv is upgraded from SingleTurnEnv to MultiTurnEnv to support multi-turn environments (e.g. BabyAI). A new compaction_mode field is added to TrainerConfig and auto-synced from env args via a model validator, so the trainer mirrors whatever mode the inference server uses. segmented_forward in compaction.py gains a markovian branch that hard-deletes the compaction window (empty C1/C2) instead of running Attention Matching. The orchestrator skips env_client injection for CompactionEnv, which runs in-process to call /compact_generate on the local inference server.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
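The auto-sync described above could look roughly like the sketch below. It is an illustration only: it uses a stdlib dataclass `__post_init__` as a stand-in for the model validator, and the `EnvConfig` class, the `env` and `args` field names, and the `VALID_MODES` set are assumptions, not the PR's actual definitions. Only the `compaction_mode` field, its `"attention_matching"` default, and the `"markovian"` value come from the PR text.

```python
from dataclasses import dataclass, field
from typing import Any

# Assumption: these are the only two modes; the PR names both values.
VALID_MODES = {"attention_matching", "markovian"}


@dataclass
class EnvConfig:
    """Hypothetical stand-in for an environment entry in the trainer config."""
    id: str
    args: dict[str, Any] = field(default_factory=dict)


@dataclass
class TrainerConfig:
    """Hypothetical, heavily reduced trainer config."""
    env: list[EnvConfig] = field(default_factory=list)
    compaction_mode: str = "attention_matching"

    def __post_init__(self) -> None:
        # Stand-in for a model validator: if any env declares a
        # compaction_mode in its args, the trainer adopts it so that
        # training matches what the inference server does.
        for env in self.env:
            mode = env.args.get("compaction_mode")
            if mode is None:
                continue
            if mode not in VALID_MODES:
                raise ValueError(f"unknown compaction_mode: {mode!r}")
            self.compaction_mode = mode
```

With this shape, a config whose env args set `compaction_mode="markovian"` yields a trainer in markovian mode without the field being set twice, and a config with no env override keeps the default.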
Introduces n_protect_turns to the session API as an alternative compaction trigger: instead of firing reactively when the KV fills, compaction fires after each complete assistant response, leaving the last n_protect_turns turns untouched.
Key design choices:
- query_vecs param added to compact_kv_range/compact_kv: when provided, uses real key vectors as importance probes instead of random Gaussian queries. Turn-based compaction passes the most recent assistant response's K-vectors as a lookahead signal grounded in what the model attended to.
- KV-budget compaction (pre-decode + in-decode) is disabled when n_protect_turns >= 0, so the two modes do not interfere.
- Markovian mode is supported: when compaction_mode="markovian", the turn-based path hard-deletes rather than compresses.
Bug fixes included:
- Guard against compact_end <= prompt_len (empty region crashes topk)
- Off-by-one in asst_start: EOS is counted in turn_asst_lens but not yet written to KV, so asst_len_in_kv = turn_asst_lens[-1] - 1
- Warning logged when compact_window is set alongside n_protect_turns, since compact_window is ignored in turn-based mode
Default n_protect_turns=-1 preserves existing KV-budget behaviour exactly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
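The index arithmetic behind the two bug fixes can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: it assumes a KV layout of `[prompt | user 0 | asst 0 | user 1 | asst 1 | ...]` with no earlier compaction, and `turn_user_lens` plus both function names are hypothetical. The names `prompt_len`, `turn_asst_lens`, `compact_end`, `asst_start`, and the relation `asst_len_in_kv = turn_asst_lens[-1] - 1` are taken from the commit message.

```python
from typing import List, Optional, Tuple


def turn_compaction_range(
    prompt_len: int,
    turn_user_lens: List[int],   # hypothetical: tokens appended per user turn
    turn_asst_lens: List[int],   # tokens per assistant turn, EOS included
    n_protect_turns: int,
) -> Optional[Tuple[int, int]]:
    """Return the [start, end) KV range to compact, or None to skip."""
    if n_protect_turns < 0 or not turn_asst_lens:
        return None  # turn-based mode is off; the KV-budget path applies
    # The final EOS is counted in turn_asst_lens but not yet written to KV.
    kv_len = prompt_len + sum(turn_user_lens) + sum(turn_asst_lens) - 1
    n_compact = len(turn_asst_lens) - n_protect_turns
    if n_compact <= 0:
        return None  # every turn so far is protected
    compact_end = prompt_len + sum(turn_user_lens[:n_compact]) + sum(turn_asst_lens[:n_compact])
    compact_end = min(compact_end, kv_len)
    if compact_end <= prompt_len:
        return None  # empty region: a top-k over zero positions would fail
    return prompt_len, compact_end


def last_assistant_kv_span(kv_len: int, turn_asst_lens: List[int]) -> Tuple[int, int]:
    """Return the [asst_start, kv_len) span of the latest assistant response."""
    asst_len_in_kv = turn_asst_lens[-1] - 1  # drop the unwritten EOS
    return kv_len - asst_len_in_kv, kv_len
```

For example, with a 10-token prompt, user turns of 5 and 4 tokens, assistant turns of 6 and 3 tokens, and `n_protect_turns=1`, the KV holds 27 positions, the compactable range is positions 10 to 21, and the latest assistant response occupies positions 25 to 27.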
eval_balrog_babyai.py evaluates models on BabyAI (MiniGrid) grid-world tasks in three modes: baseline (standard vLLM), compaction (KV budget or turn-based), and markovian. Supports persistent KV sessions (--use-sessions) to avoid re-prefilling full history each turn, and --n-protect-turns for turn-based compaction. eval_aime_rsa.py benchmarks RSA vs baseline on AIME problems. New configs cover markovian training (qwen3_4b_markovian_*), a 0.6B fast test variant (qwen3_06b_*), and BabyAI baseline training. The serve config is updated to Qwen3-4B-Instruct-2507 with max_model_len=32768.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…files
The broad *.toml / *.sbatch / *.sh rules were blocking new source configs
and job scripts from being tracked. Added negation rules for configs/**/*.toml
and scripts/*.{sbatch,sh} so these are committable while local outputs remain
ignored.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
README.md gains a Turn-Based Compaction section documenting n_protect_turns, the KV layout, a comparison table against KV-budget and markovian modes, and API/CLI usage examples. The Markovian section and eval result tables are updated. IMPLEMENTATION.md and CLAUDE.md reflect the new key files, configs, and launch commands. eval_rg_mix.py adds RSA mode support and misc eval improvements.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…igrid dep
- Add BabyAI Evaluation section: tasks, all CLI modes (baseline, KV-budget, turn-based with --n-protect-turns, markovian), key flags table, setup notes
- Add Session API section documenting /compact_session/create, /step, and DELETE with full parameters tables including n_protect_turns
- Update Scripts and Configs tables with eval_balrog_babyai.py, viz scripts, qwen3_4b_balrog_babyai.toml, and qwen3_4b_baseline_tp1.toml
- Add minigrid>=2.3.0 to project dependencies (needed by eval_balrog_babyai.py)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Emilianopp pushed a commit that referenced this pull request on Apr 8, 2026
The #1 performance bottleneck was that each compact_session_step call processed a single sequence per collective_rpc. With 128 rollouts × 60 turns, that is ~7680 sequential RPCs spread across only 4 DP engines, so the GPU was running batch=1 forward passes when it could handle batch=16+.
Changes:
- worker.py: Add compact_session_step_batch(), which packs B sessions into one batched append-prefill plus one batched decode loop. Post-decode compaction is handled per session (cheap relative to the forward passes).
- routes.py: Add _SessionStepBatcher, which transparently accumulates concurrent /compact_session/step requests within 50ms windows, groups them by DP rank, and dispatches one compact_session_step_batch per rank. No env.py changes are needed because the batcher is transparent to HTTP callers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
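The windowed batching idea can be sketched with plain asyncio as below. This is a simplified illustration, not the `_SessionStepBatcher` from routes.py: the class name `SessionStepBatcher`, the `submit` method, and the `dispatch` callback signature are assumptions made for the example. Only the 50ms window, the grouping by DP rank, and the one-dispatch-per-rank behaviour come from the commit message.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, Dict, List, Optional, Set, Tuple

# Hypothetical callback: takes (dp_rank, requests), returns one result per request.
Dispatch = Callable[[int, List[Any]], Awaitable[List[Any]]]


class SessionStepBatcher:
    """Collects concurrent step requests for a short window, then
    dispatches one batched call per DP rank."""

    def __init__(self, dispatch: Dispatch, window_s: float = 0.05) -> None:
        self._dispatch = dispatch
        self._window_s = window_s
        self._pending: List[Tuple[int, Any, asyncio.Future]] = []
        self._flush_task: Optional[asyncio.Task] = None
        self._tasks: Set[asyncio.Task] = set()  # keep strong refs to flush tasks

    async def submit(self, dp_rank: int, request: Any) -> Any:
        fut = asyncio.get_running_loop().create_future()
        self._pending.append((dp_rank, request, fut))
        if self._flush_task is None:
            # The first request in a window starts the timer.
            self._flush_task = asyncio.create_task(self._flush_after_window())
            self._tasks.add(self._flush_task)
            self._flush_task.add_done_callback(self._tasks.discard)
        return await fut

    async def _flush_after_window(self) -> None:
        await asyncio.sleep(self._window_s)
        pending, self._pending = self._pending, []
        self._flush_task = None  # later requests open a new window
        by_rank: Dict[int, List[Tuple[Any, asyncio.Future]]] = defaultdict(list)
        for rank, request, fut in pending:
            by_rank[rank].append((request, fut))
        await asyncio.gather(*(self._run_rank(r, items) for r, items in by_rank.items()))

    async def _run_rank(self, rank: int, items: List[Tuple[Any, asyncio.Future]]) -> None:
        try:
            results = await self._dispatch(rank, [req for req, _ in items])
        except Exception as exc:
            # Propagate a batch failure to every caller in the batch.
            for _, fut in items:
                if not fut.done():
                    fut.set_exception(exc)
            return
        for (_, fut), result in zip(items, results):
            if not fut.done():
                fut.set_result(result)
```

Each caller simply awaits `submit(...)` and receives its own result, which is why callers need not know that batching happens. The trade-off is up to one window of added latency per step in exchange for larger forward-pass batches.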
Summary
- Turn-based compaction (n_protect_turns): fires after each complete assistant response rather than reactively mid-generation. Uses the most recent assistant response's K-vectors as importance queries (lookahead grounded in what the model actually attended to). Markovian mode is supported. KV-budget compaction is disabled when turn-based is active so the two modes don't interfere.
- Markovian training support: CompactionEnv upgraded to MultiTurnEnv, compaction_mode field added to trainer config with auto-sync from env args, segmented_forward gains a markovian hard-delete branch, orchestrator skips env_client injection for CompactionEnv.
- BabyAI eval (eval_balrog_babyai.py): evaluates baseline, compaction, and markovian modes on MiniGrid tasks with persistent KV sessions. Also adds eval_aime_rsa.py for AIME benchmarking.

Breaking changes
- configs/compaction/qwen3_4b_serve_tp1.toml: model updated from Qwen/Qwen3-4B → Qwen/Qwen3-4B-Instruct-2507, max_model_len = 32768 added.
- eval_balrog_babyai.py default --model updated to match.

Everything else is backwards compatible: all new parameters default to preserving existing behaviour (n_protect_turns=-1, compaction_mode="attention_matching").

Bug fixes
- Guard against compact_end <= prompt_len in turn-based compaction (empty region crashes topk)
- Off-by-one in asst_start: EOS is counted in turn_asst_lens but not yet written to KV
- Warning logged when compact_window is set alongside n_protect_turns (ignored in turn-based mode)

Test plan
- eval_balrog_babyai.py --mode compaction --use-sessions --n-protect-turns 1 --n 1
- eval_balrog_babyai.py --mode markovian --use-sessions --n-protect-turns 1 --n 1
- n_protect_turns=-1 (default) still uses KV-budget path unchanged
- compact_window warning fires when set with n_protect_turns >= 0

🤖 Generated with Claude Code