Commit 6421231

V19c stacked + V19b ablation scouts (PR openai#1925 simon-marcus hparams)
After 4 parallel research agents reviewed 30+ open PRs and compliance issues,
two new findings:

1. PR openai#1923 (AsymLogit) was flagged "empirical negative" by sunnypatneedi's
   4-29 frontier-scan, BUT only on the PR openai#1855 base with default WD=1.0.
   It was never tested on the PR openai#1908 + WD=2.0 combo, so V19's specific
   stack is NOT directly invalidated.

2. PR openai#1925 simon-marcus 1.06049 (3-seed verified, vs PR openai#1855 base
   1.06108 = -0.00059 BPB). Just 2 hparam env vars:
     MATRIX_LR 0.026 -> 0.028
     PHASED_TTT_PREFIX_DOCS 2500 -> 3500
   Orthogonal axis to AsymLogit (LR/TTT prefix vs logit head).

Adds two new scout scripts:
- run_v19c_stacked_scout.sh: PR openai#1908 + AsymLogit + simon-marcus + WD=2.0
  (full stack, recommended first scout)
- run_v19b_simonmarcus_scout.sh: PR openai#1908 + simon-marcus + WD=2.0
  (ablation if V19c wins partially)

Decision rule (CaseOps val baseline 0.97651, community floor 0.0006):
  V19c < 0.97591        -> CLEAR WIN, run 3-seed
  V19c 0.97591-0.97651  -> borderline, ablate via V19a/V19b
  V19c > 0.97651        -> abandon stack, try Lead B (PR openai#1884)

Other research findings:
- PR openai#1898 SpinQuant flagged regression vs parent openai#1851 (skip)
- PR openai#1929 SLOT banned per openai#1722 precedent
- PR openai#1911 pre-quant TTT chain banned per openai#1735 precedent
- cocohearts 4-28 PR openai#1902 confirmed PR openai#1855 as official #1
- regina-openai + Alex Zhao: zero activity in the last 48h
- CaseOps de-facto legal (PR openai#1855 merged into chain)
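The two subtractions quoted in the message (the PR #1925 gain over the PR #1855 base, and the clear-win threshold derived from the baseline and the community floor) can be rechecked with a few lines of shell. This is only a sketch: `bpb_delta` is a hypothetical helper name, and every number is copied from the commit message.

```shell
#!/bin/bash
# Sketch: reproduce the two subtractions quoted in the commit message.
# bpb_delta is a hypothetical helper; all numbers are copied from the message.
bpb_delta() {
  # Prints ($1 - $2) to five decimal places.
  # awk is used because bash arithmetic is integer-only.
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.5f\n", a - b }'
}

bpb_delta 1.06049 1.06108   # PR #1925 result minus PR #1855 base
bpb_delta 0.97651 0.0006    # CaseOps baseline minus community floor
```

The first call gives the BPB delta for the simon-marcus hparams, the second the value a scout has to beat to count as a clear win.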
1 parent 6499a66 commit 6421231

2 files changed

Lines changed: 107 additions & 0 deletions

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
#!/bin/bash
# V19b ABLATION scout: PR #1908 + simon-marcus hparams ONLY (no AsymLogit)
# Used to ablate which axis contributed if V19c shows a partial win.
# Seed 42, ~19 min, ~$0.65.
#
# Tests simon-marcus's PR #1925 deltas plus the TTT weight-decay change:
#   - MATRIX_LR 0.026 -> 0.028
#   - PHASED_TTT_PREFIX_DOCS 2500 -> 3500
#   - TTT_WEIGHT_DECAY 1.0 -> 2.0 (PR #1886 stability fix)
#
# AsymLogit is OFF (ASYM_LOGIT_RESCALE=0 default in train_gpt.py).
set -e

cd /workspace/parameter-golf/records/track_10min_16mb/2026-04-30_V19_PR1908_AsymLogit_WD2/

echo "===================================================="
echo " V19b ABLATION: PR #1908 + simon-marcus hparams"
echo " Seed 42 Start: $(date)"
echo "===================================================="

ENV_VARS="DATA_DIR=/workspace/caseops_data/datasets/ \
TTT_WEIGHT_DECAY=2.0 \
MATRIX_LR=0.028 \
PHASED_TTT_PREFIX_DOCS=3500 \
AWQ_LITE_ENABLED=1 \
AWQ_LITE_BITS=8 \
AWQ_LITE_GROUP_TOP_K=1 \
AWQ_LITE_GROUP_SIZE=64 \
LQER_ENABLED=1 \
LQER_ASYM_ENABLED=1 \
LQER_RANK=4 \
LQER_FACTOR_BITS=4 \
LQER_ASYM_GROUP=64 \
LQER_TOP_K=3"

# $ENV_VARS is intentionally unquoted so it word-splits into one assignment per variable.
env SEED=42 $ENV_VARS \
    torchrun --standalone --nproc_per_node=8 train_gpt.py \
    > /workspace/scout_v19b_seed42.log 2>&1

cp final_model.int6.ptz /workspace/v19b_seed42_model.int6.ptz 2>/dev/null || true
cp /workspace/scout_v19b_seed42.log /workspace/v19b_seed42_FULL.log 2>/dev/null || true

echo ""
echo "===================================================="
echo " V19b scout DONE $(date)"
echo "===================================================="
grep -E "stopping_early|train_time|quantized_ttt_phased|val_bpb" /workspace/scout_v19b_seed42.log | tail -10
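The script passes its overrides through a single space-separated string that is expanded unquoted, which works here because none of the values contain whitespace. A minimal sketch of an alternative, assuming the same variable names and values, keeps the overrides in a bash array so that each element becomes exactly one `env` assignment. `OVERRIDES` and `run_with_overrides` are hypothetical names, and the dry run below prints from a child shell instead of launching `torchrun`.

```shell
#!/bin/bash
# Sketch: hold the overrides in a bash array instead of a space-separated string.
# Variable names and values are copied from the scout script;
# OVERRIDES and run_with_overrides are hypothetical, not part of the repository.
set -euo pipefail

OVERRIDES=(
  TTT_WEIGHT_DECAY=2.0
  MATRIX_LR=0.028
  PHASED_TTT_PREFIX_DOCS=3500
)

run_with_overrides() {
  # "$@" is the command to run; quoting the array expansion prevents
  # any further word splitting or globbing of the individual assignments.
  env SEED=42 "${OVERRIDES[@]}" "$@"
}

# Dry run: read two overrides back from the child environment.
run_with_overrides sh -c 'echo "MATRIX_LR=$MATRIX_LR PREFIX_DOCS=$PHASED_TTT_PREFIX_DOCS"'
```

In the real scout the last line would instead wrap the `torchrun` invocation; the array form mainly pays off if a value ever needs spaces or if overrides are appended conditionally.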
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
#!/bin/bash
# V19c FULL STACK scout: PR #1908 + Asymmetric Logit Rescale + simon-marcus hparams
# Single seed 42, ~19 min, ~$0.65.
#
# Combines THREE independent changes (each proposed separately by the community):
#   1. Asymmetric Logit Rescale (PR #1923 jorge-asenjo)
#      - sunnypatneedi flagged "empirical negative" but ONLY on PR #1855 base
#        with WD=1.0 default. Never tested on PR #1908 + WD=2.0.
#   2. simon-marcus hparams (PR #1925, 3-seed verified 1.06049 on PR #1855 base)
#      - MATRIX_LR 0.026 -> 0.028
#      - PHASED_TTT_PREFIX_DOCS 2500 -> 3500
#   3. TTT_WEIGHT_DECAY 1.0 -> 2.0 (PR #1886 fused-CE collapse fix)
#
# Theory: 3 orthogonal axes; if any 1 wins, we beat PR #1908 frontier.
# If V19c regresses, we can ablate (run V19a alone first, or V19b separately).
set -e

cd /workspace/parameter-golf/records/track_10min_16mb/2026-04-30_V19_PR1908_AsymLogit_WD2/

echo "===================================================="
echo " V19c STACKED scout: PR #1908 + 3 axes"
echo " Seed 42 Start: $(date)"
echo "===================================================="

ENV_VARS="DATA_DIR=/workspace/caseops_data/datasets/ \
ASYM_LOGIT_RESCALE=1 \
TTT_WEIGHT_DECAY=2.0 \
MATRIX_LR=0.028 \
PHASED_TTT_PREFIX_DOCS=3500 \
AWQ_LITE_ENABLED=1 \
AWQ_LITE_BITS=8 \
AWQ_LITE_GROUP_TOP_K=1 \
AWQ_LITE_GROUP_SIZE=64 \
LQER_ENABLED=1 \
LQER_ASYM_ENABLED=1 \
LQER_RANK=4 \
LQER_FACTOR_BITS=4 \
LQER_ASYM_GROUP=64 \
LQER_TOP_K=3"

# $ENV_VARS is intentionally unquoted so it word-splits into one assignment per variable.
env SEED=42 $ENV_VARS \
    torchrun --standalone --nproc_per_node=8 train_gpt.py \
    > /workspace/scout_v19c_seed42.log 2>&1

cp final_model.int6.ptz /workspace/v19c_seed42_model.int6.ptz 2>/dev/null || true
cp /workspace/scout_v19c_seed42.log /workspace/v19c_seed42_FULL.log 2>/dev/null || true

echo ""
echo "===================================================="
echo " V19c scout DONE $(date)"
echo "===================================================="
grep -E "stopping_early|train_time|quantized_ttt_phased|val_bpb" /workspace/scout_v19c_seed42.log | tail -10
echo ""
echo "DECISION RULE:"
echo " baseline (PR #1908 default on CaseOps): 0.97651"
echo " community merge floor: 0.0006 BPB delta"
echo ""
echo " if V19c < 0.97591 -> CLEAR WIN (>floor), run 3-seed"
echo " if V19c 0.97591-0.97651 -> borderline, ablate (run run_v19_scout.sh, AsymLogit alone)"
echo " if V19c > 0.97651 -> noise/regression, abandon"
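The decision rule is only printed, so the comparison against the scout's final `val_bpb` is left to the reader. A minimal sketch of the same rule as a function follows. It assumes the borderline band runs from the clear-win threshold (baseline minus floor) up to the baseline itself, so adjust the upper bound if a different cutoff is intended. `classify_bpb` is a hypothetical helper name, and the three sample scores are made-up inputs, not measured results.

```shell
#!/bin/bash
# Sketch of the decision rule. BASELINE and FLOOR are copied from the script;
# classify_bpb is a hypothetical helper, and the borderline upper bound
# (the baseline) is an assumption stated in the text above.
BASELINE=0.97651
FLOOR=0.0006

classify_bpb() {
  # $1: single-seed val_bpb. awk does the floating-point comparison,
  # since bash arithmetic is integer-only.
  awk -v x="$1" -v base="$BASELINE" -v floor="$FLOOR" 'BEGIN {
    win = base - floor                      # 0.97591 with the values above
    if (x < win)        print "clear_win"   # beats baseline by more than the floor
    else if (x <= base) print "borderline"  # better than baseline, within the floor
    else                print "regression"  # worse than baseline
  }'
}

# Made-up example scores, one per band.
classify_bpb 0.97500
classify_bpb 0.97620
classify_bpb 0.97700
```

Scores that land almost exactly on a boundary are subject to floating-point rounding, which matters little here because a single-seed result that close to the threshold would need a multi-seed rerun anyway.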
