SLOT variant: a per-chunk delta in embedding space, optimized with 8 AdamW steps per chunk.
Result: worse than both sliding-window eval (1.1246 BPB) and naive eval (1.1479 BPB).
Lesson: SLOT needs L-BFGS in logit space (see exp_075), not AdamW in
embedding space: 8 steps underfit, and the embedding-space loss
surface is non-convex.
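A minimal sketch of the per-chunk loop described above, under loud assumptions: a toy linear output head `W` stands in for the model, the delta is added to hidden states right before that head, and every name (`slot_delta_adamw`, `chunk_loss_and_grad`) and every hyperparameter other than the 8 steps is hypothetical. In this toy the loss is convex in the delta, so it does not reproduce the non-convexity noted above; it only illustrates fitting one delta per chunk with AdamW.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def chunk_loss_and_grad(hidden, targets, W, delta):
    """Mean cross-entropy of logits W @ (h + delta) over one chunk, plus d(loss)/d(delta).
    Hypothetical toy: W is a linear head standing in for the real model."""
    dim, vocab = len(delta), len(W)
    loss, grad = 0.0, [0.0] * dim
    for h, y in zip(hidden, targets):
        x = [h[i] + delta[i] for i in range(dim)]
        logits = [sum(W[v][i] * x[i] for i in range(dim)) for v in range(vocab)]
        p = softmax(logits)
        loss -= math.log(p[y])
        # dL/dlogits = p - onehot(y), so dL/dx = W^T (p - onehot(y)); dx/ddelta = I
        for v in range(vocab):
            g = p[v] - (1.0 if v == y else 0.0)
            for i in range(dim):
                grad[i] += W[v][i] * g
    n = len(targets)
    return loss / n, [g / n for g in grad]

def slot_delta_adamw(hidden, targets, W, steps=8, lr=0.01,
                     betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """Fit one delta vector for this chunk with plain AdamW (decoupled weight decay)."""
    dim = len(hidden[0])
    delta, m, v = [0.0] * dim, [0.0] * dim, [0.0] * dim
    b1, b2 = betas
    history = []
    for t in range(1, steps + 1):
        loss, grad = chunk_loss_and_grad(hidden, targets, W, delta)
        history.append(loss)
        for i in range(dim):
            delta[i] -= lr * weight_decay * delta[i]      # decoupled decay (the "W" in AdamW)
            m[i] = b1 * m[i] + (1 - b1) * grad[i]          # first-moment estimate
            v[i] = b2 * v[i] + (1 - b2) * grad[i] ** 2     # second-moment estimate
            m_hat = m[i] / (1 - b1 ** t)                   # bias correction
            v_hat = v[i] / (1 - b2 ** t)
            delta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    history.append(chunk_loss_and_grad(hidden, targets, W, delta)[0])  # loss after last step
    return delta, history

# Usage on random toy data (no connection to the real model or eval numbers).
rng = random.Random(0)
dim, vocab, n = 4, 5, 16
W = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab)]
hidden = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
targets = [rng.choice([0, 0, 0, 1, 2]) for _ in range(n)]
delta, history = slot_delta_adamw(hidden, targets, W, steps=8)
print(f"chunk loss before: {history[0]:.4f}, after 8 AdamW steps: {history[-1]:.4f}")
```

With only 8 small steps the delta moves at most a few hundredths per coordinate, which is one concrete way to see the underfitting the commit describes.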
Also bumped QK-Gain from 1.5 to 4.0 (a free -0.006 BPB, per PR openai#1125).
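A small illustration of why a larger QK-Gain changes attention, assuming (the commit does not say) that QK-Gain is a scalar multiplier applied to RMS-normalized queries before the scaled dot product; whether the real value is a fixed constant or the initial value of a learnable parameter is not stated. The helper names and the query/key vectors are hypothetical toy values. Scaling every score by the same gain acts like lowering the softmax temperature, so attention gets sharper.

```python
import math

def rms_norm(x, eps=1e-6):
    r = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / r for v in x]

def attention_weights(q, keys, qk_gain):
    """Softmax attention over keys, with the gain multiplying the normalized query."""
    d = len(q)
    qn = [qk_gain * v for v in rms_norm(q)]
    scores = [sum(a * b for a, b in zip(qn, rms_norm(k))) / math.sqrt(d) for k in keys]
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [v / z for v in e]

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

# Toy query and keys; key 0 is the one most aligned with the query.
q = [1.0, 0.5, -0.3, 0.2]
keys = [[1.0, 0.4, -0.2, 0.1], [0.1, -0.8, 0.5, 0.3], [-0.5, 0.2, 0.9, -0.4]]
w_low = attention_weights(q, keys, qk_gain=1.5)
w_high = attention_weights(q, keys, qk_gain=4.0)
print("gain 1.5:", [round(x, 3) for x in w_low], "entropy", round(entropy(w_low), 3))
print("gain 4.0:", [round(x, 3) for x in w_high], "entropy", round(entropy(w_high), 3))
```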