Update README typo#9

Merged
0hq merged 1 commit into openai:main from oof-baroomf:patch-1
Mar 18, 2026
Conversation

@oof-baroomf
Contributor

I think the NanoGPT Speedrun is optimizing for time, not loss.

Contributor

@0hq 0hq left a comment


Thanks!

@0hq 0hq merged commit 09c3e8e into openai:main Mar 18, 2026
kxddry pushed a commit to kxddry/parameter-golf that referenced this pull request Mar 19, 2026
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
…g for Muon optimizer

From PR openai#1440 + arxiv:2603.09697 "Mousse: Rectifying the Geometry of Muon with
Curvature-Aware Preconditioning" (Feb 2026).

Inserts ~5 lines of diagonal preconditioning before zeropower_via_newtonschulz5
in the Muon optimizer step. Normalizes momentum gradient by row/col norms before
spectral orthogonalization, trace-normalizing the matrix:

  G_pre = G / (||row||_2 * ||col||_2)

Gated by USE_MOUSSE=1, falls back to vanilla Muon when unset. Idempotent via
MOUSSE_MARKER. Anchored on the unique zeropower call which is invariant under
all existing 22 patches.
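A minimal NumPy sketch of the diagonal preconditioning step described above (the function name and eps guard are illustrative assumptions; the actual patch operates on the Muon momentum buffer in PyTorch, immediately before zeropower_via_newtonschulz5):

```python
import numpy as np

def mousse_precondition(G: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Divide each entry G[i, j] by ||row_i||_2 * ||col_j||_2, i.e.
    G_pre = G / (||row||_2 * ||col||_2); eps guards zero rows/columns."""
    row = np.linalg.norm(G, axis=1, keepdims=True)  # shape (m, 1)
    col = np.linalg.norm(G, axis=0, keepdims=True)  # shape (1, n)
    return G / (row * col + eps)
```

The output has the same shape as G, so it drops into the existing orthogonalization call unchanged.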

This is the FIRST shippable finding in 5 research fires that fits our
train_loss metric (optimizer-side change affects training directly, unlike
EMA/Tilt/GPTQ which only affect eval). Subagent recommended PASS due to
medium effort estimate; overrode after confirming PR openai#1440 ships only the
SIMPLIFIED diagonal preconditioning version (5 LOC, not 50-80).

4 MS experiments queued for validation:
  MS0_mousse_alone, MS1_mousse_plus_leaky_ng, MS2_mousse_seed42, MS3_mousse_plus_engram

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
… confirmation),

MR2 promising, PR openai#1430 MERGED at 0.39642 BPB

Subagent reports PR openai#1430 (Per-Sample SLOT + Causal Backoff N-gram Mixer + TTT)
has been MERGED at claimed 0.39642 BPB — 65% below public SOTA. If real, this
fundamentally changes the competitive landscape. Audit fires openai#1-3 all flagged
this PR as likely illegal under issue openai#677. Now MERGED.

NEXT RESEARCH FIRE PRIORITY: deep-dive PR openai#1430 to verify legality and extract
implementation. If real, port it. If leak-based, document it.

Patches 17 (Mousse) and 18 (MuonEq-R) confirmed as known PORTS, not novel-to-comp.
They were always documented as ports in research fires openai#9 and openai#10.

Patches 15/16/21 still uncontested in 120+ open + 10 closed PRs (4 audits in a row).

Pod healthy, ~$2.30/$36 spend. MR2_seed42 = 3.3004 (better than MS2 = 3.3358),
suggesting MuonEq-R may slightly beat Mousse at L5 stack. Falsification of
Patches 17 and 18 proceeding rapidly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
…util

After 5 emergency interventions in 2 hours, the speed fix is finally working:
  GPU Memory: 744 MB -> 3370 MB (4.5x)
  GPU Util: 34% -> 100% (3x, FULLY MAXED)
  Power: 149W -> 218W
  Total compute/step: 270 GFLOP -> 17 TFLOP (64x)
  Total tokens/experiment: 1.5M -> 24M (16x)

CHAMP_L5_seed42 currently running successfully:
  step:100 train_loss:3.6128 step_avg:861ms

The actual root cause was Patch 22 EngramLite init anchor mismatch.
The torch.compile crashes were a red herring — every experiment was
crashing with AttributeError on self._engram_lite_enabled because the
forward apply ran but the init didn't. getattr wrap fixed it.
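The defensive-read fix can be sketched as follows (class and method names are illustrative, not the actual train_gpt.py code):

```python
class Block:
    """Forward-patched module; _engram_lite_enabled is set only when the
    EngramLite init patch actually applies."""

    def forward(self, x):
        # getattr with a default instead of reading self._engram_lite_enabled
        # directly: if the init anchor mismatched and the attribute was never
        # set, fall through to the vanilla path instead of raising
        # AttributeError.
        if getattr(self, "_engram_lite_enabled", False):
            x = self._engram_lite(x)  # hypothetical patched branch
        return x
```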

All prior "neutrality plateau" verdicts are now CONFIRMED INVALID:
Mousse/MuonEq-R/NorMuon/Depth Recurrence/Coprime/EngramLite/QK_GAIN
were all measured on 0.75% of intended data volume. Need re-validation.

PR openai#1430 still OPEN, 24h no activity. Patches 15/16/20/21/25 still novel
(9th consecutive audit confirmation).

NEW finding: TMA Megakernel in 5 PRs (custom Triton kernel, hardware-side).
We have ZERO hardware-side patches. Highest-leverage missing technique.

Spend ~$6.33/$36 (17.6%). Far below $25 flag threshold.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gHashTag pushed a commit to gHashTag/parameter-golf that referenced this pull request Apr 30, 2026
- Cargo workspace with 3 crates + bin/tri-railway
- trios-railway-core: ProjectId/EnvironmentId/ServiceId/DeployId newtypes,
  RailwayHash::seal (R7 audit triplet), Client over Railway GraphQL v2
- trios-railway-audit: DriftCode D1..D7 + DriftEvent + verdict (Gate-2
  PASS criterion), idempotent Neon DDL (railway_projects, railway_services,
  railway_audit_runs, railway_audit_events, v_railway_drift_open)
- trios-railway-experience: append-only L7 writer to
  .trinity/experience/<YYYYMMDD>.trinity (L21-safe, no truncation)
- bin/tri-railway: clap CLI with 'version', 'audit migrate-sql',
  'experience append' (mutating verbs deferred to issues openai#4..openai#9)
- LICENSE Apache-2.0, README, AGENTS.md, TASK.md per crate
- CI: fmt --check + clippy -D warnings + build + test
- Neon DDL applied to neondb (5 objects verified)
- 16 unit tests passing (6 audit + 8 core + 2 experience)
- ascii-only sources, R1 (no .sh / no Python in scripts/)

Anchor: phi^2 + phi^-2 = 3.

Closes #1

Agent: GENERAL
gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request Apr 30, 2026
openai#143) (openai#29)

* feat(audit): tri-railway audit run + verdict CLI (closes openai#9, refs openai#143)

Anchor: phi^2 + phi^-2 = 3.

Adds the online-audit subcommands that close L-R14 (Gate-2 verdict)
formally:

  tri-railway audit run     --project <UUID> --target <BPB> [--ledger PATH] [--json]
  tri-railway audit verdict --ledger PATH    --target <BPB>

Behaviour:
  - audit run lists Railway services for a project (Q::project_view),
    converts them to RealService (with seed parsed from name like
    'trios-train-seed-43' or 'igla-final-seed-44'), optionally loads
    a JSONL ledger, calls trios_railway_audit::detect to produce the
    full D1..D7 drift event set, runs verdict() to compute Gate2Pass /
    NotYet / Drift, prints a text summary, optionally JSON, then
    seals one R7 audit triplet to .trinity/experience via the
    existing experience writer. Exit codes:
        0 = GATE-2 PASS   (>= 3 services with bpb < target, no error drift)
        1 = DRIFT         (any error-severity event)
        2 = NOT YET       (no errors, target not yet met)
  - audit verdict is the offline form for cron/CI: takes a JSONL
    ledger snapshot already serialized from Neon, computes the same
    verdict against synthetic services, prints one line, exits with
    the same codes.
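The exit-code contract can be sketched in Python (the shipped implementation is Rust in trios-railway-audit; the names and event shape here are illustrative assumptions):

```python
GATE2_PASS, DRIFT, NOT_YET = 0, 1, 2

def verdict(service_bpbs, target_bpb, drift_events, min_seeds=3):
    """Apply the Gate-2 rules above: any error-severity drift event wins,
    then >= min_seeds services under target passes, else not yet."""
    if any(e.get("severity") == "error" for e in drift_events):
        return DRIFT
    passing = sum(1 for bpb in service_bpbs if bpb < target_bpb)
    return GATE2_PASS if passing >= min_seeds else NOT_YET
```

Under these rules, three seeds below target yield exit 0, while a single error-severity event (e.g. an overflow ledger entry) forces exit 1 regardless of the bpb tally.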

Auth fix in trios-railway-core: RAILWAY_TOKEN_AUTH env var allows
forcing 'team' (Bearer) vs 'project' (Project-Access-Token) when the
UUID-shape heuristic guesses wrong. Personal API tokens are also
UUID-shaped but require Bearer; without this override, authenticating
to backboard.railway.com returned 'Not Authorized'. Verified with
both curl variants against the live IGLA project.

R5-honest verification (logs in PR body):
  cargo build --bin tri-railway --locked        : OK
  cargo test  --workspace --locked              : 22 passed, 0 failed
  cargo clippy -D warnings                      : 0
  Live smoke against IGLA (e4fe33bb-...):
    18 services, 16 D1_ORPHAN warnings, NOT YET, exit=2,
    R7 triplet sealed at /tmp/audit-smoke/.trinity/experience/<date>.trinity
  Synthetic ledger smoke:
    3 seeds bpb<1.85 -> GATE-2 PASS, exit=0
    3 seeds bpb<1.85 vs target=1.50 -> NOT YET, exit=2
    1 seed bpb=3.5e38 -> DRIFT (D5_OVERFLOW), exit=1

Closes openai#9. Refs openai#143 (IGLA RACE Gate-2 / L-R14).

* style(audit): cargo fmt --all (CI format-check fix)

---------

Co-authored-by: Perplexity Computer <computer@perplexity.ai>
jzmyres pushed a commit to jzmyres/parameter-golf that referenced this pull request May 3, 2026
Triggered by grad_norm=0.07 mid-flight obs in iter 112+122 (well below
grad_clip=1.0 → headroom for deeper backward gradient signal). Backward
coverage at K=16 increases 12.5%→19%.
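The coverage figures follow directly from the TBPTT depth divided by the chunk count (a sketch; the parameter names mirror deq_bptt_k and K from the description and are my labeling):

```python
def backward_coverage(bptt_k: int, k_chunks: int) -> float:
    """Fraction of the K recurrent steps that receive backward gradient
    under truncated BPTT of depth bptt_k."""
    return bptt_k / k_chunks

# deq_bptt_k 2 -> 3 at K=16: 2/16 = 12.5%, 3/16 = 18.75% (~19%)
```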

Iter 85 (H63) PROMOTED stochastic {2,3,4} earlier but at WD=0.30 +
K-jitter (4,6,10) era — different regime. This is fixed=3 retest under
WD=0.01 + K-jitter (16,24) + iter 112+122 (gram=0.1 + softcap=30) baseline.

Expected ~+10% wallclock cost from extra TBPTT step.

Also: Tier 1 reordering (user directive 2026-05-02) — iter 108 (deq_k 16→10)
deprioritized to END of Tier 1 (was launched 04:08, killed at ~step 10,
commit fa22310 reverted). Throughput-only iter with no expected val_bpb
improvement; lower priority than val_bpb-improving + simplification iters.

Files:
  train_gpt.py: Hyperparameters.deq_bptt_k 2 → 3
  CLAUDE.md §5: architecture table mirror
  experiments/hypotheses.md: Tier 1 reordered, iter 108 moved to position openai#9

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
