[WIP] Recurrent MQA Transformer — depth recurrence + weight tying (nidhilak-Aquarius) #29
Closed
nidhilak-Aquarius wants to merge 5 commits into openai:main from
Conversation
Author
Recurrent MQA Transformer — Core Logic

This submission focuses on maximizing effective model capacity under a strict artifact constraint, through parameter sharing and architectural efficiency:

Artifact size (measured): ~2.82 MB (int8 + zlib, smoke test), well under the 16 MB constraint, leaving substantial headroom for further optimization.

Hypothesis: increasing recurrence depth (N=12) improves performance over shallower configurations at fixed parameter count, with diminishing returns beyond N≈16.

Status: local smoke test completed successfully; full GPU evaluation (val_bpb) pending compute grant.
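A measurement of the "int8 + zlib" artifact size can be reproduced along these lines. This is a minimal sketch, assuming per-tensor absmax int8 quantization and zlib level 9; the function name and storage layout are illustrative, not taken from the submission:

```python
import io
import zlib
import numpy as np

def artifact_size_bytes(tensors, level=9):
    """Quantize each float tensor to int8 with a per-tensor absmax scale,
    concatenate the raw bytes (plus each fp32 scale), and return the
    zlib-compressed size in bytes."""
    buf = io.BytesIO()
    for w in tensors:
        scale = float(np.abs(w).max()) / 127.0
        scale = scale if scale > 0 else 1.0          # avoid divide-by-zero
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        buf.write(q.tobytes())
        buf.write(np.float32(scale).tobytes())       # store the scale too
    return len(zlib.compress(buf.getvalue(), level))
```

For Gaussian weights the int8 step alone gives ~4x over fp32; zlib then recovers a little more, since quantized weights rarely use all 8 bits of entropy.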
gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request on Apr 30, 2026
openai#143) (openai#29)

* feat(audit): tri-railway audit run + verdict CLI (closes openai#9, refs openai#143)

Anchor: phi^2 + phi^-2 = 3.

Adds the online-audit subcommands that formally close L-R14 (Gate-2 verdict):

    tri-railway audit run --project <UUID> --target <BPB> [--ledger PATH] [--json]
    tri-railway audit verdict --ledger PATH --target <BPB>

Behaviour:

- audit run lists Railway services for a project (Q::project_view), converts them to RealService (with the seed parsed from a name like 'trios-train-seed-43' or 'igla-final-seed-44'), optionally loads a JSONL ledger, calls trios_railway_audit::detect to produce the full D1..D7 drift event set, runs verdict() to compute Gate2Pass / NotYet / Drift, prints a text summary (optionally JSON), then seals one R7 audit triplet to .trinity/experience via the existing experience writer.

  Exit codes:
    0 = GATE-2 PASS (>= 3 services with bpb < target, no error drift)
    1 = DRIFT (any error-severity event)
    2 = NOT YET (no errors, target not yet met)

- audit verdict is the offline form for cron/CI: it takes a JSONL ledger snapshot already serialized from Neon, computes the same verdict against synthetic services, prints one line, and exits with the same codes.

Auth fix in trios-railway-core: the RAILWAY_TOKEN_AUTH env var allows forcing 'team' (Bearer) vs 'project' (Project-Access-Token) auth when the UUID-shape heuristic guesses wrong. Personal API tokens are also UUID-shaped but require Bearer; without this override, authenticating to backboard.railway.com returned 'Not Authorized'. Verified with both curl variants against the live IGLA project.

R5-honest verification (logs in PR body):
    cargo build --bin tri-railway --locked : OK
    cargo test --workspace --locked : 22 passed, 0 failed
    cargo clippy -D warnings : 0

Live smoke against IGLA (e4fe33bb-...): 18 services, 16 D1_ORPHAN warnings, NOT YET, exit=2, R7 triplet sealed at /tmp/audit-smoke/.trinity/experience/<date>.trinity

Synthetic ledger smoke:
    3 seeds bpb<1.85 -> GATE-2 PASS, exit=0
    3 seeds bpb<1.85 vs target=1.50 -> NOT YET, exit=2
    1 seed bpb=3.5e38 -> DRIFT (D5_OVERFLOW), exit=1

Closes openai#9. Refs openai#143 (IGLA RACE Gate-2 / L-R14).

* style(audit): cargo fmt --all (CI format-check fix)

---------

Co-authored-by: Perplexity Computer <computer@perplexity.ai>
Recurrent MQA Transformer — WIP Submission
My approach draws from two ideas separated by 2,000 years.
The Chakravyuha in the Mahabharata achieves depth through repetition —
one structural unit looping inward, creating power far beyond its apparent
size. Kalaripayattu, Kerala's martial art, teaches that maximum force comes
from finding the exact marma point, not from raw strength.
Core innovation: one shared TransformerBlock looped 12 times instead of
a stack of 9 unique blocks. Comparable computational depth, but 12x fewer
unique parameters than unrolling the same 12 steps with distinct blocks.
The marma insight: weight sharing acts as a regularizer. The same weights
must generalize across ALL depths simultaneously, forcing more robust
representations than unique per-layer weights could.
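The weight-tying scheme can be sketched as follows. This is a NumPy illustration, not the submission's code: a toy MLP stands in for the real attention-plus-FFN TransformerBlock, but the point carries over exactly, since the block's weights are created once and the parameter count is independent of the loop count N:

```python
import numpy as np

class RecurrentBlock:
    """Depth recurrence with weight tying: one shared block applied N times.
    Illustrative sketch; a toy residual MLP stands in for the real block."""

    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        # One shared set of block weights, created exactly once.
        self.w1 = rng.standard_normal((d_model, 4 * d_model)) * 0.02
        self.w2 = rng.standard_normal((4 * d_model, d_model)) * 0.02

    def num_params(self):
        return self.w1.size + self.w2.size

    def forward(self, x, n_loops):
        for _ in range(n_loops):            # depth recurrence
            h = np.maximum(x @ self.w1, 0)  # same weights at every depth
            x = x + h @ self.w2             # residual connection
        return x
```

Calling `forward(x, 8)` and `forward(x, 12)` uses the identical `num_params()` storage, which is what lets the depth-vs-quality curve be swept at a fixed artifact size.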
Architecture:
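The architecture list itself is not reproduced here, but the "MQA" in the title names multi-query attention, where all query heads share a single key/value head, shrinking the K/V projection weights (and KV cache) by a factor of the head count versus standard multi-head attention. A NumPy illustration, with all shapes and names hypothetical rather than taken from the submission:

```python
import numpy as np

def mqa_attention(x, wq, wk, wv, n_heads):
    """Multi-query attention sketch: per-head queries, one shared K/V head."""
    T, d = x.shape
    hd = d // n_heads
    q = (x @ wq).reshape(T, n_heads, hd)  # per-head queries: (T, H, hd)
    k = x @ wk                            # single shared key head: (T, hd)
    v = x @ wv                            # single shared value head: (T, hd)
    out = np.empty_like(q)
    for h in range(n_heads):
        scores = q[:, h, :] @ k.T / np.sqrt(hd)            # (T, T)
        scores = scores - scores.max(axis=-1, keepdims=True)
        p = np.exp(scores)
        p /= p.sum(axis=-1, keepdims=True)                 # softmax
        out[:, h, :] = p @ v
    return out.reshape(T, d)
```

Note the K/V projections are `(d, hd)` rather than `(d, d)`, which is where the parameter and cache savings come from.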
Results so far:
Hypothesis: Recurrence depth N=12 outperforms N=8 at identical
parameter count, with diminishing returns beyond N=16. The compute
grant will map this curve empirically.
Phase 2: BitNet-style ternary weights {-1, 0, +1} at log2(3) ≈ 1.58 bits per
weight vs. 16 bits, i.e. roughly 10x more effective parameters within the same 16 MB artifact.
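The ternarization step and the arithmetic behind the ~10x figure can be sketched like this. The thresholding rule and `threshold_factor` value are illustrative assumptions, not details from the PR:

```python
import math
import numpy as np

def ternarize(w, threshold_factor=0.75):
    """Map each weight to {-1, 0, +1} with a per-tensor scale.
    Weights inside the zero band [-delta, +delta] are dropped to 0;
    threshold_factor is an illustrative choice, not a value from the PR."""
    delta = threshold_factor * np.abs(w).mean()   # zero-band threshold
    q = np.where(w > delta, 1, np.where(w < -delta, -1, 0)).astype(np.int8)
    scale = float(np.abs(w[q != 0]).mean()) if q.any() else 1.0
    return q, scale

# Information-theoretic accounting behind the "~10x" claim:
bits_per_ternary_weight = math.log2(3)            # ≈ 1.585 bits
ratio = 16 / bits_per_ternary_weight              # ≈ 10.1x vs fp16
```

Reaching the ~1.58-bit limit in the stored artifact additionally requires an entropy coder over the ternary symbols (the zlib stage above can serve this role approximately).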