MirrorLoop HRC + LexLoRE non-record submission#2004

Open
corbensorenson wants to merge 9 commits into openai:main from corbensorenson:submission/mirrorloop-lexlore-clean

Conversation


corbensorenson commented Apr 30, 2026

Non-record / art submission

This PR adds a non-record/art submission for a novel 16MB architecture family: MirrorLoop HRC + LexLoRE.

This PR does not claim an official leaderboard record. It is an art/non-record submission with honest evidence from local runs, a 1xH100 scout, and a very limited 8xH100 window.

The broader experiment repository is public here:

https://github.com/corbensorenson/parameter-golf-experiments

Compute context

I applied for compute support but did not receive any email response before the deadline window. The 8xH100 results here came from an approximately one-hour self-funded RunPod 8xH100 rental. That was all the 8x time/funds I had available, so the 8x rows should be read as narrow evidence from a constrained window, not as a fully tuned multi-seed official record attempt.

Best preserved results

Best under-cap 1xH100 scout at PR-open time:

| Candidate | BPB | Steps | Step speed | Artifact bytes |
| --- | --- | --- | --- | --- |
| `h100_batch32k_d704e832_w2200_q8_coreattn1_lqer10t20_vocabmoe_qk55` | 1.35692129 | 5018 | 119.57 ms | 15,658,145 |

Best completed under-cap 8xH100 row from the one-hour self-funded follow-up:

| Candidate | Final export BPB | Train-time val BPB | Steps | Step speed | Artifact bytes | Headroom (bytes) |
| --- | --- | --- | --- | --- | --- | --- |
| `final8x_legal_196k_r2_d704e768_w2200_wd02_lqer6t12_vocabmoe_qk55` | 1.35496419 | 1.3191 | 6658 | 90.13 ms | 15,989,749 | 10,251 |

The first 8x e832 row used the cluster well but exceeded the 16,000,000-byte decimal artifact cap:

| Candidate | Final export BPB | Train-time val BPB | Steps | Step speed | Artifact bytes |
| --- | --- | --- | --- | --- | --- |
| `final8x_196k_r2_d704e832_w2200_wd02_lqer8t16_vocabmoe_qk55` | 1.35704747 | 1.3174 | 6628 | 90.54 ms | 16,413,081 |
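The cap arithmetic is easy to double-check: the headroom column implies a decimal cap of 16,000,000 bytes (15,989,749 + 10,251). A minimal check over the byte counts reported above:

```python
# Decimal artifact cap implied by the headroom column:
# 15,989,749 + 10,251 = 16,000,000 bytes.
ARTIFACT_CAP_BYTES = 16_000_000

rows = {
    "final8x_legal_196k_r2_d704e768_w2200_wd02_lqer6t12_vocabmoe_qk55": 15_989_749,
    "final8x_196k_r2_d704e832_w2200_wd02_lqer8t16_vocabmoe_qk55": 16_413_081,
}
for name, artifact_bytes in rows.items():
    headroom = ARTIFACT_CAP_BYTES - artifact_bytes
    status = "under cap" if headroom >= 0 else "OVER CAP"
    print(f"{name}: {headroom:+,} bytes ({status})")
```

The e768 row lands 10,251 bytes under the cap; the e832 row overshoots it by 413,081 bytes.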

What the 8xH100 test did and did not show

The 8x run is useful mostly as a negative result. It showed that the code can use the 8xH100 pod efficiently, but it did not unlock a large improvement in final exported loss. The best legal 1x result was 1.35692129 BPB; the best legal 8x result was 1.35496419 BPB, a gain of only about 0.002 BPB (roughly 0.14%). That small gain suggests the current MirrorLoop/LexLoRE spine is not simply wall-clock limited; the binding constraints look more like architecture capacity, the export/compression gap, and the 16MB artifact cap.

The e832 result is still included because it is useful architecture/systems evidence, but it is not a legal under-cap artifact. After seeing that cap miss, I stopped the same-shape higher-LQER rows and moved to e768 legalizer rows.

Included 8xH100 logs

  • `logs/8xh100_runpod_final8x_20260430_185628/` - live snapshot while the first 8x matrix was running.
  • `logs/8xh100_runpod_final8x_20260430_185628_completed1/` - completed first e832 row plus the stopped partial second row.
  • `logs/8xh100_runpod_legalfallback_20260430_191032_completed1/` - first completed under-cap e768 legalizer row.
  • `logs/8xh100_runpod_legalfallback_20260430_191032_completed2/` - first two completed e768 legalizer rows, including the current best under-cap 8x result.

What is novel here

In plain terms, this is not a standard stack of unique transformer layers. It uses an explicitly routed mirrored recurrent circuit:

0 1 2  |  3 4 5 6 7  |  3 4 5 6 7  |  2 1 0
entry  |  recurrent middle, pass 1  |  recurrent middle, pass 2  |  mirrored exit

The project called this HRC; the README defines that as an hourglass recurrent circuit (a minimal routing sketch follows the component list):

  • a token-facing input tail,
  • a reused recurrent middle,
  • a mirrored output tail,
  • small route/pass signals so reused blocks can learn different roles.
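To make the routing concrete, here is a minimal PyTorch sketch of the circuit. It illustrates the route and the route/pass signals only; `TinyBlock` is a hypothetical stand-in, and the real blocks (attention-capable core entry, q8 forward, and so on) live in train_gpt.py.

```python
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    # Hypothetical stand-in for one HRC block; the real blocks mix
    # attention/MLP variants and run a q8 quantized forward.
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mlp(self.norm(x))

class MirrorLoopHRC(nn.Module):
    """Hourglass recurrent circuit: entry tail 0-2, recurrent middle
    3-7 applied twice with shared weights, mirrored exit tail 2-1-0."""
    ROUTE = [0, 1, 2,           # entry tail
             3, 4, 5, 6, 7,     # recurrent middle, pass 1
             3, 4, 5, 6, 7,     # recurrent middle, pass 2 (reused weights)
             2, 1, 0]           # mirrored exit tail (reused weights)

    def __init__(self, d_model: int):
        super().__init__()
        self.blocks = nn.ModuleList(TinyBlock(d_model) for _ in range(8))
        # Small route/pass signal: one learned vector per route position,
        # so a reused block can learn a different role on each visit.
        self.route_signal = nn.Parameter(torch.zeros(len(self.ROUTE), d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for pos, block_idx in enumerate(self.ROUTE):
            x = self.blocks[block_idx](x + self.route_signal[pos])
        return x
```

The route applies a block 16 times while storing only 8 unique blocks, which is the basic parameter-reuse bet under the 16MB cap.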

It also uses LexLoRE, implemented under the older `VOCAB_MOE_*` flag names: small token-conditioned low-rank residual experts at `input,loop_first`. This is not a full sparse MoE; it is a lightweight lexical adapter bank, sketched below.
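Under that reading, a sketch of the adapter bank: each vocabulary id selects one low-rank residual adapter, with no learned sparse router. The expert count, rank, and token-to-expert bucketing below are illustrative guesses, not the repo's actual `VOCAB_MOE_*` settings.

```python
import torch
import torch.nn as nn

class LexLoRE(nn.Module):
    """Token-conditioned low-rank residual experts (sketch). Each token id
    is bucketed to one of `n_experts` rank-`rank` adapters, and that
    adapter's residual is added to the hidden state. Routing is a fixed
    function of the token id, not a learned sparse-MoE gate."""

    def __init__(self, vocab_size: int, d_model: int,
                 n_experts: int = 16, rank: int = 4):
        super().__init__()
        # Fixed token -> expert bucketing (illustrative choice).
        self.register_buffer("token_to_expert",
                             torch.arange(vocab_size) % n_experts)
        # LoRA-style init: `up` starts at zero so the residual starts at 0.
        self.down = nn.Parameter(torch.randn(n_experts, d_model, rank) * 0.02)
        self.up = nn.Parameter(torch.zeros(n_experts, rank, d_model))

    def forward(self, x: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); token_ids: (batch, seq)
        e = self.token_to_expert[token_ids]              # (batch, seq)
        residual = torch.einsum("bsd,bsdr,bsrm->bsm",
                                x, self.down[e], self.up[e])
        return x + residual
```

In the submitted model this would sit at the `input` and `loop_first` positions, per the flag value above.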

Main ingredients

  • MirrorLoop HRC route: 012 | 34567 | 34567 | 210
  • LexLoRE / VocabMoE low-rank lexical adapters at `input,loop_first`
  • q8 train-time quantized forward from step 0, including embeddings (see the sketch after this list)
  • factored tied embeddings
  • one attention-capable core-entry block, with the repeated core blocks otherwise MLP-only
  • QK gain 5.5
  • LQER export repair
  • LZMA artifact compression
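Of these, the q8 quantized forward is worth one extra note. A common way to implement "quantized forward from step 0" is symmetric int8 fake quantization with a straight-through estimator, so the network trains against the same low-precision grid the export will use; the per-tensor scale below is an assumption, and the repo's actual scheme may differ in granularity or rounding.

```python
import torch
import torch.nn as nn

def q8_ste(w: torch.Tensor) -> torch.Tensor:
    """Symmetric per-tensor int8 fake quantization with a straight-through
    estimator: the forward pass sees int8-grid weights, while gradients
    flow unchanged to the underlying fp32 weights."""
    scale = w.detach().abs().max().clamp(min=1e-8) / 127.0
    w_q = (w / scale).round().clamp(-127, 127) * scale
    return w + (w_q - w).detach()  # value = w_q, gradient = identity

class Q8Linear(nn.Linear):
    # Every matmul (and, per the ingredient list, the embedding lookup
    # as well) would see quantized weights from training step 0.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, q8_ste(self.weight), self.bias)
```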

Validation performed

  • `python -m py_compile` on train_gpt.py and helper modules (the command-line checks are collected into one script after this list)
  • `python -m json.tool submission.json`
  • `bash -n run_1xh100_best.sh`
  • README documentation pass for reviewer-readable language and explicit caveats
  • 8xH100 logs copied off the pod and committed to the PR branch as the run progressed
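For completeness, the three command-line checks can be replayed as one script (paths as listed above; `check=True` stops on the first failure):

```python
import subprocess

# Replay the validation pass from this PR description. Extend the
# py_compile entry with the helper modules as needed.
checks = [
    ["python", "-m", "py_compile", "train_gpt.py"],
    ["python", "-m", "json.tool", "submission.json"],
    ["bash", "-n", "run_1xh100_best.sh"],
]
for cmd in checks:
    subprocess.run(cmd, check=True, capture_output=True)
    print("ok:", " ".join(cmd))
```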

Claims and caveats

Claimed:

  • self-contained non-record/art submission
  • best preserved legal 1xH100 scout result: 1.35692129 BPB
  • best preserved legal 8xH100 one-hour-window result: 1.35496419 BPB
  • a novel mirrored-recurrent / lexical-low-rank architecture lane
  • 8xH100 logs/results included with cap status stated directly

Not claimed:

  • official record eligibility
  • SOTA leaderboard performance
  • statistical significance over multiple seeds
  • a full raw stdout log for the strongest 1xH100 scout run
  • enough 8xH100 search to call the architecture fully tuned

The README is intentionally explicit about these limitations so the submission is easy to review and does not hide the weaker parts of the evidence.
