
Non-record: Inhibitory layer on PR #1851 stack (val_bpb 1.06438) #2116

Open

cloud-777-boy wants to merge 13 commits into openai:main from cloud-777-boy:main

Conversation

@cloud-777-boy

Non-Record Submission: Inhibitory Layers on PR #1851 Stack
Adds a novel architectural primitive — inhibitory layers — to the current SOTA stack from PR #1851 by @aquariouseworkman.
The contribution: a small, low-rank gating mechanism applied to attention and MLP residual paths. Modern transformers have no native subtractive primitive — every layer writes additively to the residual stream, and "removing" a feature requires a downstream layer to learn an equal-and-opposite contribution. Inspired by cortical inhibitory interneurons (~20% of cortical cells) and the fly mushroom body's APL inhibitory neuron, this submission adds a minimal subtractive primitive to the transformer.
Mechanism: two low-rank MLPs per block (d_model → rank → d_model) followed by sigmoid, applied as per-channel multipliers on existing residual scales. Initialized with bias chosen so sigmoid output ≈ 0.95 at init (5% suppression), which empirically maintains gradient flow into the inhibitor weights. Inhibitor weights serialized as gate_int8_row to keep artifact under 16MB.
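For readers who want the shape of the mechanism in code, here is a minimal PyTorch sketch. Module and parameter names (`InhibitoryGate`, `GatedBlock`, the `rank` default, and zero-initializing the up-projection weight to pin the init exactly) are illustrative assumptions, not taken from the submission; the submission applies the gate to the block's existing residual scale parameters, whereas this sketch multiplies the branch output directly.

```python
import math
import torch
import torch.nn as nn

class InhibitoryGate(nn.Module):
    """Low-rank sigmoid gate producing per-channel multipliers in (0, 1).

    Sketch of the d_model -> rank -> d_model gating MLP described above.
    """
    def __init__(self, d_model: int, rank: int = 16, init_gate: float = 0.95):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)
        self.up = nn.Linear(rank, d_model, bias=True)
        # Assumption: zero the up-projection weight so the gate starts at
        # exactly sigmoid(bias). The bias is the logit of init_gate, giving
        # ~0.95 output (~5% suppression) at init, which keeps gradients
        # flowing into the inhibitor weights.
        nn.init.zeros_(self.up.weight)
        nn.init.constant_(self.up.bias, math.log(init_gate / (1 - init_gate)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.up(self.down(x)))

class GatedBlock(nn.Module):
    """Transformer block with inhibitory gates on both residual branches."""
    def __init__(self, d_model: int, attn: nn.Module, mlp: nn.Module, rank: int = 16):
        super().__init__()
        self.attn, self.mlp = attn, mlp
        self.gate_attn = InhibitoryGate(d_model, rank)
        self.gate_mlp = InhibitoryGate(d_model, rank)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The gate (computed here from the residual stream, an assumption)
        # scales each branch's contribution per channel, giving the model a
        # cheap suppressive degree of freedom on the residual stream.
        x = x + self.gate_attn(x) * self.attn(x)
        x = x + self.gate_mlp(x) * self.mlp(x)
        return x
```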
Results (single seed, val_bpb):

Pre-quantization, post-EMA: 1.06830
Quantized: 1.07734
Quantized + post-TTT phased: 1.06438
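The quantized rows reflect the gate_int8_row serialization of the inhibitor weights mentioned above. The exact on-disk format is documented in the submission's README; one plausible reading is symmetric per-row int8 quantization, sketched below (function names are hypothetical):

```python
import torch

def quantize_int8_rows(w: torch.Tensor):
    """Symmetric per-row int8 quantization: one fp32 scale per weight row."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale.squeeze(1)

def dequantize_int8_rows(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an fp32 approximation of the original rows."""
    return q.to(torch.float32) * scale.unsqueeze(1)
```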

This is not a SOTA claim. Single-seed post-TTT val_bpb is 1.06438, ~0.0033 above the current SOTA's 3-seed mean of 1.06108. Based on the reference 3-seed std of 0.00068 and observed seed spread of 0.00133 in PR #1855, a 3-seed mean of this configuration would plausibly land within touching distance of SOTA.
Compliance:

Training-data-access time: 595.6s (< 600s ✓)
Artifact size: 15,996,198 bytes (< 16MB ✓)
Hardware: 8×H100 SXM 80GB

What's in this submission: records/track_non_record_16mb/inhibitor_layers_777-cloud-boy/ contains the modified train_gpt.py, training log, quantized model artifact, and full README explaining the mechanism, configuration, and ablation context.

