Non-record: Inhibitory layer on PR #1851 stack (val_bpb 1.06438) #2116
Open
cloud-777-boy wants to merge 13 commits into openai:main
Conversation
Added a README for the non-record submission detailing the inhibitory layers on the PR openai#1851 stack, including architecture, mechanism, configuration, results, limitations, and reproduction steps.
non-sota attempt
Non-Record Submission: Inhibitory Layers on PR #1851 Stack
Adds a novel architectural primitive — inhibitory layers — to the current SOTA stack from PR #1851 by @aquariouseworkman.
The contribution: a small, low-rank gating mechanism applied to attention and MLP residual paths. Modern transformers have no native subtractive primitive — every layer writes additively to the residual stream, and "removing" a feature requires a downstream layer to learn an equal-and-opposite contribution. Inspired by cortical inhibitory interneurons (~20% of cortical cells) and the fly mushroom body's APL inhibitory neuron, this submission adds a minimal subtractive primitive to the transformer.
Mechanism: two low-rank MLPs per block (d_model → rank → d_model), each followed by a sigmoid and applied as per-channel multipliers on the existing residual scales, one gating the attention path and one the MLP path. Each gate's bias is chosen so its sigmoid output is ≈ 0.95 at init (~5% suppression), which empirically maintains gradient flow into the inhibitor weights. Inhibitor weights are serialized as gate_int8_row to keep the artifact under 16MB (a quantization sketch follows the compliance list below).
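A minimal PyTorch sketch of one such gate under the initialization described above. The class name, rank choice, and placement comments are illustrative assumptions, not code from the submission's train_gpt.py:

```python
import math
import torch
import torch.nn as nn

class InhibitoryGate(nn.Module):
    """Low-rank gate (d_model -> rank -> d_model) producing per-channel
    multipliers in (0, 1) for a residual contribution.

    Illustrative sketch: names and init details are assumptions, not the
    submission's exact code.
    """

    def __init__(self, d_model: int, rank: int, init_gate: float = 0.95):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)
        self.up = nn.Linear(rank, d_model, bias=True)
        nn.init.normal_(self.down.weight, std=0.02)
        nn.init.zeros_(self.up.weight)  # gate starts ~constant across channels
        # Bias set so sigmoid(bias) == init_gate, i.e. ~5% suppression at init.
        nn.init.constant_(self.up.bias, math.log(init_gate / (1.0 - init_gate)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.up(self.down(x)))

# Hypothetical placement inside a transformer block's residual updates:
#   x = x + self.attn_gate(x) * self.attn(self.norm1(x))
#   x = x + self.mlp_gate(x)  * self.mlp(self.norm2(x))
```

At init the gate sits at σ(bias) ≈ 0.95, well away from sigmoid saturation (σ′ = 0.95 × 0.05 ≈ 0.0475), which is consistent with the gradient-flow observation above.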
Results (single seed):
- Pre-quantization, post-EMA: 1.06830
- Quantized: 1.07734
- Quantized + post-TTT phased: 1.06438
This is not a SOTA claim. Single-seed post-TTT val_bpb is 1.06438, ~0.0033 above the current SOTA's 3-seed mean of 1.06108. Based on the reference 3-seed std of 0.00068 and observed seed spread of 0.00133 in PR #1855, a 3-seed mean of this configuration would plausibly land within touching distance of SOTA.
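As a rough check on that claim, using only the numbers quoted above (the bound is a back-of-envelope assumption, not a statistical guarantee):

```python
sota_mean   = 1.06108  # current SOTA, 3-seed mean (PR #1851)
this_seed   = 1.06438  # this configuration, single seed, post-TTT
seed_spread = 0.00133  # seed-to-seed spread observed in PR #1855

gap = this_seed - sota_mean             # 0.00330
# If this seed landed at the high end of a PR #1855-sized spread,
# the 3-seed mean could plausibly be as low as:
low_estimate = this_seed - seed_spread  # ~1.06305
```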
Compliance:
- Training-data-access time: 595.6s (< 600s ✓)
- Artifact size: 15,996,198 bytes (< 16MB ✓)
- Hardware: 8×H100 SXM 80GB
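For concreteness, here is what a row-wise int8 scheme for the gate weights might look like; gate_int8_row is the submission's format name, but the specifics below (symmetric quantization, per-row absmax scales, fp32 scale storage) are assumptions:

```python
import numpy as np

def quantize_int8_row(w: np.ndarray):
    """Symmetric per-row int8 quantization. Assumed scheme; the submission's
    gate_int8_row format may differ in details (zero points, scale dtype)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero on all-zero rows
    q = np.clip(np.rint(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_int8_row(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale
```

At 1 byte per weight plus 4 bytes per row for the scale, this is roughly a 4× reduction over fp32 storage, consistent with keeping the artifact under the 16MB cap.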
What's in this submission: records/track_non_record_16mb/inhibitor_layers_777-cloud-boy/ contains the modified train_gpt.py, training log, quantized model artifact, and full README explaining the mechanism, configuration, and ablation context.