Non-record: Inhibitory layer on PR #1851 stack (val_bpb 1.06438) #2116
Open
cloud-777-boy wants to merge 13 commits into openai:main
Conversation
Added a README for the non-record submission detailing the inhibitory layers on the PR openai#1851 stack, including architecture, mechanism, configuration, results, limitations, and reproduction steps.
non-sota attempt
Non-Record Submission: Inhibitory Layers on PR #1851 Stack
Adds a novel architectural primitive — inhibitory layers — to the current SOTA stack from PR #1851 by @aquariouseworkman.
The contribution: a small, low-rank gating mechanism applied to attention and MLP residual paths. Modern transformers have no native subtractive primitive — every layer writes additively to the residual stream, and "removing" a feature requires a downstream layer to learn an equal-and-opposite contribution. Inspired by cortical inhibitory interneurons (~20% of cortical cells) and the fly mushroom body's APL inhibitory neuron, this submission adds a minimal subtractive primitive to the transformer.
Mechanism: two low-rank MLPs per block (d_model → rank → d_model), each followed by a sigmoid and applied as per-channel multipliers on the existing residual scales, one gating the attention path and one the MLP path. Each gate's bias is chosen so its sigmoid output is ≈ 0.95 at init (~5% suppression), which empirically maintains gradient flow into the inhibitor weights. Inhibitor weights are serialized as gate_int8_row to keep the artifact under 16MB (a quantization sketch follows the compliance list below).
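A minimal PyTorch sketch of one such gate under the initialization described above. The class name, rank choice, and placement comments are illustrative assumptions, not code from the submission's train_gpt.py:

```python
import math
import torch
import torch.nn as nn

class InhibitoryGate(nn.Module):
    """Low-rank gate (d_model -> rank -> d_model) producing per-channel
    multipliers in (0, 1) for a residual contribution.

    Illustrative sketch: names and init details are assumptions, not the
    submission's exact code.
    """

    def __init__(self, d_model: int, rank: int, init_gate: float = 0.95):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)
        self.up = nn.Linear(rank, d_model, bias=True)
        nn.init.normal_(self.down.weight, std=0.02)
        nn.init.zeros_(self.up.weight)  # gate starts ~constant across channels
        # Bias set so sigmoid(bias) == init_gate, i.e. ~5% suppression at init.
        nn.init.constant_(self.up.bias, math.log(init_gate / (1.0 - init_gate)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.up(self.down(x)))

# Hypothetical placement inside a transformer block's residual updates:
#   x = x + self.attn_gate(x) * self.attn(self.norm1(x))
#   x = x + self.mlp_gate(x)  * self.mlp(self.norm2(x))
```

At init the gate sits at σ(bias) ≈ 0.95, well away from sigmoid saturation (σ′ = 0.95 × 0.05 ≈ 0.0475), which is consistent with the gradient-flow observation above.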
Results (single seed):
- Pre-quantization, post-EMA: 1.06830
- Quantized: 1.07734
- Quantized + post-TTT phased: 1.06438
This is not a SOTA claim. Single-seed post-TTT val_bpb is 1.06438, ~0.0033 above the current SOTA's 3-seed mean of 1.06108. Based on the reference 3-seed std of 0.00068 and observed seed spread of 0.00133 in PR #1855, a 3-seed mean of this configuration would plausibly land within touching distance of SOTA.
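As a rough check on that claim, using only the numbers quoted above (the bound is a back-of-envelope assumption, not a statistical guarantee):

```python
sota_mean   = 1.06108  # current SOTA, 3-seed mean (PR #1851)
this_seed   = 1.06438  # this configuration, single seed, post-TTT
seed_spread = 0.00133  # seed-to-seed spread observed in PR #1855

gap = this_seed - sota_mean             # 0.00330
# If this seed landed at the high end of a PR #1855-sized spread,
# the 3-seed mean could plausibly be as low as:
low_estimate = this_seed - seed_spread  # ~1.06305
```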
Compliance:
- Training-data-access time: 595.6s (< 600s ✓)
- Artifact size: 15,996,198 bytes (< 16MB ✓)
- Hardware: 8×H100 SXM 80GB
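For concreteness, here is what a row-wise int8 scheme for the gate weights might look like; gate_int8_row is the submission's format name, but the specifics below (symmetric quantization, per-row absmax scales, fp32 scale storage) are assumptions:

```python
import numpy as np

def quantize_int8_row(w: np.ndarray):
    """Symmetric per-row int8 quantization. Assumed scheme; the submission's
    gate_int8_row format may differ in details (zero points, scale dtype)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero on all-zero rows
    q = np.clip(np.rint(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_int8_row(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale
```

At 1 byte per weight plus 4 bytes per row for the scale, this is roughly a 4× reduction over fp32 storage, consistent with keeping the artifact under the 16MB cap.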
What's in this submission: records/track_non_record_16mb/inhibitor_layers_777-cloud-boy/ contains the modified train_gpt.py, training log, quantized model artifact, and full README explaining the mechanism, configuration, and ablation context.