[Non-Record] Hymba-8L: Hybrid SSM + Sliding Window Attention with 32K Context (1.1470 BPB) by mkenney2 · Pull Request #1245 · openai/parameter-golf

mkenney2 · 2026-04-02T02:30:36Z

Summary

This submission uses a hybrid architecture combining Mamba SSM with sliding window attention (SWA), which allows us to train at 32x longer context (32,768 tokens) than the standard baseline (1,024 tokens) under the same compute and time constraints. Unlike full attention which scales quadratically, SWA and Mamba both scale linearly, making long-context training feasible within the 10-minute wall-clock budget.

Building on our previous Hymba submission (1.1873 BPB, 7L), this version adds a systematic ablation study across architecture, regularization, quantization, and evaluation strategies, yielding a -0.040 BPB improvement.

Results

Seed	val_bpb	val_loss	Steps	Artifact Size
1337	1.1474	1.9374	6,621	15.7 MB
42	1.1469	1.9366	6,620	15.6 MB
7	1.1468	1.9363	6,606	15.3 MB
Mean	1.1470 ± 0.0003

Training: 600s on 8xH100 SXM, ~90.7 ms/step
Evaluation: Score-first TTT (25 epochs), ~580s
Artifact: int8 + zstd-22, under 16 MB

…470 BPB) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MatoTeziTanka · 2026-04-02T22:53:36Z

@mkenney2 Heads up — your submission shows 3 valid seeds (1337, 42, 7) but may be getting flagged as incomplete by automated tooling. The issue is your submission.json uses a non-standard schema:

"results" should be "seed_results" — every merged submission uses this key
"submission_name" should be "name"
Missing "author" and "github_id" fields

Quick fix — rename those fields to match the standard schema (see PR #1019 for reference). That should resolve the seed count issue.

(Flagged via the Agora)

- Rename submission_name -> name, results -> seed_results - Add author, github_id, blurb, date fields - Add exact val_loss/val_bpb means and stds - Add artifact_bytes_mean/max, step_avg_ms_mean - Use full precision values from logs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mkenney2 · 2026-04-02T23:05:34Z

@mkenney2 Heads up — your submission shows 3 valid seeds (1337, 42, 7) but may be getting flagged as incomplete by automated tooling. The issue is your submission.json uses a non-standard schema:

"results" should be "seed_results" — every merged submission uses this key

"submission_name" should be "name"

Missing "author" and "github_id" fields

Quick fix — rename those fields to match the standard schema (see PR #1019 for reference). That should resolve the seed count issue.

(Flagged via the Agora)

@MatoTeziTanka Thank you! Super helpful.

- Type column supports multiple tags per PR (e.g. Neural + TTT) - Filter JS updated: clicking TTT shows all PRs containing TTT - Reclassified TTT submissions as Neural + TTT - Community: resolved issue #6 (mkenney2 schema fix for PR openai#1245) - Community: posted feedback on issue openai#140 for PR openai#1215 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

valerio-oai · 2026-05-03T05:39:20Z

Thanks for this submission, I'd like to merge this into the non-record leaderboard.

Before we merge, could you change the files' location? The PR title marks this as non-record, but the files are currently under:

records/track_10min_16mb/

For it to be an eligible non-record submission, please move it under records/track_non_record_16mb/....

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mkenney2 · 2026-05-04T00:07:41Z

@valerio-oai great! I moved the file. Thanks!

mkenney2 and others added 4 commits April 1, 2026 19:17

[Non-Record] Hymba-8L-SSM4-SWA1024: 32K context hybrid SSM + SWA (1.1…

6bfd3be

…470 BPB) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Clarify constant step time vs memory cost for long context

517032c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix: clarify linear scaling of SSM and SWA per-token compute

d5249a9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Simplify context length discussion, remove step time table

39ec0ae

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mkenney2 mentioned this pull request Apr 2, 2026

PR #1245 Listed as 2 Seeds MatoTeziTanka/parameter-golf#6

Closed

Move submission to track_non_record_16mb per reviewer request

b540356

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Non-Record] Hymba-8L: Hybrid SSM + Sliding Window Attention with 32K Context (1.1470 BPB)#1245

[Non-Record] Hymba-8L: Hybrid SSM + Sliding Window Attention with 32K Context (1.1470 BPB)#1245
mkenney2 wants to merge 6 commits intoopenai:mainfrom
mkenney2:hymba-ssm4-submission

mkenney2 commented Apr 2, 2026

Uh oh!

MatoTeziTanka commented Apr 2, 2026

Uh oh!

mkenney2 commented Apr 2, 2026

Uh oh!

valerio-oai commented May 3, 2026 •

edited

Loading

Uh oh!

mkenney2 commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

mkenney2 commented Apr 2, 2026

Summary

Results

Uh oh!

MatoTeziTanka commented Apr 2, 2026

Uh oh!

mkenney2 commented Apr 2, 2026

Uh oh!

valerio-oai commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mkenney2 commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

valerio-oai commented May 3, 2026 •

edited

Loading