
Submission/hybrid RWKV token shift #1112

Open

dillon-blake wants to merge 1 commit into openai:main from dillon-blake:submission-tokenshift

Conversation

@dillon-blake

Non-record submission exploring hybrid transformer architectures that replace most attention layers with a lightweight RWKV-inspired token-shift mixing mechanism. The core idea is that most layers in a transformer only need local context, so full quadratic attention is wasteful for them. Instead, 8 of 11 layers use a simple token-shift operation that blends each token with its predecessor via learned per-dimension interpolation weights. Only 3 layers retain attention, using short 128-token windows, except the final attention layer, which keeps full context.
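The PR description doesn't include the implementation, but the token-shift mixing it describes can be sketched in a few lines of pure Python (the function name and the shape conventions here are illustrative, not taken from the submission):

```python
def token_shift(x, mu):
    """Blend each token with its predecessor via learned per-dim weights.

    x:  list of token vectors, shape (seq_len, dim)
    mu: per-dimension interpolation weights in [0, 1], length dim
    Position 0 has no predecessor, so it blends with a zero vector
    (a common convention; the submission's padding choice may differ).
    """
    dim = len(mu)
    shifted = [[0.0] * dim] + x[:-1]  # each row's "previous token"
    return [
        [mu[d] * cur[d] + (1.0 - mu[d]) * prev[d] for d in range(dim)]
        for cur, prev in zip(x, shifted)
    ]

# Example: two tokens, two dimensions
out = token_shift([[1.0, 2.0], [3.0, 4.0]], mu=[0.5, 0.25])
# out[1][0] = 0.5*3 + 0.5*1 = 2.0; out[1][1] = 0.25*4 + 0.75*2 = 2.5
```

Unlike attention, this costs O(seq_len · dim) and needs only the previous token's activations at decode time, which is where the inference-speed argument above comes from.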
The architecture achieves a 3-seed mean val_bpb of 1.2252 with 17.0M parameters, int6 quantized and zlib compressed to ~15.86 MB. While this does not beat the current SOTA, I believe the token-shift approach is promising for its efficiency — particularly for inference, where the reduced attention overhead could significantly speed up decoding.
Beyond the hybrid architecture, the submission stacks several techniques from the leaderboard: SmearGate, bigram hash embeddings, value embeddings, XSA (cross-head suppression), partial RoPE (16/64 dims), LeakyReLU squared activation, Muon optimizer, EMA with late QAT, and logit softcapping. Full details and ablation notes are in the README.
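Of the stacked techniques, logit softcapping is simple enough to show inline. A common formulation (the cap value here is a placeholder; the submission's actual hyperparameter is in its README) squashes logits smoothly into a bounded range with a scaled tanh:

```python
import math

def softcap(logits, cap=15.0):
    # Smoothly bound each logit to (-cap, cap): near-identity for small
    # values, saturating toward +/-cap for large ones.
    return [cap * math.tanh(z / cap) for z in logits]
```

This keeps gradients finite for extreme logits without the hard clipping that a plain clamp would introduce.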

@MatoTeziTanka

MatoTeziTanka commented Apr 11, 2026

Community Review — Submission/hybrid RWKV token shift

Compliance: NEEDS AUTHOR ACTION — train_gpt.py fails to import on CT2038 (Python 3.10 / torch 2.10.0+cpu)

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

ModuleNotFoundError: No module named 'kernels'

This matches a class of import errors I've seen across several submissions in the 2026-04-11 sweep.

Recommendation: Could you run python3 -c "import py_compile; py_compile.compile('train_gpt.py')" on your records-folder train_gpt.py under Python 3.10 specifically? The eval image is Python 3.10 per Issue #17 / the README, so any parse error on 3.10 blocks the submission at import time before any of the scored-eval logic runs.

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.


Reviewed by @MatoTeziTanka, The Agora. CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'kernels'. Classification via classify_prs.py AST-based classifier; full compliance audit deferred until the import issue is resolved. Auto-drafted from a template and spot-checked before posting.
