
Record: 0.8335 BPB — DualHash + AdaMuon + MoE + SDClip (3-seed mean)#1901

Open
Karen042009 wants to merge 1 commit into openai:main from Karen042009:main


@Karen042009

New submission for the 10min/16MB track.

Key features:

  • DualTokenHashSkip: Bigram skip connections with dual hash tables.
  • LayerScale Recurrence: Recurrent structure with learnable LayerScale coefficients.
  • SharedMoE: Hybrid MoE with shared and specialized experts.
  • AdaMuon Optimizer: RMS pre-conditioning and Riemannian Newton-Schulz orthogonalization.
  • Dynamic MSE SDClip: MSE-driven search for the INT6 quantization clipping threshold.
  • Score-First TTT: Two-pass test-time training (TTT) that scores tokens before adapting on them, keeping evaluation within the track rules.
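The AdaMuon item names Newton-Schulz orthogonalization, the step Muon-style optimizers apply to each 2-D update matrix. As a rough illustration of that step only, here is a minimal pure-Python sketch of the classic cubic Newton-Schulz iteration, which drives every singular value of a normalized matrix toward 1 and so approximates its semi-orthogonal polar factor. Assumptions: this is the plain (Euclidean) iteration, not the Riemannian variant described above, and it omits AdaMuon's RMS pre-conditioning. Muon itself uses a tuned quintic polynomial with fewer steps. The helper names `matmul`, `transpose` and `newton_schulz_orthogonalize` are hypothetical.

```python
import math


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Naive dense product of an (n x k) and a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: list[list[float]]) -> list[list[float]]:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(
    g: list[list[float]], steps: int = 30, eps: float = 1e-7
) -> list[list[float]]:
    """Approximate the semi-orthogonal polar factor U V^T of g = U S V^T.

    Hypothetical helper: plain cubic Newton-Schulz iteration, not the
    Riemannian variant or the RMS pre-conditioning used by AdaMuon.
    Exactly-zero singular values stay zero, so rank-deficient inputs
    are not fully orthogonalized.
    """
    # Frobenius normalization bounds every singular value by 1, inside the
    # (0, sqrt(3)) range where the cubic iteration converges to 1.
    norm = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / (norm + eps) for v in row] for row in g]
    for _ in range(steps):
        # X <- 1.5 X - 0.5 (X X^T) X maps each singular value s to 1.5 s - 0.5 s^3.
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * x[i][j] - 0.5 * xxt_x[i][j] for j in range(len(x[0]))]
            for i in range(len(x))
        ]
    return x
```

For a symmetric positive-definite input such as `[[2.0, 1.0], [1.0, 3.0]]` the polar factor is the identity, which makes a convenient sanity check; for a full-rank wide matrix the result has orthonormal rows.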

Detailed results and architectural information are available in the included README.md.
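The Dynamic MSE SDClip item in the feature list can be illustrated with a minimal pure-Python sketch. It assumes that "SD" means the clip threshold is a multiple k of the tensor's standard deviation, that INT6 quantization is symmetric per tensor with levels -31..31, and that the search is a simple grid over k scored by reconstruction MSE. The grid, the level range and the helper names `fake_quantize_int6` and `search_sd_clip` are hypothetical, not taken from the submission.

```python
import math
from typing import Optional


def fake_quantize_int6(values: list[float], clip: float) -> list[float]:
    """Clamp to [-clip, clip], snap to the nearest of 63 symmetric levels, dequantize."""
    qmax = 31  # assumption: signed levels -31..31, leaving -32 unused for symmetry
    scale = clip / qmax
    return [round(max(-clip, min(clip, v)) / scale) * scale for v in values]


def search_sd_clip(
    values: list[float], multipliers: Optional[list[float]] = None
) -> tuple[float, float]:
    """Return (k, mse) for the clip threshold k * std(values) with the lowest MSE.

    Hypothetical helper; assumes a non-constant input so that std > 0.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if multipliers is None:
        multipliers = [1.0 + 0.25 * i for i in range(25)]  # k = 1.0, 1.25, ..., 7.0
    best_k, best_mse = multipliers[0], math.inf
    for k in multipliers:
        deq = fake_quantize_int6(values, k * std)
        mse = sum((v - d) ** 2 for v, d in zip(values, deq)) / n
        if mse < best_mse:  # strict comparison keeps the smallest k on ties
            best_k, best_mse = k, mse
    return best_k, best_mse
```

The trade-off being searched: a small k spends all 63 levels on the bulk of the distribution but clamps outliers hard, while a large k preserves outliers at the cost of a coarser step for everything else. Scoring each candidate by MSE picks the balance tensor by tensor.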
