Commit 1257be6

Merge pull request #1644 from mradassaad/mamba3-sp8192-ttt-pr
Non-record: Mamba-3 Hybrid SSM + SP8192 + Legal TTT — 1.1473 bpb
2 parents fdde8dc + a9573b2 commit 1257be6

5 files changed

Lines changed: 2373 additions & 0 deletions

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
# Mamba-3 Hybrid SSM + SP8192 + Legal TTT — 1.1473 bpb

Non-record submission. See PR description for full details.

## Run

```bash
# Install Mamba-3
bash setup_mamba3.sh

# Generate SP8192 data (~35 min)
cd data && python3 download_hf_docs_and_tokenize.py \
  --output-root . --tokenizer-config tokenizer_specs_8192.json --skip-byte

# Train + eval
VOCAB_SIZE=8192 NUM_LAYERS=7 NUM_ATTN_LAYERS=2 USE_BIGRAM_HASH=0 TRAIN_SEQ_LEN=4096 \
WARMDOWN_ITERS=2600 WARMDOWN_SHAPE=linear MUON_EQ_R=1 \
LATE_QAT_THRESHOLD=0.15 USE_GPTQ=1 QUANT_BITS=6 QUANT_BITS_EMBED=8 GPTQ_NUM_SEQS=32 \
EVAL_OVERLAP=1024 USE_LZMA=1 EVAL_TEMP=0.9 \
WEIGHT_DECAY=0.04 MUON_MOMENTUM=0.99 MATRIX_LR=0.025 \
torchrun --nproc_per_node=8 train_mamba3_hybrid.py
```
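
The training command above passes its configuration through environment variables. As a reading aid, the sketch below shows one way such variables could be collected and sanity-checked in Python. It is not taken from `train_mamba3_hybrid.py`: the `RunConfig` class, the `load_config` helper, and the validation rules are hypothetical, and only the variable names and values come from the command itself.

```python
from dataclasses import dataclass
from typing import Mapping

# Values copied from the README command; everything else here is illustrative.
README_ENV = {
    "VOCAB_SIZE": "8192",
    "NUM_LAYERS": "7",
    "NUM_ATTN_LAYERS": "2",
    "TRAIN_SEQ_LEN": "4096",
    "EVAL_OVERLAP": "1024",
    "QUANT_BITS": "6",
    "QUANT_BITS_EMBED": "8",
}


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for a subset of the run settings."""
    vocab_size: int
    num_layers: int
    num_attn_layers: int
    train_seq_len: int
    eval_overlap: int
    quant_bits: int
    quant_bits_embed: int

    @property
    def num_ssm_layers(self) -> int:
        # Assumes every non-attention layer is an SSM layer.
        return self.num_layers - self.num_attn_layers


def load_config(env: Mapping[str, str]) -> RunConfig:
    """Parse integer settings from a mapping such as os.environ."""
    cfg = RunConfig(
        vocab_size=int(env["VOCAB_SIZE"]),
        num_layers=int(env["NUM_LAYERS"]),
        num_attn_layers=int(env["NUM_ATTN_LAYERS"]),
        train_seq_len=int(env["TRAIN_SEQ_LEN"]),
        eval_overlap=int(env["EVAL_OVERLAP"]),
        quant_bits=int(env["QUANT_BITS"]),
        quant_bits_embed=int(env["QUANT_BITS_EMBED"]),
    )
    # Illustrative consistency checks, not rules stated by the submission.
    if not 0 <= cfg.num_attn_layers <= cfg.num_layers:
        raise ValueError("NUM_ATTN_LAYERS must be between 0 and NUM_LAYERS")
    if cfg.eval_overlap >= cfg.train_seq_len:
        raise ValueError("EVAL_OVERLAP must be smaller than TRAIN_SEQ_LEN")
    return cfg


cfg = load_config(README_ENV)
print(f"{cfg.num_layers} layers = {cfg.num_ssm_layers} SSM + {cfg.num_attn_layers} attention")
```

With the README values, the layer split works out to 5 SSM layers and 2 attention layers, which matches the "5 SSM + 2 attn" description in the submission blurb.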
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
torch>=2.9.1
triton>=3.5.0
mamba-ssm>=2.3.1
sentencepiece
einops
numpy
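
The dependency list above pins minimum versions for three packages and leaves the rest unpinned. The following sketch, which is not part of the commit, shows how such pins could be parsed and compared against a version string. The helper names `parse_requirements` and `meets_minimum` are hypothetical, and the comparison is deliberately simplified: it looks only at the first three numeric components and ignores pre-release tags.

```python
import re

# Requirement lines copied from the file above.
REQUIREMENTS = """\
torch>=2.9.1
triton>=3.5.0
mamba-ssm>=2.3.1
sentencepiece
einops
numpy
"""


def parse_requirements(text):
    """Map each package name to its minimum version tuple, or None if unpinned."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, version = line.partition(">=")
        pins[name.strip()] = tuple(int(p) for p in version.split(".")) if sep else None
    return pins


def meets_minimum(installed, minimum):
    """Simplified check: compare the first three numeric parts of `installed`."""
    if minimum is None:
        return True
    public = installed.split("+")[0]  # drop a local suffix such as "+cu128"
    parts = tuple(int(p) for p in re.findall(r"\d+", public)[:3])
    return parts >= minimum


pins = parse_requirements(REQUIREMENTS)
print(pins["torch"], meets_minimum("2.9.1+cu128", pins["torch"]))
```

For real environments, a dedicated version parser is more robust than this tuple comparison.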
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
{
  "author": "mradassaad",
  "github_id": "mradassaad",
  "name": "Mamba-3 Hybrid SSM + SP8192 + Legal TTT",
  "blurb": "7L Mamba-3 SISO hybrid (5 SSM + 2 attn), SP8192, 25.2M params. AR GPTQ with INT8 embed + embed Hessian. Chunk score-first TTT (SGD lr=0.010). Stateful-overlap eval.",
  "date": "2026-04-15",
  "val_loss": 2.96361204,
  "val_bpb": 1.14730259,
  "bytes_total": 15930354,
  "bytes_code": 104754
}
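
The `val_loss` and `val_bpb` fields in the metadata above can be related with a short calculation. The sketch below assumes that `val_loss` is the mean cross-entropy in nats per token and that `bytes_total` includes `bytes_code`; neither assumption is stated in the commit. Under the first assumption, dividing the loss in bits per token by the reported bits per byte gives the implied average number of bytes per token.

```python
import math

# Values copied from the submission metadata.
VAL_LOSS = 2.96361204      # assumed: nats per token
VAL_BPB = 1.14730259       # bits per byte
BYTES_TOTAL = 15930354
BYTES_CODE = 104754


def nats_to_bits(nats):
    """Convert a quantity measured in nats to bits."""
    return nats / math.log(2)


def implied_bytes_per_token(val_loss_nats, val_bpb):
    """Bytes per token implied by a per-token loss and a bits-per-byte score."""
    return nats_to_bits(val_loss_nats) / val_bpb


bits_per_token = nats_to_bits(VAL_LOSS)
bytes_per_token = implied_bytes_per_token(VAL_LOSS, VAL_BPB)
# Assumed: the total size counts the code, so the remainder is everything else.
non_code_bytes = BYTES_TOTAL - BYTES_CODE

print(f"bits per token:  {bits_per_token:.4f}")
print(f"bytes per token: {bytes_per_token:.4f}")
print(f"non-code bytes:  {non_code_bytes}")
```

If the assumptions hold, the figures imply roughly 3.7 bytes per token for the SP8192 tokenizer on the validation data, and 15,825,600 bytes of the submission are not code.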
