`records/track_10min_16mb/2026-04-07_Midnight_12L_8xH100/README.md`
## Midnight 12L

Midnight 12L is a 12-layer Rascal II submission. Mixed-int quantization plus Brotli
packing frees enough bytes to add one extra transformer layer while staying under the
16,000,000-byte artifact cap.

## Architecture summary

- Backbone: 12-layer Rascal II decoder
- Attention: GQA (`num_heads=8`, `num_kv_heads=4`)
- Context features: Bigram hash 2048, RoPE dims 16, XSA on last 11 layers
- Quantization: `attn=int5`, `mlp=int6`, `aux=int6`, `embed=int8`, `other=int8`
- Compression: mixed-int checkpoint + Brotli
- Hardware: 8xH100 SXM
- Train wallclock: 600s
- `bytes_code`: 124,698
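
The mixed-int scheme above (`attn=int5`, `mlp=int6`, etc.) can be illustrated with a minimal sketch of symmetric per-tensor quantization, assuming round-to-nearest with a scale taken from the tensor's max absolute value; the submission's actual quantizer is not shown in this README, so function names and rounding behavior here are assumptions:

```python
# Minimal sketch of symmetric per-tensor quantization to a signed
# `bits`-wide integer grid (hypothetical; not the submission's code).

def quantize(weights, bits):
    qmax = 2 ** (bits - 1) - 1                    # e.g. 15 for int5
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.31, -0.72, 0.05, 0.44]       # toy attention weights
q, s = quantize(w, bits=5)          # attn tensors use int5 in this run
w_hat = dequantize(q, s)            # reconstruction error is at most s/2
```

Lower bit widths shrink the checkpoint roughly linearly but grow the worst-case reconstruction error (`scale / 2`), which is why the run keeps `embed` at int8 while pushing `attn` down to int5.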

## 3-seed results

| Seed | val_bpb_exact (sliding window) | Steps | Train time (s) | bytes_total |
|------|--------------------------------|------:|---------------:|------------:|
| 444 | 1.10567949 | 6160 | 600 | 15631603 |
| 300 | 1.10582448 | 6154 | 600 | 15624171 |
| 42 | 1.10641160 | 6153 | 600 | 15619003 |
| **mean** | **1.10597186** | | | |
| **std (population)** | **0.00031653** | | | |
| **max bytes_total** | | | | **15631603** |
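
The aggregate rows can be recomputed from the three per-seed `val_bpb_exact` values; note the table reports the population standard deviation, not the sample one:

```python
# Recompute the mean and population std of the 3-seed table.
from statistics import mean, pstdev

bpb_exact = {444: 1.10567949, 300: 1.10582448, 42: 1.10641160}

m = mean(bpb_exact.values())     # 1.10597186 after rounding to 8 dp
sd = pstdev(bpb_exact.values())  # 0.00031653 after rounding to 8 dp
```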

## Technique description

Compared to the prior 11-layer stack, this run spends its compression headroom on depth:
the model is extended to 12 layers while remaining a legal submission thanks to mixed-int
quantization and Brotli artifact compression. Training and scoring follow the standard
score-first evaluation protocol, with no validation-set leakage.
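
The pack-and-check step can be sketched as follows. This is a hypothetical illustration, not the submission's pipeline: the real artifact uses Brotli, but since Brotli has no Python stdlib binding, zlib stands in here (Brotli typically compresses this kind of payload harder at comparable settings):

```python
# Hypothetical sketch: serialize the quantized checkpoint, compress it,
# and verify the artifact fits the 16,000,000-byte cap. zlib stands in
# for Brotli, which is what the real submission uses.
import pickle
import zlib

CAP_BYTES = 16_000_000

def pack_checkpoint(state_dict):
    raw = pickle.dumps(state_dict)
    packed = zlib.compress(raw, level=9)
    assert len(packed) <= CAP_BYTES, "artifact exceeds the 16 MB cap"
    return packed

# Toy example: integer-valued quantized "weights" compress well.
ckpt = {"layer0.attn": [0] * 10_000, "layer0.mlp": list(range(256)) * 4}
blob = pack_checkpoint(ckpt)
```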

## Reproduce

```bash
SKIP_GPTQ=1 SEED=444 torchrun --standalone --nproc_per_node=8 \
records/track_10min_16mb/2026-04-07_Midnight_12L_8xH100/train_gpt.py
```
{
"author": "Frosty40",
"github_id": "newjordan",
"name": "Midnight 12L",
"blurb": "12-layer Rascal with mixed-int quantization (attn=int5, mlp=int6, embed=int8) and Brotli compression, adding depth via size headroom from compression gains",
"date": "2026-04-07T00:00:00Z",
"seed_444": {
"val_bpb": 1.1057,
"val_bpb_exact": 1.10567949,
"steps": 6160,
"train_time_s": 600,
"bytes_total": 15631603
},
"seed_300": {
"val_bpb": 1.1058,
"val_bpb_exact": 1.10582448,
"steps": 6154,
"train_time_s": 600,
"bytes_total": 15624171
},
"seed_42": {
"val_bpb": 1.1064,
"val_bpb_exact": 1.10641160,
"steps": 6153,
"train_time_s": 600,
"bytes_total": 15619003
},
"val_bpb": 1.1060,
"bytes_total": 15631603,
"bytes_code": 124698,
"hardware": "8xH100 SXM"
}