
Non-record: Trinity Ternary CPU v3 — Apple M1 Pro 72h, val_bpb 1.5042 #1866

Open
deborahnelson8788726 wants to merge 1 commit into openai:main from deborahnelson8788726:trinity-ternary-cpu

Conversation


@deborahnelson8788726 deborahnelson8788726 commented Apr 27, 2026

Non-record: Trinity Ternary CPU v3 on Apple M1 Pro

val_bpb: 1.5042 (single seed). A 24M-parameter model trained for 72.04 hours on an Apple M1 Pro, CPU only (10 cores, 16 GB RAM, no GPU/MPS/NPU).

This is intentionally a non-record / unlimited-compute / notable submission. It is not a main 10-minute leaderboard claim because training used a laptop CPU for 72 hours rather than 8xH100 for 600 seconds.

Scope after cleanup

This PR now contains one submission folder only:

records/track_non_record_16mb/2026-04-24_Trinity_Ternary_CPU_v2/

Result summary

Metric                  Value
val_bpb                 1.5042
val_loss                2.5479
tokens/byte (SP1024)    0.4092
artifact size           5,525,048 bytes (5.53 MB, decimal)
training time           72.04 h on M1 Pro CPU
total params            24,128,000
ternary params          23,592,960
final ternary blend     alpha = 1.0 (full ternary)
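As a consistency check, the reported val_bpb follows directly from val_loss and the tokens/byte ratio, assuming val_loss is mean cross-entropy in nats per token:

```python
import math

val_loss = 2.5479          # mean cross-entropy, nats per token (from the table)
tokens_per_byte = 0.4092   # SP1024 tokenizer ratio (from the table)

bits_per_token = val_loss / math.log(2)    # convert nats to bits
val_bpb = bits_per_token * tokens_per_byte # bits per byte of raw text
print(round(val_bpb, 4))                   # -> 1.5042
```

The three reported numbers agree to four decimal places, which supports the table's internal consistency.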

Method

  • 10L 512d 8-head transformer, MLP 2.5x, RoPE, RMSNorm, tied 1024-vocab embeddings.
  • BitNet b1.58-style ternary QAT with STE and full alpha=1.0 ternary weights.
  • Trinity base-3 packing: 5 balanced trits per byte, lossless, 99% of the log2(3) theoretical optimum.
  • Step-based ternary ramp plus cosine LR decay, so macOS sleep cannot advance the quantization schedule while training is paused.
  • CPU-only training path for Apple Silicon / commodity laptop reproducibility.
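The ternary QAT step above can be sketched as follows. This is a minimal illustration of BitNet b1.58-style absmean quantization, not code from this PR's train_gpt.py; the function name and per-tensor scaling granularity are assumptions.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Forward pass of BitNet b1.58-style absmean ternary quantization (sketch).

    Weights are scaled by their mean absolute value, rounded to {-1, 0, +1},
    then rescaled. During training, the straight-through estimator (STE) would
    treat this step's gradient as identity so updates flow to the latent
    full-precision weights.
    """
    scale = max(np.abs(w).mean(), eps)                 # per-tensor absmean scale
    return np.clip(np.round(w / scale), -1, 1) * scale # ternary levels * scale

w = np.array([0.9, -0.4, 0.05, -1.2])
scale = max(np.abs(w).mean(), 1e-8)        # 0.6375 for this example
q = ternary_quantize(w)
print((q / scale).astype(int))             # -> [ 1 -1  0 -1]
```

The "ternary ramp" in the bullet list would blend this quantized forward pass with the full-precision weights on a step-based schedule until alpha reaches 1.0 (fully ternary).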

Compliance

Track A style evaluation: causal attention, normalized full-vocab softmax, single left-to-right pass, no SLOT, no n-gram cache, no pre-quant TTT, no eval-time adaptation.

Reproduction notes

The packed submission artifact is included as final_model_v3.trinity.ptz. The reported v3 training run warm-started from a prior v1 CPU checkpoint that is not included in this PR; exact reruns should set WARM_START_PATH=/path/to/final_model.pt. Without that variable, train_gpt.py runs the same configuration from scratch.

python3 data/cached_challenge_fineweb.py --variant sp1024
WARM_START_PATH=/path/to/final_model.pt caffeinate -i -m -s python3 records/track_non_record_16mb/2026-04-24_Trinity_Ternary_CPU_v2/train_gpt.py
python3 records/track_non_record_16mb/2026-04-24_Trinity_Ternary_CPU_v2/pack_and_eval_v3.py
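For reference, the Trinity base-3 packing named under Method (5 balanced trits per byte, since 3^5 = 243 <= 256) can be sketched as below. This is an illustrative layout only; the actual on-disk format of final_model_v3.trinity.ptz (trit ordering, headers, handling of lengths not divisible by 5) is produced by pack_and_eval_v3.py and is not specified in this PR.

```python
import numpy as np

POWERS = np.array([1, 3, 9, 27, 81], dtype=np.int64)  # base-3 place values

def pack_trits(trits: np.ndarray) -> np.ndarray:
    """Pack balanced trits {-1, 0, +1} into bytes, 5 trits per byte.

    Density is 5 * log2(3) / 8, about 99% of the information-theoretic
    optimum for ternary data, and the packing is lossless.
    """
    t = trits.reshape(-1, 5).astype(np.int64) + 1      # shift to digits {0, 1, 2}
    return (t * POWERS).sum(axis=1).astype(np.uint8)   # byte values in 0..242

def unpack_trits(packed: np.ndarray) -> np.ndarray:
    digits = (packed.astype(np.int64)[:, None] // POWERS) % 3
    return (digits - 1).reshape(-1).astype(np.int8)    # back to {-1, 0, +1}

rng = np.random.default_rng(0)
trits = rng.integers(-1, 2, size=100_000).astype(np.int8)  # values in {-1, 0, 1}
packed = pack_trits(trits)
assert np.array_equal(unpack_trits(packed), trits)         # lossless round trip
print(len(packed), "bytes for", trits.size, "trits")       # -> 20000 bytes for 100000 trits
```

At this density, the 23,592,960 ternary parameters need roughly 23,592,960 / 5 ≈ 4.7 MB, consistent with the 5.53 MB artifact once the non-ternary parameters and any metadata are included.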
