
Commit bd7eb95

exp82: 80 shards (10B tokens) + order-13 packed n-gram
exp81c proved the paradigm: 0.1518 BPB with 40 shards at order-9. Extend to the full 80 shards (10B tokens) and orders 2-13 for a richer cache. Expected: sub-0.12 BPB (closing the gap to openai#900 at 0.1197).
1 parent: 838ad4f · commit: bd7eb95

2 files changed: 3 additions & 2 deletions

results.tsv

Lines changed: 1 addition & 0 deletions
@@ -51,3 +51,4 @@ fc5f627 0.2417 15.39 keep flat Dirichlet c=1.0 + phrase[36,28,20,16] NEW BEST! p
 e608af8 0.2307 15.32 discard order-13 flat Dirichlet + phrase[36,28,20,16] -0.011 from orders but eval=673s OVER BUDGET
 f5c8cde 0.2284 14.92 discard stride=64 order-13 phrase[48,36,28,20,16] NEW BEST BPB but eval=601s (1s over budget)
 c9c53a6 0.2285 15.33 keep stride=72 order-13 phrase[48,36,28,20,16] LEGAL BEST! eval=567s, 33s spare
+838ad4f 0.1518 13.43 keep PACKED NGRAM ARTIFACT 2L/128d + 40 shards order-9 PARADIGM SHIFT! eval=372s, 100% hit

train_gpt.py

Lines changed: 2 additions & 2 deletions
@@ -2032,9 +2032,9 @@ def lr_mul(step: int, elapsed_ms: float) -> float:
 packed_ngram = None
 if ngram_artifact_enabled:
     t_build = time.perf_counter()
-    ngram_art_order = int(os.environ.get("NGRAM_ART_ORDER", "9"))
+    ngram_art_order = int(os.environ.get("NGRAM_ART_ORDER", "13"))
     ngram_art_buckets = int(os.environ.get("NGRAM_ART_BUCKETS", "524288"))
-    ngram_art_max_shards = int(os.environ.get("NGRAM_ART_MAX_SHARDS", "40"))
+    ngram_art_max_shards = int(os.environ.get("NGRAM_ART_MAX_SHARDS", "80"))
     # each rank builds from a subset of shards
     all_shards = sorted(glob.glob(os.path.join(args.data_path, "fineweb_train_*.bin")))
     if ngram_art_max_shards > 0:
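The diff above only changes defaults; the general shape of a bucketed n-gram count table that such NGRAM_ART_* settings would configure can be sketched as follows. This is a hedged illustration, not the repo's actual implementation: the rolling-hash function, the dict-based storage, and the function names ngram_bucket/build_counts are all assumptions introduced here.

```python
import os

# Defaults mirror the env vars in the diff; the hash and storage are assumptions.
NGRAM_ART_ORDER = int(os.environ.get("NGRAM_ART_ORDER", "13"))
NGRAM_ART_BUCKETS = int(os.environ.get("NGRAM_ART_BUCKETS", "524288"))

def ngram_bucket(tokens, buckets=NGRAM_ART_BUCKETS):
    # Polynomial rolling hash over token ids; the real hash is not shown
    # in the diff, so this choice is purely illustrative.
    h = 0
    for t in tokens:
        h = (h * 1000003 + t) & 0xFFFFFFFFFFFFFFFF
    return h % buckets

def build_counts(token_stream, min_order=2, max_order=NGRAM_ART_ORDER):
    # Count every n-gram of order min_order..max_order by hashed bucket.
    # Distinct n-grams that collide into one bucket merge their counts.
    counts = {}
    for i in range(len(token_stream)):
        for n in range(min_order, max_order + 1):
            if i + n > len(token_stream):
                break
            b = ngram_bucket(token_stream[i : i + n])
            counts[b] = counts.get(b, 0) + 1
    return counts

# Tiny demonstration stream; orders 2-3 only to keep it readable.
counts = build_counts([1, 2, 3, 1, 2, 3, 1, 2], min_order=2, max_order=3)
```

In the real script each rank would build its table from a subset of the fineweb_train_*.bin shards (per the comment in the diff), with ngram_art_max_shards capping how many shards are read; here a single in-memory token list stands in for that.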
