# BIJEPAX-lite 3-seed candidate

Config matches the successful seed 42 run:
- script: our_submission/train_gpt_v15_bijepax.py
- DISABLE_COMPILE=1
- CASEOPS_ENABLED=1
- PPM_MIXER_ENABLED=1 order=5 H=0.999 L=0.18 T=0.80
- TTT_ENABLED=0
- LQER_TOP_K=1
- BIJEPAX_ENABLED=1 weight=0.01 start=0.35 end=0.80 fwd_hops=4 bwd_hops=4 cycle=0 head_dim=32 stride=64 lr=0.001

Existing seed 42:
- run: v15_bijepaxlite_lqer1_nocompile_s42_20260501_022405
- final ppm_sliding val_bpb: 0.97234287
- artifact bytes: 15997180
- eval time: 502131ms
- rc: 0

Queued seeds: 314, 999


## v15_bijepaxlite_lqer1_nocompile_s314_20260501_025209
- started: 2026-05-01T02:52:09Z
- log: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s314_20260501_025209/train.log
- finished: 2026-05-01T03:14:07Z
- rc: 0
- scores:
  - diagnostic pre-quantization post-ema val_loss:2.42899363 val_bpb:1.10988323 eval_time:9910ms
  - Total submission size quantized+pergroup: 15999539 bytes
  - diagnostic quantized val_loss:2.44155528 val_bpb:1.11562304 eval_time:9926ms
  - ppm_mixer val_bpb:0.97206308 eval_time:453715ms order=5 H=0.999 L=0.18 T=0.8 N_tokens=47851520 N_sidecar_bytes=151074499
  - ppm_sliding val_loss:2.45044876 val_bpb:0.97206308 eval_time:499038ms

## v15_bijepaxlite_lqer1_nocompile_s999_20260501_031407
- started: 2026-05-01T03:14:07Z
- log: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s999_20260501_031407/train.log
- finished: 2026-05-01T03:36:01Z
- rc: 0
- scores:
  - diagnostic pre-quantization post-ema val_loss:2.43314506 val_bpb:1.11178015 eval_time:9911ms
  - Total submission size quantized+pergroup: 15997593 bytes
  - diagnostic quantized val_loss:2.44582432 val_bpb:1.11757370 eval_time:11393ms
  - ppm_mixer val_bpb:0.97373767 eval_time:451054ms order=5 H=0.999 L=0.18 T=0.8 N_tokens=47851520 N_sidecar_bytes=151074499
  - ppm_sliding val_loss:2.45502055 val_bpb:0.97373767 eval_time:496384ms

## Final scrape
Key lines from each run's `train.log`, grouped by run (grep path prefixes dropped):

### v15_bijepaxlite_lqer1_nocompile_s314_20260501_025209
- artifact_dir: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s314_20260501_025209
- logfile: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s314_20260501_025209/v15_bijepaxlite_lqer1_nocompile_s314_20260501_025209.txt
- model_path: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s314_20260501_025209/final_model.pt
- quantized_model_path: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s314_20260501_025209/final_model.int6.ptz
- run_id: v15_bijepaxlite_lqer1_nocompile_s314_20260501_025209
- Total submission size quantized+pergroup: 15999539 bytes
- diagnostic quantized val_loss:2.44155528 val_bpb:1.11562304 eval_time:9926ms
- ppm_mixer val_bpb:0.97206308 eval_time:453715ms order=5 H=0.999 L=0.18 T=0.8 N_tokens=47851520 N_sidecar_bytes=151074499
- ppm_sliding val_loss:2.45044876 val_bpb:0.97206308 eval_time:499038ms

### v15_bijepaxlite_lqer1_nocompile_s42_20260501_022405
- artifact_dir: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s42_20260501_022405
- logfile: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s42_20260501_022405/v15_bijepaxlite_lqer1_nocompile_s42_20260501_022405.txt
- model_path: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s42_20260501_022405/final_model.pt
- quantized_model_path: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s42_20260501_022405/final_model.int6.ptz
- run_id: v15_bijepaxlite_lqer1_nocompile_s42_20260501_022405
- Total submission size quantized+pergroup: 15997180 bytes
- diagnostic quantized val_loss:2.44116551 val_bpb:1.11544494 eval_time:10342ms
- ppm_mixer val_bpb:0.97234287 eval_time:456845ms order=5 H=0.999 L=0.18 T=0.8 N_tokens=47851520 N_sidecar_bytes=151074499
- ppm_sliding val_loss:2.45118426 val_bpb:0.97234287 eval_time:502131ms

### v15_bijepaxlite_lqer1_nocompile_s999_20260501_031407
- artifact_dir: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s999_20260501_031407
- logfile: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s999_20260501_031407/v15_bijepaxlite_lqer1_nocompile_s999_20260501_031407.txt
- model_path: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s999_20260501_031407/final_model.pt
- quantized_model_path: /workspace/parameter-golf/our_submission/1000/runs/v15_bijepaxlite_lqer1_nocompile_s999_20260501_031407/final_model.int6.ptz
- run_id: v15_bijepaxlite_lqer1_nocompile_s999_20260501_031407
- Total submission size quantized+pergroup: 15997593 bytes
- diagnostic quantized val_loss:2.44582432 val_bpb:1.11757370 eval_time:11393ms
- ppm_mixer val_bpb:0.97373767 eval_time:451054ms order=5 H=0.999 L=0.18 T=0.8 N_tokens=47851520 N_sidecar_bytes=151074499
- ppm_sliding val_loss:2.45502055 val_bpb:0.97373767 eval_time:496384ms
# Legality Audit - BIJEPAX-lite

## Verdict

Current read: **likely legal/submittable**, assuming the existing CaseOps byte-sidecar/PPM lane is accepted.

The BIJEPAX-lite addition itself is low-risk because it is training-only and has no evaluation-time access to future validation tokens.

## Challenge rules checked

From the repository README:

- Submission artifact size is code bytes plus compressed model bytes.
- The cap is strict decimal `16,000,000` bytes.
- Evaluation may not use training data unless its bytes are paid for inside the artifact.
- Validation data may not be used during training.
- Evaluation must complete within 10 minutes on 8xH100, separate from the 10-minute training cap.
- Test-time methods must score before updating on validation tokens.

## Artifact size

Seed 42:

- `Serialized model quantized+pergroup: 15955181 bytes`
- `Total submission size quantized+pergroup: 15997180 bytes`
- Strict cap: `16000000 bytes`
- Headroom: `2820 bytes`

This is tight but under the cap.

`LQER_TOP_K=1` was used specifically to create byte headroom: an earlier BIJEPA run without this trim packaged at `16,004,902` bytes and was not submittable.
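
As a quick check of the headroom arithmetic (constants copied from the numbers above):

```python
CAP = 16_000_000      # strict decimal artifact cap
TOTAL = 15_997_180    # seed 42: Total submission size quantized+pergroup
assert TOTAL <= CAP
print(CAP - TOTAL)    # 2820 bytes of headroom
```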

## Training-only JEPA auxiliary

Relevant implementation:

- `class MultiDirectionalBiJEPAX`
- `def bijepax_weight_at` (sketched after this list)
- `train_model(...): bijepax_module = MultiDirectionalBiJEPAX(...)`
- `step_fn(...): loss = ce_loss + bijepax_module(hidden, ...)`

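A plausible reconstruction of the schedule gate from the submitted config (`weight=0.01 start=0.35 end=0.80`); the real `bijepax_weight_at` may ramp rather than step, so treat this as a sketch:

```python
def bijepax_weight_at(frac_elapsed: float, weight: float = 0.01,
                      start: float = 0.35, end: float = 0.80) -> float:
    """Auxiliary-loss weight at a given fraction of the wallclock budget (sketch)."""
    return weight if start <= frac_elapsed < end else 0.0
```
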
The predictor module is created outside `base_model`:

```python
# Separate module and separate optimizer; never attached to base_model.
bijepax_module = MultiDirectionalBiJEPAX(...).to(device).bfloat16()
bijepax_opt = torch.optim.Adam(bijepax_module.parameters(), ...)
```

It is not assigned as a child module of `base_model`, so `base_model.state_dict()` does not contain BIJEPAX predictor weights.
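
A minimal demonstration of why the exclusion holds (stand-in modules, not the real classes):

```python
import torch.nn as nn

base_model = nn.Linear(8, 8)  # stand-in for the GPT
bijepax = nn.Linear(8, 8)     # stand-in predictor, created outside base_model

# Because bijepax is never assigned as an attribute of base_model,
# base_model.state_dict() carries only the base weights.
print(list(base_model.state_dict()))  # ['weight', 'bias'] -- no predictor keys
```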

Serialization only saves `base_model.state_dict()`:

```python
# Only the base model's weights are serialized; the BIJEPAX predictor
# heads live in the separate module above and never reach the artifact.
torch.save(base_model.state_dict(), h.model_path)
sd_cpu = _unbank_state_dict(base_model.state_dict(), h.num_layers)
```

So the JEPA predictor heads are not present in the final artifact.

## No validation leakage during training

Training batches come from `DocumentPackingLoader(h, device)`.

Validation data is loaded for periodic/terminal validation, but the BIJEPAX training loss only uses hidden states from training microbatches:

```python
x, y, cu_seqlens, _max_seqlen = train_loader.next_batch(...)  # training batch only
ce_loss, hidden = forward_with_hidden(x, y, ...)
loss = ce_loss + bijepax_module(hidden, ...)  # auxiliary loss sees training hidden states only
```

The BIJEPAX module does not read validation tokens, validation bytes, or validation sidecars.

## Evaluation path

Final score uses the existing PPM sliding evaluator:

- `eval_val_ppm_sliding`
- `ppm_mixer val_bpb`
- `ppm_sliding val_loss / val_bpb`

The PPM mixer follows a score-before-update discipline over the scored target stream: the neural log probabilities are computed first, then the byte mixer walks the bytes in order and updates its tables only after scoring each byte.
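
As a schematic illustration of that discipline (the `mixer` interface here is hypothetical, not the actual evaluator API):

```python
import math

def score_stream(byte_stream: bytes, mixer) -> float:
    """Score-before-update over a target byte stream (illustrative interface)."""
    total_bits = 0.0
    for b in byte_stream:
        p = mixer.prob(b)        # probability computed from past context only
        total_bits += -math.log2(max(p, 1e-12))
        mixer.update(b)          # tables advance only after the byte is scored
    return total_bits / len(byte_stream)  # bits per byte (bpb)
```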

Legality risk is therefore concentrated in whether reviewers accept this existing PPM/CaseOps scoring lane, not in BIJEPAX-lite.

## Cross-document leak check

The SmearGate cross-document leak fix is present in both hidden and TTT paths:

```python
# Zero the smear term at document boundaries so the first token of a
# packed document never mixes in the previous document's last token.
not_bos = (input_ids[:, 1:] != BOS_ID).to(x.dtype).unsqueeze(-1)
x = torch.cat([x[:, :1], x[:, 1:] + g * x[:, :-1] * not_bos], dim=1)
```

TTT is disabled for this candidate (`TTT_ENABLED=0`), but the symmetric fix is still present.

## Eval compile

The run uses `DISABLE_COMPILE=1`. Post-serialize evaluation also honors this:

```python
if os.environ.get("DISABLE_COMPILE", "0") == "1":
    log("eval_compile:disabled_by_env")
    compiled_model = eval_model
    compiled_forward_logits = eval_model.forward_logits
```

This avoids the compile stall encountered in the first BIJEPAX attempts.

## Risks / reviewer-facing caveats

- The artifact headroom is only `2820` bytes on seed 42. Do not add substantial code unless compression is rechecked.
- The PR should avoid unverifiable claims such as "BiJEPA proved 4x better on chaotic systems" unless the exact source is provided.
- The submission should clearly say the JEPA module is an auxiliary training regularizer, not an eval-time bidirectional predictor.
- If the competition reviewers consider the PPM/CaseOps byte-sidecar lane non-compliant, this candidate inherits that risk.
# BIJEPAX-lite JEPA + SP8192 CaseOps PPM

This record submits a Claude-designed, JEPA-inspired training-only auxiliary regularizer on top of the SP8192 CaseOps + per-group compression + PPM sliding stack.

The final 3-seed mean is:

```text
ppm_sliding val_bpb: 0.97271454
```

## Results

| Seed | Final `ppm_sliding val_bpb` | Quantized diagnostic `val_bpb` | Artifact bytes | Train stop | Eval time | Exit code |
|---:|---:|---:|---:|---:|---:|---:|
| 42 | `0.97234287` | `1.11544494` | `15,997,180` | `2014` steps / `599.843s` | `502.131s` | `0` |
| 314 | `0.97206308` | `1.11562304` | `15,999,539` | `2012` steps / `599.586s` | `499.038s` | `0` |
| 999 | `0.97373767` | `1.11757370` | `15,997,593` | `2013` steps / `599.821s` | `496.384s` | `0` |

Three-seed sample std: `0.00089703`.
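
Both summary statistics can be reproduced from the per-seed values in the table:

```python
import statistics

bpb = [0.97234287, 0.97206308, 0.97373767]  # seeds 42, 314, 999
print(sum(bpb) / len(bpb))    # ~0.97271454 (3-seed mean)
print(statistics.stdev(bpb))  # ~0.00089703 (sample std, n-1 denominator)
```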

All three runs are under:

- strict decimal `16,000,000` byte artifact cap
- 600s training cap
- 600s evaluation cap

## What is new

BIJEPAX-lite adds a small custom JEPA-style hidden-state prediction objective during training:

- hop-4 forward hidden-state prediction
- hop-4 backward hidden-state prediction
- cosine embedding-space loss
- LayerNorm-stabilized predictor heads
- no cycle head in the submitted lightweight config
- active only from `35%` to `80%` of the wallclock schedule
- separate optimizer and separate module from the base GPT

The predictor heads are **not serialized**. Final scoring is performed by the quantized base model with the existing causal PPM sliding evaluator.
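
For orientation, here is a minimal sketch of the shape of such an objective. The class name and layer layout are illustrative assumptions, not the actual `MultiDirectionalBiJEPAX` implementation; only the hop count of 4, `head_dim=32`, the cosine loss, and the LayerNorm-stabilized heads come from the submitted config.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiJEPAXSketch(nn.Module):
    """Illustrative hop-k bidirectional hidden-state predictor (not the real class)."""

    def __init__(self, d_model: int, head_dim: int = 32, hops: int = 4):
        super().__init__()
        self.hops = hops

        def head() -> nn.Module:
            # LayerNorm-stabilized predictor head with a low-rank bottleneck.
            return nn.Sequential(
                nn.LayerNorm(d_model),
                nn.Linear(d_model, head_dim),
                nn.GELU(),
                nn.Linear(head_dim, d_model),
            )

        self.fwd_head = head()  # predicts the hidden state `hops` steps ahead
        self.bwd_head = head()  # predicts the hidden state `hops` steps behind

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model). Targets are detached (stop-gradient),
        # JEPA-style, so gradients reach the base model only via the context side.
        k = self.hops
        tgt = hidden.detach()
        loss_f = 1 - F.cosine_similarity(self.fwd_head(hidden[:, :-k]), tgt[:, k:], dim=-1).mean()
        loss_b = 1 - F.cosine_similarity(self.bwd_head(hidden[:, k:]), tgt[:, :-k], dim=-1).mean()
        return 0.5 * (loss_f + loss_b)
```

In the submitted run this auxiliary loss is scaled by `weight=0.01`, gated to the 35%-80% wallclock window, added to the CE loss, and optimized by its own Adam instance, separate from the base GPT.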

## Compliance notes

- `TTT_ENABLED=0`
- `LQER_TOP_K=1` keeps all seeds below the strict byte cap
- SmearGate BOS masking is present for packed-document cross-boundary safety
- BIJEPAX-lite trains only on training batches from `DocumentPackingLoader`
- BIJEPAX-lite does not access validation tokens or validation byte sidecars during training
- Final score is from `ppm_sliding`

The folder includes:

- `train_gpt.py`
- three seed logs
- full source/log captures for each seed
- `submission.json`
- `LEGALITY_AUDIT.md`
- `STATIC_AUDIT_NOTES.md`
- `REFERENCES.md`
- `JEPA.mp4` as a short visual/demo asset

## Acknowledgements

Thanks to Claude for designing the custom BIJEPAX-lite auxiliary objective and helping turn the JEPA idea into a runnable candidate. Thanks to Codex for implementing the run path, auditing legality, coordinating the 3-seed package, and assembling this PR. Thanks also to the Parameter Golf community for the public ideas and fast iteration that this stack builds on.

## Validation

- `python3 -m py_compile records/track_10min_16mb/2026-05-01_BIJEPAXLite_JEPA_PPM_0.97271/train_gpt.py`
- `python3 -m json.tool records/track_10min_16mb/2026-05-01_BIJEPAXLite_JEPA_PPM_0.97271/submission.json`
- 3 full remote runs on 8xH100 completed with `rc=0`