21 commits
944d8bc
2026_03-27_PhasecoherenceGatedGradients
jzgdev Mar 27, 2026
1d86b16
replace original train_gpt.py
jzgdev Mar 27, 2026
974a4be
updates
jzgdev Mar 27, 2026
b801d0b
update
jzgdev Mar 27, 2026
e335f2c
add results from 1xH100
jzgdev Mar 27, 2026
0fee70d
Merge branch 'main' of https://github.com/jzgdev/parameter-golf
jzgdev Mar 27, 2026
4fb40b3
2026-03-27_PhaseCoherenceGatedGradients
jzgdev Mar 27, 2026
655b620
Merge remote-tracking branch 'refs/remotes/origin/main'
jzgdev Mar 27, 2026
59d5734
submit 2026-03-27_PhaseCoherenceGatedGradients
jzgdev Mar 27, 2026
c9d0c98
submit 2026-03-27_PhaseCoherenceGatedGradients
jzgdev Mar 27, 2026
e1461cc
2026_03-27_PhasecoherenceGatedGradients submission
jzgdev Mar 27, 2026
deb8879
2026_03-27_PhasecoherenceGatedGradients submission
jzgdev Mar 27, 2026
400bec3
2026_03-27_PhasecoherenceGatedGradients submission
jzgdev Mar 27, 2026
28bbfe7
2026_03-27_PhasecoherenceGatedGradients submission
jzgdev Mar 27, 2026
94f03ae
submission 2026-03-27_PhaseCoherencegatedGradientsA
Mar 27, 2026
3d29b32
2026_03-27_PhasecoherenceGatedGradients submission
jzgdev Mar 27, 2026
1707e27
2026_03-27_PhasecoherenceGatedGradients submission
jzgdev Mar 27, 2026
e632f56
Add optional Dirichlet ngram cache eval to PIC-GD record
jzgdev Mar 27, 2026
cc5cda4
submission 2026-03-27_PhaseCoherenceGatedGradients PIC-GID + Parallel…
jzgdev Mar 27, 2026
2118702
HologrAdam v0.1
jzgdev Mar 29, 2026
c14f57c
Merge branch 'main' of https://github.com/jzgdev/parameter-golf
jzgdev Mar 29, 2026
2 changes: 1 addition & 1 deletion .gitignore
@@ -8,4 +8,4 @@ data/manifest.json
data/docs_selected.jsonl
.mypy_cache/
.venv
logs/
logs/
2 changes: 1 addition & 1 deletion data/README.md
@@ -63,4 +63,4 @@ MATCHED_FINEWEB_TIKTOKEN_THREADS=16
MATCHED_FINEWEB_GPT2_DECODE_BATCH_SIZE=512
```

These control batched tokenizer encoding during shard export, tokenizer thread count, tiktoken thread count, and batched GPT-2 decode for the blobstore docs-cache path.
These control batched tokenizer encoding during shard export, tokenizer thread count, tiktoken thread count, and batched GPT-2 decode for the blobstore docs-cache path.
@@ -0,0 +1,60 @@
# Phase Coherence Gated Gradients

Exploratory PIC-GD experiment folder for `2026-03-27`. This is not a leaderboard submission package yet.

The script in this folder adapts the baseline training loop to a batch-level version of phase-induced coherence-gated gradient descent (PIC-GD) while keeping:

- the real-valued transformer architecture
- the Muon + Adam optimizer split
- tokenizer-agnostic `val_bpb` evaluation
- the int8 + zlib roundtrip export path

## Current Status

This folder should be treated as an experiment source of truth, not as a finished submission.

- current reported result: single exploratory `8x H100` run around `1.3178 val_bpb`
- not competitive with the current `track_10min_16mb` leaderboard
- `submission.json` remains intentionally unbenchmarked
- generated artifacts like `final_model.pt` and `final_model.int8.ptz` should not be part of the eventual PR

## PIC-GD Adaptation

The implementation stays close to the baseline training loop:

- final hidden states are treated as pseudo-complex latents by pairing adjacent channels as `(real, imag)`
- target-token embeddings are paired the same way to provide a reference signal
- a normalized coherence score is computed from the paired latent/reference dot product
- the coherence score is converted into a detached gradient gate

```python
# coherence is detached, so alpha (between PICGD_MIN_GATE and 1) only rescales gradients
alpha = PICGD_MIN_GATE + (1 - PICGD_MIN_GATE) * torch.sigmoid(PICGD_BETA * coherence)
```

Training backpropagates `loss * alpha`, while validation and final quantized roundtrip evaluation continue to use raw cross-entropy only.
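The steps above can be sketched end to end. This is a minimal sketch, not this folder's `train_gpt.py`: the function names, tensor shapes, and the exact coherence normalization are assumptions, and token striding (`PICGD_TOKEN_STRIDE`) is omitted for brevity.

```python
import torch

PICGD_BETA = 2.0
PICGD_MIN_GATE = 0.05
PICGD_EPS = 1e-6

def pseudo_complex_coherence(hidden, target_emb):
    """Pair adjacent channels as (real, imag) and score latent/reference alignment.

    hidden, target_emb: (batch, seq, d_model) with d_model even.
    Returns a scalar in [0, 1], detached from the autograd graph.
    """
    h = torch.view_as_complex(hidden.float().reshape(*hidden.shape[:-1], -1, 2))
    t = torch.view_as_complex(target_emb.float().reshape(*target_emb.shape[:-1], -1, 2))
    # normalized complex dot product: |sum(h * conj(t))| / (||h|| * ||t|| + eps)
    dot = (h * t.conj()).sum(dim=-1)
    denom = h.abs().pow(2).sum(-1).sqrt() * t.abs().pow(2).sum(-1).sqrt() + PICGD_EPS
    return (dot.abs() / denom).mean().detach()

def gated_loss(loss, hidden, target_emb):
    """Scale the cross-entropy loss by the detached coherence gate."""
    coherence = pseudo_complex_coherence(hidden, target_emb)
    alpha = PICGD_MIN_GATE + (1 - PICGD_MIN_GATE) * torch.sigmoid(PICGD_BETA * coherence)
    return loss * alpha  # alpha carries no gradient; it only rescales the update
```

Because `alpha` is detached, the parameter updates point in the same direction as plain cross-entropy; coherence only modulates their magnitude per batch.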

## Current Experimental Defaults

- `PICGD_ENABLED=1`
- `PICGD_BETA=2.0`
- `PICGD_MIN_GATE=0.05`
- `PICGD_EPS=1e-6`
- `PICGD_TOKEN_STRIDE=32`
- `attention_impl` is logged as `native_gqa`, `kv_repeat_fallback`, or `standard_sdpa`
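Assuming the defaults above are environment-variable knobs (the `_env_float` helper below is hypothetical, not from this folder's code), they might be wired up like this:

```python
import os

def _env_float(name, default):
    """Read a float knob from the environment, falling back to the default."""
    return float(os.environ.get(name, default))

# the experimental defaults listed above, overridable via environment variables
PICGD_ENABLED = os.environ.get("PICGD_ENABLED", "1") == "1"
PICGD_BETA = _env_float("PICGD_BETA", 2.0)
PICGD_MIN_GATE = _env_float("PICGD_MIN_GATE", 0.05)
PICGD_EPS = _env_float("PICGD_EPS", 1e-6)
PICGD_TOKEN_STRIDE = int(os.environ.get("PICGD_TOKEN_STRIDE", 32))
```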

Training logs include:

- `picgd_coherence`
- `picgd_gate`
- `attention_impl`

## Evidence Standard Before Packaging

Do not rewrite this folder as a real submission until the following exists on `8x H100`:

- 1 baseline run with root `train_gpt.py`
- 3 PIC-GD runs with this folder's `train_gpt.py`
- recorded seeds, `step_avg`, final quantized `val_bpb`, artifact size, and peak memory for every run
- a positive baseline-vs-PIC-GD comparison that justifies keeping the method

If PIC-GD does not beat the baseline mean on the same setup, stop pursuing it for submission.
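The packaging decision above reduces to a mean comparison over the recorded runs; a minimal sketch with placeholder numbers (all `val_bpb` values below are hypothetical, not measured results):

```python
from statistics import mean

# hypothetical final quantized val_bpb values; lower is better
baseline_val_bpb = 1.3050                   # 1 baseline run with root train_gpt.py
picgd_val_bpb = [1.3178, 1.3160, 1.3201]    # 3 PIC-GD runs with this folder's script

picgd_mean = mean(picgd_val_bpb)
keep_method = picgd_mean < baseline_val_bpb
print(f"picgd mean={picgd_mean:.4f} baseline={baseline_val_bpb:.4f} keep={keep_method}")
```

With these placeholder numbers the PIC-GD mean does not beat the baseline, which per the rule above would end the submission attempt.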