New record submission for review (#1855) — val_bpb 1.06108 #1856

@codemath3000

@valerio-oai @0hq @cocohearts @openai/parameter-golf-team Hi again! I've submitted a new record PR (#1855) with val_bpb 1.06108, a substantial improvement over the current 1.0810 BPB SOTA.

Built on PR #1797's base, with two technical contributions on top:

  1. SmearGate cross-document BOS leak fix — masks the prev-token term wherever the current token is BOS, so packed-stream eval no longer leaks doc N's last token into doc N+1's BOS embedding.
  2. Per-group compression pipeline — adds COMPRESSOR=pergroup (lrzip ZPAQ + L1 row similarity-sort on hot tensors + brotli remainder), ~280 KB smaller artifact.
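For reviewers who want the BOS masking in item 1 spelled out, here is a minimal sketch. It assumes the gate adds a scaled copy of the previous token's embedding to the current one; the function name, the scalar `gate`, and `BOS_ID = 0` are hypothetical stand-ins for illustration, not the PR's actual code.

```python
from typing import List

BOS_ID = 0  # hypothetical BOS token id, chosen only for this example


def smear_with_bos_mask(
    embeddings: List[List[float]],
    token_ids: List[int],
    gate: float,
    bos_id: int = BOS_ID,
) -> List[List[float]]:
    """Add gate * previous embedding to each position, except where the
    current token is BOS (or there is no previous token)."""
    out = []
    for t, (x, tok) in enumerate(zip(embeddings, token_ids)):
        if t == 0 or tok == bos_id:
            # Masked: a BOS position must not see the previous document's
            # last token, so the prev-token term is dropped entirely.
            out.append(list(x))
        else:
            prev = embeddings[t - 1]
            out.append([xi + gate * pi for xi, pi in zip(x, prev)])
    return out


# Two documents packed into one stream: [BOS, 5, 6] and [BOS, 7].
packed_ids = [BOS_ID, 5, 6, BOS_ID, 7]
packed_emb = [[1.0], [2.0], [3.0], [4.0], [5.0]]
smeared = smear_with_bos_mask(packed_emb, packed_ids, gate=0.5)
print(smeared)  # [[1.0], [2.5], [4.0], [4.0], [7.0]]
```

Without the `tok == bos_id` check, position 3 would come out as 4.0 + 0.5 * 3.0 = 5.5, which is doc N's last token leaking into doc N+1's BOS embedding.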

Plus a stack of 9 greedy-validated hyperparameter overrides (full table in the PR).

Happy to address any concerns — thanks again for taking the time to review!
