What is the purpose of SmearGate cross-document 'leak fix'? #1988

Why are PRs, such as the current record (https://github.com/openai/parameter-golf/pull/1855), adding this fix?

Looking at prior documents does not break causality. Any LLM that doesn't use intra-document masking already attends to prior documents. There is no 'cheating' involved here, unless the maintainers have made an arbitrary ruling against it. When these one-position (previous-token) techniques such as SmearGate and bigram hash were introduced, both the masked and unmasked versions were tested, and the unmasked version was chosen deliberately: it ran faster, didn't hurt loss, and still respects the causal mask, since each position only reads the token immediately before it.
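A minimal pure-Python sketch of this argument, assuming a simplified SmearGate of the form `x[t] + g * x[t-1]` with a scalar gate `g` (the real implementation uses a learned gate and may combine the terms differently). The names `smear`, `causal_mask`, `document_causal_mask`, and the `doc_start` flags are hypothetical and illustrative only, not code from the repository. It shows that the unmasked variant only ever reads position t-1, which the ordinary causal mask already permits even across a document boundary, while the masked 'leak fix' changes only the first token of each new document.

```python
from typing import List, Optional, Sequence

Vec = List[float]


def smear(x: Sequence[Sequence[float]], gate: float,
          doc_start: Optional[Sequence[bool]] = None) -> List[Vec]:
    """Simplified SmearGate (assumed form): out[t] = x[t] + gate * x[t-1].

    x: embeddings for one packed sequence, shape [T][D].
    gate: hypothetical scalar standing in for the learned gate.
    doc_start: optional flags, True where a new document begins. If given,
        the predecessor is dropped at document starts (the masked 'leak fix');
        if None, the predecessor is always used (unmasked variant).
    """
    out: List[Vec] = []
    for t, row in enumerate(x):
        at_boundary = doc_start is not None and doc_start[t]
        if t > 0 and not at_boundary:
            prev = x[t - 1]  # only t-1 is ever read, never t+1
            out.append([a + gate * b for a, b in zip(row, prev)])
        else:
            out.append(list(row))  # no predecessor to mix in
    return out


def causal_mask(T: int) -> List[List[bool]]:
    """allowed[q][k] is True when query q may attend to key k (k <= q)."""
    return [[k <= q for k in range(T)] for q in range(T)]


def document_causal_mask(doc_start: Sequence[bool]) -> List[List[bool]]:
    """Causal mask that additionally blocks keys from other documents."""
    ids: List[int] = []
    cur = 0
    for t, s in enumerate(doc_start):
        if s and t > 0:
            cur += 1  # a new document begins at position t
        ids.append(cur)
    T = len(doc_start)
    return [[k <= q and ids[k] == ids[q] for k in range(T)] for q in range(T)]


# Two packed documents: tokens 0-1 belong to A, tokens 2-3 belong to B.
x = [[1.0], [2.0], [10.0], [20.0]]
starts = [True, False, True, False]

print(smear(x, 0.5))          # unmasked: [[1.0], [2.5], [11.0], [25.0]]
print(smear(x, 0.5, starts))  # masked:   [[1.0], [2.5], [10.0], [25.0]]
# Plain causal attention already lets token 2 (doc B) read token 1 (doc A):
print(causal_mask(4)[2][1], document_causal_mask(starts)[2][1])  # True False
```

In both variants, later tokens never affect earlier outputs. The two differ only at token 2, the first token of document B, and the cross-document read there is the same (query 2, key 1) pair that an ordinary causal attention mask already allows.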

I am concerned that every record is going to copy-paste this, and the final record will carry this janky inefficiency for no good reason.
