Add novel experiment features: z-loss, token dropout, embedding mixup…

1a6be00
Open

Non-record: Cosine LR Schedule — -0.070 BPB improvement + Focal Loss Investigation (corrected) #1380


Workflow runs completed with no jobs