
Conversation

@JamesKunstle (Contributor)

We change the model's forward method to use a different loss function. Our custom forward expects the output logits to be materialized, but Liger Kernel's `FusedLinearCrossEntropy` kernel computes the loss directly from the hidden states and never materializes them, so the custom forward was broken for Qwen2.5, Llama3, Gemma3, etc.

This switches the default kernel from `FusedLinearCrossEntropy` to `FusedCrossEntropy`.
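For context, here is a minimal plain-PyTorch sketch (hypothetical shapes and variable names, not code from this repo) contrasting the two loss paths and showing why a fused linear + cross-entropy kernel breaks a forward that reads the logits:

```python
# Minimal sketch (plain PyTorch, hypothetical shapes/names) contrasting the two loss paths.
import torch
import torch.nn.functional as F

hidden = torch.randn(4, 16, 256)                    # (batch, seq_len, hidden_dim) from the transformer body
lm_head = torch.nn.Linear(256, 32000, bias=False)   # vocabulary projection
labels = torch.randint(0, 32000, (4, 16))           # target token ids

# Path our custom forward relies on: materialize the logits, then apply a
# cross-entropy kernel (Liger's FusedCrossEntropy plays this role).
logits = lm_head(hidden)                             # (batch, seq_len, vocab) is materialized
loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))

# Path taken by a fused linear + cross-entropy kernel (Liger's
# FusedLinearCrossEntropy): the lm_head projection and the loss are fused
# and computed in chunks, so the full logits tensor above is never created.
# Anything downstream that reads `logits` -- as our custom forward did --
# has nothing to consume.
```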

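As a hedged illustration of what the swap amounts to (assuming Liger Kernel's standard patching helpers; the actual wiring in this repo may differ), the change is equivalent to selecting the plain fused cross-entropy kernel instead of the fused linear variant:

```python
# Hedged illustration, not necessarily this repo's code path: Liger Kernel's
# patching helpers let callers choose which kernels replace the HF modules.
from liger_kernel.transformers import apply_liger_kernel_to_llama

# Selecting the plain fused cross-entropy kernel keeps the lm_head projection
# (and therefore the logits) intact, which the custom forward requires.
apply_liger_kernel_to_llama(
    cross_entropy=True,                # fused CE kernel applied to materialized logits
    fused_linear_cross_entropy=False,  # do not fuse away the lm_head projection
)
```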

Signed-off-by: James Kunstle <[email protected]>
@RobotSail (Member) left a comment:

LGTM

@JamesKunstle merged commit 69475ba into instructlab:main on Apr 16, 2025
12 of 13 checks passed
@JamesKunstle deleted the jkunstle/fix-lk-nongranite branch on April 16, 2025 at 21:54