
Conversation

@Maxusmusti (Collaborator)

Accelerate requires mixed_precision to be passed directly in the config; otherwise the fp32 upcast gets skipped and training loses overall precision, resulting in notably worse performance. Passing this also makes our manual mixed-precision policy redundant, and since we are now actually using mixed precision, hybrid shard finally becomes the new default sharding strategy to keep memory expectations in parity.
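
For illustration, here is a minimal sketch of the configuration this change implies. The `Accelerator(mixed_precision=...)` argument and the `FullyShardedDataParallelPlugin` fields are standard Accelerate API, but the specific values (e.g. `"bf16"`) are assumptions for the example, not this PR's actual diff:

```python
# Sketch only: how passing mixed_precision directly and defaulting to
# hybrid shard fit together in Hugging Face Accelerate.
from accelerate import Accelerator
from accelerate.utils import FullyShardedDataParallelPlugin
from torch.distributed.fsdp import ShardingStrategy

# Hybrid shard: fully shard within each node, replicate across nodes.
fsdp_plugin = FullyShardedDataParallelPlugin(
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
    # No manual mixed_precision_policy here -- passing mixed_precision
    # to the Accelerator below makes that redundant.
)

# mixed_precision must be passed directly; otherwise Accelerate skips
# the fp32 upcast and overall training precision degrades.
accelerator = Accelerator(
    mixed_precision="bf16",  # illustrative value
    fsdp_plugin=fsdp_plugin,
)
```

HYBRID_SHARD trades some per-GPU memory (each node holds a full sharded replica) for reduced inter-node communication; the memory freed by genuinely running mixed precision is what makes it workable as the default.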

@Maxusmusti self-assigned this Apr 15, 2025
mergify bot added the ci-failure label Apr 15, 2025
mergify bot added the one-approval label Apr 16, 2025
@RobotSail (Member) left a comment

Tested locally and confirmed that this PR produces the same loss as DeepSpeed

[image attached]

Signed-off-by: Mustafa Eyceoz <[email protected]>
@Maxusmusti marked this pull request as ready for review April 16, 2025 17:42
@JamesKunstle (Contributor) left a comment

👍

@JamesKunstle merged commit 9948a1f into instructlab:main Apr 16, 2025
11 of 13 checks passed
mergify bot removed the one-approval label Apr 16, 2025