[feat]: support dataloader resume by skip_first_batches #416

Open
wuxibin89 wants to merge 1 commit into PKU-YuanGroup:main from wuxibin89:feat/resume_dataloader

wuxibin89 commented Aug 29, 2024

What does this PR do?

This PR resumes the dataloader by skipping the batches that were already consumed in the interrupted training epoch. For large datasets, a single epoch takes a long time to train, and the training process may crash partway through it. Without this PR, every resume restarts training from the beginning of the dataset.
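
The idea can be sketched in plain Python. This is a minimal illustration and not the PR's actual diff: the helper names `resume_position` and `skip_batches` are hypothetical, the global step is assumed to be restored from the checkpoint, and the dataloader is assumed to yield batches in the same order on every run (for example, through a fixed shuffle seed).

```python
from itertools import islice


def resume_position(global_step, batches_per_epoch, grad_accum_steps=1):
    """Return (epoch, batches already consumed in that epoch).

    Hypothetical helper: assumes one optimizer step consumes
    `grad_accum_steps` batches and that `global_step` comes from the checkpoint.
    """
    consumed = global_step * grad_accum_steps
    return divmod(consumed, batches_per_epoch)


def skip_batches(dataloader, num_batches):
    """Yield the batches of `dataloader` that follow the first `num_batches`.

    Hypothetical helper: the skipped batches are still produced by the
    underlying iterable and then discarded.
    """
    return islice(iter(dataloader), num_batches, None)


# Stand-in for a real dataloader: 10 batches of 2 sample indices each.
batches = [[i, i + 1] for i in range(0, 20, 2)]

# A crash happened after 13 optimizer steps, with 10 batches per epoch.
epoch, skip = resume_position(global_step=13, batches_per_epoch=10)
remaining = list(skip_batches(batches, skip))
```

Here the run resumes in epoch 1 with 3 batches skipped, so training continues from the fourth batch instead of the first. Because this sketch discards batches after they are loaded, skipping far into a large dataset still pays the loading cost for every skipped batch.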
