ddp loader #352
base: master
Conversation
@Sopel97 do you think this would work regarding the changes for the multi-GPU data loader?
Force-pushed from 8839576 to fe4f09a
before the force push:
fixed
Sopel97 left a comment
Looks fine, though we were thinking this approach has 2 disadvantages:
- It requires a lot of seeking at the start
- The data being read may eventually converge (in the worst case) to be identical for all ranks
@vondele suggested on Discord:
[22:39]vondele: i mean, the ith process of N could restrict itself to reading only from chunk k % N == i ..
which fixes both issues. The only issue that comes up, implementation-wise, is when the number of chunks in the file is lower than the number of ranks. We could either just return a failure in this case, or only reset the index once every cycle to allow it to grow past the number of chunks in the file (it still gives some chunk, we don't really care which).
The code as it is could be easily adapted: skip rank chunks at the start, skip world_size chunks after every chunk read, and reinitialize on reaching the end when cyclic.
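For illustration, here is a minimal Python sketch of the strided-reading scheme described above. The actual implementation lives in the C++ data loader; the function and parameter names below are hypothetical stand-ins.

```python
# Hypothetical sketch of the "chunk k % world_size == rank" selection; the real
# code is the C++ data loader, names here are illustrative only.
def strided_chunk_indices(num_chunks, rank, world_size, cyclic=True):
    """Yield the chunk indices this rank should read."""
    if num_chunks < world_size:
        # Fewer chunks than ranks: fail here (or let the index wrap, see above).
        raise RuntimeError("fewer chunks in the file than ranks")
    while True:
        # Skip `rank` chunks at the start, then advance by `world_size` per read.
        for k in range(rank, num_chunks, world_size):
            yield k
        if not cyclic:
            break  # single pass over the file
        # cyclic: reinitialize and start over from chunk `rank`
```

With this scheme rank i only ever touches chunks i, i+N, i+2N, ..., so the ranks never converge onto the same data and there is no large seek phase at the start beyond the initial rank-sized skip.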
like this?
segfaults for me?
Sopel97 left a comment
The current logic looks like it always skips N chunks, so if the number of chunks in a file is not divisible by N it will read a different set of chunks on the second cycle, which is undesirable.
The simplest solution I see would be to remove the cycle handling from skipChunks, and instead check in fetchNextChunkIfNeeded whether we have reached the end of the file; if we have, reinitialize from the start (skipping rank chunks), but at most once, to prevent an infinite loop.
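A sketch of that "reset at most once" idea, again with hypothetical Python names standing in for the C++ skipChunks / fetchNextChunkIfNeeded logic:

```python
# Hypothetical sketch: plain striding while reading, end-of-file handling with
# at most one reset per fetch so the loop cannot spin forever.
class StridedChunkReader:
    def __init__(self, chunks, rank, world_size, cyclic=True):
        self.chunks = chunks        # list of chunk handles/offsets
        self.rank = rank
        self.world_size = world_size
        self.cyclic = cyclic
        self.index = rank           # skip `rank` chunks at the start

    def fetch_next_chunk_if_needed(self):
        # At most one reset: a file with fewer chunks than ranks simply
        # returns None instead of looping indefinitely.
        for _ in range(2):
            if self.index < len(self.chunks):
                chunk = self.chunks[self.index]
                self.index += self.world_size   # skip world_size chunks per read
                return chunk
            if not self.cyclic:
                break
            self.index = self.rank              # reinitialize from the start
        return None                             # nothing left to read
```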
Force-pushed from 20b377a to 9b1979e
but we are using this m_cyclic with true? So we can just seek to the start, skip chunks, and be done with it. Not sure which at-most-once logic you are referring to?
segfaults are gone in my tests.
pretty sure as it is now it will not cycle the dataset the second time because
Not an issue with this implementation. It would be an issue if there was a loop skipping and resetting to the start until we got to an actual rank-th chunk.
Yeah, it's an artifact, just wanted to clear things up.
I think the initialization trick with the environment variables is not yet really working; at least locally I get this: So rank 0 thinks the world size is 1 at the time of creating the stream? Edit: I must say I'm not 100% sure whether the PL autodetection works well here, so results from other systems would be good.
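For context, the kind of environment-variable lookup being discussed (the exact variable names used by the branch are an assumption here) only works once the DDP worker processes exist; read in the main process before they are spawned, it reports a world size of 1:

```python
import os

# Assumption: rank/world size are read from the standard torch.distributed-style
# variables. Before the DDP processes are spawned these are unset in the main
# process, so rank 0 sees world_size == 1 when the stream is created.
world_size = int(os.environ.get("WORLD_SIZE", "1"))
rank = int(os.environ.get("RANK", "0"))
print(f"rank={rank} world_size={world_size}")
```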
so, now finished training 2 stages with this branch, and I think this issue shows in the result: Reading a bit, torch.distributed is not yet initialized at the point where we create the data loader (only during trainer.fit), so that skipping-factor initialization should probably be done lazily, since at the time the data loader is created it is not yet known that we're in a distributed setup; the env variables are a bit of a hack. We should probably base it on:
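One possible shape for such a lazy check, assuming we query torch.distributed only when the first data is actually requested (i.e. after trainer.fit has set up the process group); this is a sketch, not the branch's code:

```python
import torch.distributed as dist

def lazy_rank_and_world_size():
    # Only meaningful after the trainer has initialized the process group;
    # falls back to a single-process setup otherwise.
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank(), dist.get_world_size()
    return 0, 1
```

The data loader would call this when constructing its stream for the first batch instead of at construction time, so the skipping factor reflects the real world size.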

No description provided.