Dataloader and Sampler for 3D Parallelism
xrsrke opened this issue · 0 comments
When we train a model with pipeline parallelism, different stages require different data, and some stages do not load any data at all. Each stage should therefore fetch only the data it needs, without loading the full dataset.
We also need a way to turn a regular PyTorch dataloader into a distributed dataloader.
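
To make the idea concrete, here is a minimal pure-Python sketch of the sharding logic, not an actual implementation for this project. `ParallelRank`, `stage_needs_data`, `shard_indices`, and `iter_batches` are hypothetical names introduced for illustration. It assumes that only the first pipeline stage (inputs) and the last pipeline stage (labels) read data, and that samples are split across data-parallel replicas with a strided scheme, similar in spirit to what `torch.utils.data.DistributedSampler` does (without its padding or shuffling). Tensor-parallel ranks are not modeled, on the assumption that ranks within one tensor-parallel group see the same batch.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass(frozen=True)
class ParallelRank:
    """Hypothetical description of one process's position in the parallel grid."""
    dp_rank: int   # index of this data-parallel replica
    dp_size: int   # number of data-parallel replicas
    pp_stage: int  # index of this pipeline stage
    pp_size: int   # number of pipeline stages


def stage_needs_data(rank: ParallelRank) -> bool:
    # Assumption: the first stage consumes inputs and the last stage consumes
    # labels; intermediate stages only receive activations from their neighbor.
    return rank.pp_stage == 0 or rank.pp_stage == rank.pp_size - 1


def shard_indices(num_samples: int, rank: ParallelRank) -> List[int]:
    """Return the sample indices this process should load (possibly none)."""
    if not stage_needs_data(rank):
        return []
    # Strided split: replica r takes samples r, r + dp_size, r + 2 * dp_size, ...
    return list(range(rank.dp_rank, num_samples, rank.dp_size))


def iter_batches(
    dataset: Sequence, rank: ParallelRank, batch_size: int
) -> Iterator[list]:
    """Yield batches built only from this process's shard of the dataset."""
    indices = shard_indices(len(dataset), rank)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# Usage: 10 samples, 2 data-parallel replicas, 4 pipeline stages.
dataset = list(range(10))
first_stage = ParallelRank(dp_rank=1, dp_size=2, pp_stage=0, pp_size=4)
middle_stage = ParallelRank(dp_rank=1, dp_size=2, pp_stage=2, pp_size=4)

first_stage_batches = list(iter_batches(dataset, first_stage, batch_size=2))
middle_stage_batches = list(iter_batches(dataset, middle_stage, batch_size=2))
```

In this sketch a middle stage yields no batches, so it never touches the dataset, while the first and last stages of the same data-parallel replica iterate over identical indices and stay aligned on inputs and labels. In real PyTorch code the same index list could be handed to a `DataLoader` through a custom sampler, but that wiring is left out here.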
Reading