karpathy/llama2.c

Missing Sampler when running on multiple GPUs using DDP

banyan-god opened this issue · 3 comments

Based on my understanding, we need a sampler that makes sure a batch is not also assigned to another GPU, so that we don't train on the same batch twice. But I don't see any sampler being used with DDP here. Is this a bug, or am I missing something?

https://pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler
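For reference, a minimal sketch of how DistributedSampler is typically wired into a DataLoader for a map-style dataset (the dataset and sizes below are just placeholders), i.e. the pattern I was expecting to see:

    import torch
    import torch.distributed as dist
    from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

    def build_loader(batch_size=32):
        # toy stand-in for real training data
        data = torch.randn(1024, 16)
        targets = torch.randint(0, 10, (1024,))
        dataset = TensorDataset(data, targets)

        # DistributedSampler gives each DDP rank a disjoint 1/world_size slice
        sampler = DistributedSampler(dataset) if dist.is_initialized() else None
        loader = DataLoader(
            dataset,
            batch_size=batch_size,
            shuffle=(sampler is None),  # the sampler already shuffles per epoch
            sampler=sampler,
        )
        return loader, sampler

    # each epoch you would call sampler.set_epoch(epoch) so the shuffle differs across epochs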

dataloader.h is multi-process aware: it iterates in chunks of n_process * B * T and manages per-process pointers into this memory, so it gives different batches to each process.
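Roughly, the idea in Python (an illustrative sketch of the chunking scheme, not the actual dataloader.h code):

    import numpy as np

    def batches(tokens, B, T, process_rank, num_processes):
        # one rank's stream of (x, y) batches over a flat token array
        pos = process_rank * B * T           # this rank's pointer into the data
        stride = num_processes * B * T       # all ranks together advance this much per step
        while pos + B * T + 1 <= len(tokens):
            buf = tokens[pos : pos + B * T + 1]
            x = buf[:-1].reshape(B, T)       # inputs
            y = buf[1:].reshape(B, T)        # next-token targets
            yield x, y
            pos += stride                    # skip over the other ranks' slices

    # e.g. rank 0 of 2 sees windows 0, 2, 4, ...; rank 1 sees windows 1, 3, 5, ...
    tokens = np.arange(10_000, dtype=np.uint16)  # stand-in for a pretokenized shard
    for x, y in batches(tokens, B=2, T=8, process_rank=0, num_processes=2):
        pass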

@karpathy thanks for getting back on this, but since this is the llama2.c project, we don't have/use dataloader.h here, do we?

Ok, I think this helps make sure the batches each process sees are unique:

        import random

        import torch
        import torch.distributed as dist

        # figure out which DataLoader worker we are in (0 if single-process loading)
        worker_info = torch.utils.data.get_worker_info()
        worker_id = worker_info.id if worker_info else 0
        # get DDP rank info
        rank = dist.get_rank() if dist.is_initialized() else 0
        # combine the worker_id and rank to create a unique seed for the rng
        seed = 42 + worker_id + 1337 * rank
        rng = random.Random(seed)
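For context, here is a sketch of how such an rng can drive an IterableDataset so that each (rank, worker) pair walks the data in its own order. The shard handling is simplified and hypothetical, not the exact llama2.c code:

    import random

    import torch
    import torch.distributed as dist
    from torch.utils.data import IterableDataset

    class ShardStream(IterableDataset):
        """Yields shard names in a per-(rank, worker) random order (illustrative)."""

        def __init__(self, shard_filenames):
            self.shard_filenames = shard_filenames

        def __iter__(self):
            # same recipe as above: one rng per (DataLoader worker, DDP rank) pair
            worker_info = torch.utils.data.get_worker_info()
            worker_id = worker_info.id if worker_info else 0
            rank = dist.get_rank() if dist.is_initialized() else 0
            rng = random.Random(42 + worker_id + 1337 * rank)
            while True:
                shards = list(self.shard_filenames)
                rng.shuffle(shards)  # each rank/worker walks the shards in its own order
                for shard in shards:
                    yield shard  # in real code: open the shard and yield (x, y) batches

Note that unlike DistributedSampler, this gives statistical rather than strict decorrelation: ranks shuffle the same shard list with different seeds, so their streams differ but are not guaranteed to be disjoint.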