karpathy/llama2.c

Missing Sampler when running on multiple GPUs using DDP

banyan-god opened this issue · 3 comments

Based on my understanding, we need a sampler that makes sure a batch is not also assigned to another GPU, so that we don't train on the same batch twice. But I don't see any sampler being used with DDP here. Is this a bug, or am I missing something?

https://pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler
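For reference, a minimal sketch of how DistributedSampler is typically wired into a DataLoader for a map-style dataset (the dataset and sizes below are just placeholders), i.e. the pattern I was expecting to see:

    import torch
    import torch.distributed as dist
    from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

    def build_loader(batch_size=32):
        # toy stand-in for real training data
        data = torch.randn(1024, 16)
        targets = torch.randint(0, 10, (1024,))
        dataset = TensorDataset(data, targets)

        # DistributedSampler gives each DDP rank a disjoint 1/world_size slice
        sampler = DistributedSampler(dataset) if dist.is_initialized() else None
        loader = DataLoader(
            dataset,
            batch_size=batch_size,
            shuffle=(sampler is None),  # the sampler already shuffles per epoch
            sampler=sampler,
        )
        return loader, sampler

    # each epoch you would call sampler.set_epoch(epoch) so the shuffle differs across epochs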

dataloader.h is multi-process aware: it iterates in chunks of n_process * B * T and manages per-process pointers into this memory, so it gives different batches to each process.
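Roughly, the idea in Python (an illustrative sketch of the chunking scheme, not the actual dataloader.h code):

    import numpy as np

    def batches(tokens, B, T, process_rank, num_processes):
        # one rank's stream of (x, y) batches over a flat token array
        pos = process_rank * B * T           # this rank's pointer into the data
        stride = num_processes * B * T       # all ranks together advance this much per step
        while pos + B * T + 1 <= len(tokens):
            buf = tokens[pos : pos + B * T + 1]
            x = buf[:-1].reshape(B, T)       # inputs
            y = buf[1:].reshape(B, T)        # next-token targets
            yield x, y
            pos += stride                    # skip over the other ranks' slices

    # e.g. rank 0 of 2 sees windows 0, 2, 4, ...; rank 1 sees windows 1, 3, 5, ...
    tokens = np.arange(10_000, dtype=np.uint16)  # stand-in for a pretokenized shard
    for x, y in batches(tokens, B=2, T=8, process_rank=0, num_processes=2):
        pass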

@karpathy thanks for getting back on this, but since this is the llama2.c project, we don't have/use dataloader.h here, do we?

Ok, I think this helps make sure the batches each process sees are unique:

        import random

        import torch
        import torch.distributed as dist

        # figure out which DataLoader worker we are in (0 if single-process loading)
        worker_info = torch.utils.data.get_worker_info()
        worker_id = worker_info.id if worker_info else 0
        # get DDP rank info
        rank = dist.get_rank() if dist.is_initialized() else 0
        # combine the worker_id and rank to create a unique seed for the rng
        seed = 42 + worker_id + 1337 * rank
        rng = random.Random(seed)
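For context, here is a sketch of how such an rng can drive an IterableDataset so that each (rank, worker) pair walks the data in its own order. The shard handling is simplified and hypothetical, not the exact llama2.c code:

    import random

    import torch
    import torch.distributed as dist
    from torch.utils.data import IterableDataset

    class ShardStream(IterableDataset):
        """Yields shard names in a per-(rank, worker) random order (illustrative)."""

        def __init__(self, shard_filenames):
            self.shard_filenames = shard_filenames

        def __iter__(self):
            # same recipe as above: one rng per (DataLoader worker, DDP rank) pair
            worker_info = torch.utils.data.get_worker_info()
            worker_id = worker_info.id if worker_info else 0
            rank = dist.get_rank() if dist.is_initialized() else 0
            rng = random.Random(42 + worker_id + 1337 * rank)
            while True:
                shards = list(self.shard_filenames)
                rng.shuffle(shards)  # each rank/worker walks the shards in its own order
                for shard in shards:
                    yield shard  # in real code: open the shard and yield (x, y) batches

Note that unlike DistributedSampler, this gives statistical rather than strict decorrelation: ranks shuffle the same shard list with different seeds, so their streams differ but are not guaranteed to be disjoint.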