Missing Sampler when running on multiple GPUs using DDP
banyan-god opened this issue · 3 comments
banyan-god commented
Based on my understanding, we need a sampler that makes sure a batch is not assigned to another GPU, so that we don't train on the same batch twice. But I don't see any sampler being used with DDP. Is this a bug, or am I missing something?
https://pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler
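For context, a minimal sketch of how DistributedSampler is typically wired up under DDP (train_ds, the batch size, and num_epochs are placeholder names here; it assumes the process group has already been initialized):

import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# each rank gets a disjoint shard of the dataset
sampler = DistributedSampler(
    train_ds,                           # placeholder map-style dataset
    num_replicas=dist.get_world_size(),
    rank=dist.get_rank(),
    shuffle=True,
)
loader = DataLoader(train_ds, batch_size=32, sampler=sampler)

for epoch in range(num_epochs):
    sampler.set_epoch(epoch)            # reshuffle differently each epoch
    for x, y in loader:
        ...                             # training step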
karpathy commented
dataloader.h is multi-process aware: it iterates in chunks of n_process * B * T and manages per-process pointers into this memory, so it gives different batches to each process.
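In Python terms the idea looks roughly like this (a sketch, not the actual dataloader.h code; reading B*T+1 tokens per slice to form shifted targets is an assumption borrowed from the usual GPT data pipeline):

import numpy as np

def batches_for_rank(tokens, rank, n_process, B, T):
    pos = rank * B * T                   # this rank's starting pointer
    stride = n_process * B * T           # one global step covers all ranks
    while pos + B * T + 1 <= len(tokens):
        chunk = tokens[pos : pos + B * T + 1]
        x = chunk[:-1].reshape(B, T)     # inputs
        y = chunk[1:].reshape(B, T)      # targets, shifted by one
        yield x, y
        pos += stride                    # skip over the other ranks' slices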
banyan-god commented
@karpathy thanks for getting back on this, but since this is the llama2.c project, we don't have/use dataloader.h here, right?
banyan-god commented
OK, I think this helps to make sure they are unique:
import random
import torch
import torch.distributed as dist

# get DataLoader worker info (None when not running inside a worker process)
worker_info = torch.utils.data.get_worker_info()
worker_id = worker_info.id if worker_info else 0
# get DDP rank info
rank = dist.get_rank() if dist.is_initialized() else 0
# combine the worker_id and rank into a unique seed for the rng
seed = 42 + worker_id + 1337 * rank
rng = random.Random(seed)
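For illustration, a sketch of how such an rng can drive an IterableDataset so every (rank, worker) pair draws its own independent random stream of batches from the same pretokenized file (the class name, file layout, and uint16 dtype are assumptions, not the exact llama2.c code):

import random
import numpy as np
import torch
import torch.distributed as dist
from torch.utils.data import IterableDataset

class PretokBatches(IterableDataset):
    def __init__(self, bin_path, max_seq_len):
        self.bin_path = bin_path            # assumed: flat uint16 token file
        self.max_seq_len = max_seq_len

    def __iter__(self):
        worker_info = torch.utils.data.get_worker_info()
        worker_id = worker_info.id if worker_info else 0
        rank = dist.get_rank() if dist.is_initialized() else 0
        # unique seed per (worker, rank) so streams don't overlap in expectation
        rng = random.Random(42 + worker_id + 1337 * rank)
        data = np.memmap(self.bin_path, dtype=np.uint16, mode="r")
        num_chunks = len(data) // self.max_seq_len - 1
        while True:
            i = rng.randint(0, num_chunks - 1) * self.max_seq_len
            chunk = torch.from_numpy(
                data[i : i + self.max_seq_len + 1].astype(np.int64)
            )
            yield chunk[:-1], chunk[1:]     # inputs, targets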