Why is iter used here during training iterations instead of directly using the dataloader?
Line 114 in cea7de2
dataloader = commons.InfiniteDataLoader(groups[current_group_num], num_workers=args.num_workers,
batch_size=args.batch_size, shuffle=True,
pin_memory=(args.device == "cuda"), drop_last=True)
dataloader_iterator = iter(dataloader)
model = model.train()
epoch_losses = np.zeros((0, 1), dtype=np.float32)
for iteration in tqdm(range(args.iterations_per_epoch), ncols=100):
images, targets, _ = next(dataloader_iterator)
Why not do it directly like this?
dataloader = commons.InfiniteDataLoader(groups[current_group_num], num_workers=args.num_workers,
batch_size=args.batch_size, shuffle=True,
pin_memory=(args.device == "cuda"), drop_last=True)
model = model.train()
epoch_losses = np.zeros((0, 1), dtype=np.float32)
for images, targets, _ in tqdm(dataloader, ncols=100):
You can use the dataloader, but then it would continue until it runs out of data (or you would need to use a break after N iterations). Using an iterator is more dataset-agnostic, and it makes it clearer how long the "epoch" is going to last (a full epoch over the whole dataset could take a day, so that is not an option).
Anyway, doing this
for iteration, (images, targets, _) in tqdm(enumerate(dataloader), ncols=100, total=args.iterations_per_epoch):
if iteration >= args.iterations_per_epoch:
break
is equivalent to this
dataloader_iterator = iter(dataloader)
for iteration in tqdm(range(args.iterations_per_epoch), ncols=100):
images, targets, _ = next(dataloader_iterator)
I just find the second example cleaner than the first
It is also okay to use the standard DataLoader class, but using an InfiniteDataLoader ensures that if there are not enough samples the __iter__ will not raise a StopIteration. This could happen when changing the parameters within the code (e.g. using higher values for --iterations_per_epoch, --batch_size, --L, --N might lead to this issue when using a standard DataLoader).
dataloader_iterator = iter(dataloader)
for iteration in tqdm(range(args.iterations_per_epoch), ncols=100):
images, targets, _ = next(dataloader_iterator)
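As an illustration of that failure mode, here is a minimal sketch with toy data (the tiny TensorDataset is just a stand-in for the actual groups, which is an assumption of this example): a plain DataLoader iterator raises StopIteration once its batches run out, while the loop above expects next() to succeed args.iterations_per_epoch times.
import torch
from torch.utils.data import DataLoader, TensorDataset

toy_dataset = TensorDataset(torch.arange(10))                        # 10 samples
toy_loader = DataLoader(toy_dataset, batch_size=4, drop_last=True)   # 2 batches per pass
toy_iterator = iter(toy_loader)

for iteration in range(5):  # pretend iterations_per_epoch = 5
    try:
        (batch,) = next(toy_iterator)
        print(iteration, batch.tolist())
    except StopIteration:
        print(f"StopIteration raised after {iteration} batches")
        break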
What you mean is that with this approach you can train on only a portion of the data in each epoch, without having to train on all of it, and only need to go through args.iterations_per_epoch iterations?
Line 106 in 543eaa5
dataloader = commons.InfiniteDataLoader(groups[current_group_num], num_workers=args.num_workers,
batch_size=args.batch_size, shuffle=True,
pin_memory=(args.device == "cuda"), drop_last=True)
dataloader_iterator = iter(dataloader)
When I use the small dataset, it has only one group containing 5,965 classes. If args.batch_size is set to 8, the dataloader yields 745 batches, meaning it can be iterated 745 times to cover all the data, so the statement "images, targets, _ = next(dataloader_iterator)" can only succeed 745 times. However, in the training loop args.iterations_per_epoch is set to 10,000, meaning 10,000 iterations are expected, while in reality only 745 would be possible because next(dataloader_iterator) cannot produce anything after the 745th batch. If that were the case, your intention of training on only a portion of the data in each epoch would not be achieved.
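(As a quick sanity check of that count, assuming the group behaves like a dataset with one item per class and drop_last=True as in the snippet above:)
num_samples = 5965   # one item per class in the single group (assumed)
batch_size = 8
print(num_samples // batch_size)   # 745 full batches; drop_last=True discards the remainder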
class InfiniteDataLoader(torch.utils.data.DataLoader):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Create the underlying iterator once and keep it alive across passes
        self.dataset_iterator = super().__iter__()

    def __iter__(self):
        return self

    def __next__(self):
        try:
            batch = next(self.dataset_iterator)
        except StopIteration:
            # The dataset is exhausted: build a fresh iterator and start
            # again from the first batch instead of stopping the loop
            self.dataset_iterator = super().__iter__()
            batch = next(self.dataset_iterator)
        return batch
using an InfiniteDataLoader ensures that if there are not enough samples the __iter__ will not raise a StopIteration.
So when all the samples are used up, iteration simply starts again from the first batch. Is that correct?
When args.iterations_per_epoch=10,000 and the DataLoader yields 800 batches, then after the 800th batch is consumed, the statement "images, targets, _ = next(dataloader_iterator)" in the for loop starts again from the first batch. In this case a single training loop passes over the data multiple times, and one epoch still completes 10,000 training iterations.
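A toy run of the class quoted above illustrates this wrap-around. The tiny TensorDataset is just an illustrative stand-in for the real groups (in the repo the class lives in commons):
import torch
from torch.utils.data import TensorDataset

toy_dataset = TensorDataset(torch.arange(10))                               # 10 samples
toy_loader = InfiniteDataLoader(toy_dataset, batch_size=4, drop_last=True)  # 2 batches per pass
toy_iterator = iter(toy_loader)

for iteration in range(5):  # more iterations than available batches
    (batch,) = next(toy_iterator)
    print(iteration, batch.tolist())
# prints 5 batches without raising: after the 2nd batch it restarts from the first one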
When args.iterations_per_epoch=10,000 and the DataLoader yields 20,000 batches, a single training loop performs only 10,000 iterations, so it trains on only a portion of the data.
This ensures that, regardless of whether the number of available batches is greater than args.iterations_per_epoch, the training loop in one epoch always runs exactly args.iterations_per_epoch iterations. Right?
And based on this setup, training on the 8 groups of the processed dataset can lead to CosPlace achieving state-of-the-art performance?
All your assumptions are correct, and the answer to all your questions is yes.
Thank you for your help