batch size reduction
Baldins opened this issue · 9 comments
Hi,
how can I keep the batch size from shrinking when I use nonechucks?
Thanks in advance!
Sorry, I didn't get you. Are you not able to change the batch_size in SafeDataLoader? Could you rephrase your question?
What happens to me is that the batch size is reduced when I use SafeDataLoader (e.g. I request a batch size of 32, and some of the batches come out with size 31 or 30 or less, depending on how many corrupted images I have).
Is the batch with the smaller size just the last batch?
no - there are several like that - isn't this supposed to happen?
it is actually random :/
No, this is definitely not supposed to happen. Could you maybe share a code snippet of the important bits?
This is what I get at the end: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 32 and 31 in dimension 0 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:83
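For context: the failing operation (a cat along dimension 1, judging by the message) requires both inputs to have the same size in every other dimension, including the batch dimension 0, so a batch that silently shrank from 32 to 31 upstream makes it blow up. A pure-Python sketch of that size check (cat_dim1 is a made-up stand-in for illustration, not the real torch code):

```python
def cat_dim1(a, b):
    """Concatenate two 'batches' (lists of rows) along dimension 1.

    Mirrors torch.cat's rule: sizes must match in every dimension
    except the one being concatenated, so the batch sizes (dimension 0)
    must be equal.
    """
    if len(a) != len(b):
        raise ValueError(
            "Sizes of tensors must match except in dimension 1. "
            f"Got {len(a)} and {len(b)} in dimension 0"
        )
    return [row_a + row_b for row_a, row_b in zip(a, b)]

full = [[1, 2]] * 32   # a full batch of 32 samples
short = [[3]] * 31     # a batch that lost one corrupted sample

cat_dim1(full, full)   # fine: both have 32 rows
try:
    cat_dim1(full, short)   # 32 vs 31 -> same error as in the traceback
except ValueError as e:
    print(e)
```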
First I am defining my own Dataset class and then I am doing the following:
import numpy as np
import nonechucks as nc
from torch.utils.data.sampler import SubsetRandomSampler

# NewDataset is the custom Dataset class defined above
dataset_s = NewDataset(csv_file='file.csv',
                       imu_data=new_imu_values,
                       root_dir_depth='depth/',
                       root_dir_rgb='RGB/')
dataset = nc.SafeDataset(dataset_s)
dataloader = nc.SafeDataLoader(dataset, batch_size=32,
                               shuffle=False, num_workers=15)

batch_size = 32
validation_split = .2
shuffle_dataset = False
random_seed = 42

dataset_size = len(dataset)
indices = list(range(dataset_size))
split = 576
if shuffle_dataset:
    np.random.seed(random_seed)
    np.random.shuffle(indices)
train_indices, val_indices = indices[:-split], indices[-split:2944]

train_sampler = SubsetRandomSampler(train_indices)
valid_sampler = SubsetRandomSampler(val_indices)

train_loader = nc.SafeDataLoader(dataset, batch_size=batch_size,
                                 sampler=train_sampler, num_workers=15)
validation_loader = nc.SafeDataLoader(dataset, batch_size=batch_size,
                                      sampler=valid_sampler, num_workers=15)
Am I doing something wrong?
I got the error - it was here:
train_loader = nc.SafeDataLoader(dataset, batch_size=batch_size,
                                 sampler=train_sampler, num_workers=15)
validation_loader = nc.SafeDataLoader(dataset, batch_size=batch_size,
                                      sampler=valid_sampler, num_workers=15)
I have to use just the plain DataLoader instead of nc.SafeDataLoader.
Sorry about that, and thank you for the replies! :)
Yeah, I think it's because you're specifying specific indices, so if any of those indices point to invalid samples, they just get dropped from the batch. I suggest you check out SafeSampler for this.
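The dropping behaviour can be sketched in plain Python (all names here are made up for illustration; this is not nonechucks' actual code): the sampler hands the loader fixed groups of indices, and if invalid samples are filtered out of each group after the fact, any batch that happened to contain a corrupted index comes out short.

```python
def safe_batches(indices, batch_size, is_valid):
    """Yield batches of indices, silently dropping invalid ones.

    Models what happens when bad samples are filtered *after* the
    sampler has already fixed which indices go in which batch.
    """
    for start in range(0, len(indices), batch_size):
        batch = [i for i in indices[start:start + batch_size] if is_valid(i)]
        if batch:
            yield batch

corrupted = {3, 17, 40}   # pretend these images are unreadable
batches = list(safe_batches(range(64), 32, lambda i: i not in corrupted))
print([len(b) for b in batches])   # -> [30, 31]
```

Note that both batches are short here, not just the last one, and which batches shrink depends entirely on where the corrupted indices land, which matches the "random" behaviour reported above. If SafeSampler remaps invalid indices to valid ones before batching (as the name suggests), the batches would stay full.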