msamogh/nonechucks

batch size reduction

Baldins opened this issue · 9 comments

Hi,

how can I keep the batch size from being reduced when I use nonechucks?

Thanks in advance!

Sorry, I didn't get you. Are you not able to change the batch_size in SafeDataLoader? Could you rephrase your question?

What happens is that the batch size is reduced when I use SafeDataLoader (e.g. I set a batch size of 32, but some batches come out with size 31 or 30 or less, depending on how many corrupted images I have).

Is the batch with the smaller size just the last batch?

no - there are several like that - isn't this supposed to happen?

it is actually random :/

No, this is definitely not supposed to happen. Could you maybe share a code snippet of the important bits?

This is what I get at the end: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 32 and 31 in dimension 0 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:83
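For context, that error comes from trying to concatenate tensors whose sizes in dimension 0 (the batch dimension) no longer match. A minimal pure-Python sketch of the underlying cause — dropping corrupted samples *after* the indices for each batch are already fixed — where the function name and the set of "corrupted" indices are made up for illustration:

```python
def batch_sizes(num_samples, batch_size, corrupted):
    """Return the size of each batch after corrupted samples are dropped
    from already-fixed index chunks of length batch_size."""
    sizes = []
    for start in range(0, num_samples, batch_size):
        chunk = range(start, min(start + batch_size, num_samples))
        # Dropping happens per chunk, so affected batches shrink
        kept = [i for i in chunk if i not in corrupted]
        sizes.append(len(kept))
    return sizes

# 96 samples, batch size 32, two corrupted images at arbitrary positions
print(batch_sizes(96, 32, corrupted={7, 40}))  # [31, 31, 32]
```

Any batch whose fixed index range happens to contain a corrupted sample comes out short, which matches the "random" undersized batches described above.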

First I am defining my own Dataset class and then I am doing the following:

import nonechucks as nc

dataset_s = NewDataset(csv_file='file.csv',
                       imu_data=new_imu_values,
                       root_dir_depth='depth/',
                       root_dir_rgb='RGB/')

dataset = nc.SafeDataset(dataset_s)
dataloader = nc.SafeDataLoader(dataset, batch_size=32,
                               shuffle=False, num_workers=15)

import numpy as np
from torch.utils.data import SubsetRandomSampler

batch_size = 32
validation_split = .2
shuffle_dataset = False
random_seed = 42
dataset_size = len(dataset)
indices = list(range(dataset_size))
split = 576
if shuffle_dataset:
    np.random.seed(random_seed)
    np.random.shuffle(indices)
train_indices, val_indices = indices[:-split], indices[-split:2944]

train_sampler = SubsetRandomSampler(train_indices)
valid_sampler = SubsetRandomSampler(val_indices)

train_loader = nc.SafeDataLoader(dataset, batch_size=batch_size,
                                 sampler=train_sampler, num_workers=15)
validation_loader = nc.SafeDataLoader(dataset, batch_size=batch_size,
                                      sampler=valid_sampler, num_workers=15)

Am I doing something wrong?

I got the error - it was here:

train_loader = nc.SafeDataLoader(dataset, batch_size=batch_size,
                                 sampler=train_sampler, num_workers=15)
validation_loader = nc.SafeDataLoader(dataset, batch_size=batch_size,
                                      sampler=valid_sampler, num_workers=15)

I had to use plain DataLoader instead of nc.SafeDataLoader.

Sorry about that and thank you for the replies! :)

Yeah, I think it's because you're specifying specific indices, so if those indices point to invalid samples, they're simply dropped from the batch. I suggest you check out SafeSampler for this.
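The idea behind filtering at the sampler level, as I understand it, is to remove invalid indices *before* batching rather than after, so every batch except possibly the last keeps its full size. A pure-Python sketch of that idea — `is_valid` and the `corrupted` set are placeholders for however your dataset detects a bad image, not nonechucks API (check the nonechucks README for SafeSampler's actual signature):

```python
def safe_batches(indices, batch_size, is_valid):
    """Yield batches built only from valid indices, so batches stay
    full-size (only the final batch may be shorter)."""
    # Filter first, then chunk: invalid samples never enter a batch
    valid = [i for i in indices if is_valid(i)]
    for start in range(0, len(valid), batch_size):
        yield valid[start:start + batch_size]

corrupted = {7, 40}  # made-up positions of bad images
batches = list(safe_batches(range(96), 32, lambda i: i not in corrupted))
print([len(b) for b in batches])  # [32, 32, 30]
```

Compare with dropping after the chunks are fixed, which yields randomly undersized batches like the 31s and 30s reported above.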