Problem when sequence length becomes 8
Closed this issue · 2 comments
chatzikon commented
When the sequence length is updated to 8 during training, my model always crashes during the first epoch (at a random iteration, not the first one). The error is:
File "train.py", line 53, in train
for idx, data in enumerate(dataset, start=trainer.epoch_iter):
File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/_utils.py", line 385, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 77, in <dictcomp>
return {key: default_collate([d[key] for d in batch]) for key in elem}
File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 58, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 8 and 7 in dimension 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:689.
I use a batch size of 2 and two GPUs. I printed the batch shape and it is always [2, 8, c, h, w]. Has anybody encountered the same error?
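For what it's worth, the failure is in the stacking step of the default collate function: every sample in the batch must have the same shape, so one clip with only 7 frames is enough to break it. A minimal illustration (the shapes here are just made up for the example):

import torch

full_clip = torch.zeros(8, 3, 64, 64)   # a video sample with 8 frames
short_clip = torch.zeros(7, 3, 64, 64)  # a video sample with only 7 frames

# default_collate stacks the samples along a new batch dimension,
# which requires all samples to have identical shapes:
batch = torch.stack([full_clip, short_clip], 0)
# RuntimeError: Sizes of tensors must match except in dimension 0. Got 8 and 7 in dimension 1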
gabewilliam commented
Same problem here, would be grateful if anyone could share a solution!
chatzikon commented
Same problem here, would be grateful if anyone could share a solution!
I found the problem. I had forgotten to delete the videos in my dataset with fewer than 8 frames, and that is what caused the error, so check your dataset carefully :P A sketch of the filtering step is below.
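A minimal sketch of the fix, assuming your dataset keeps a list of per-video records with a frame count (the field and function names here are hypothetical, adapt them to your own loader):

MIN_FRAMES = 8  # must be >= the sequence length used during training

def filter_short_videos(videos, min_frames=MIN_FRAMES):
    """Drop videos that cannot provide a full training sequence."""
    return [v for v in videos if v["frame_count"] >= min_frames]

# e.g. inside the Dataset's __init__:
# self.videos = filter_short_videos(self.videos)

Alternatively, you could pad or loop short clips up to the sequence length instead of dropping them, but filtering is the simplest way to make every sample the same size so default_collate can stack the batch.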