NVlabs/few-shot-vid2vid

Problem when sequence length becomes 8

Closed this issue · 2 comments

When the sequence length is updated to 8 during training, my model always crashes in the first epoch, at a random iteration (not the first one). The error is:

File "train.py", line 53, in train
   for idx, data in enumerate(dataset, start=trainer.epoch_iter):
 File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
   return self._process_data(data)
 File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
   data.reraise()
 File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/_utils.py", line 385, in reraise
   raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
 File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
   data = fetcher.fetch(index)
 File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
   return self.collate_fn(data)
 File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 77, in <dictcomp>
   return {key: default_collate([d[key] for d in batch]) for key in elem}
 File "/home/chatziko/PycharmProjects/venv/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 58, in default_collate
   return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 8 and 7 in dimension 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:689.

I use a batch size of 2 and two GPUs. I printed the batch shape, and it is always [2, 8, c, h, w]. Has anybody encountered the same error?

Same problem here, would be grateful if anyone could share a solution!

I found the problem. I had forgotten to delete the videos in my dataset with fewer than 8 frames, which caused this error. So check your dataset carefully :P
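
For anyone hitting this later, here is a minimal sketch of that dataset check. It assumes each video is stored as its own subfolder of frame images under one root directory; that layout, the function name `find_short_videos`, and the extension list are my own assumptions for illustration, not part of the few-shot-vid2vid code, so adapt them to how your data is organized. A clip with only 7 frames cannot be stacked with an 8-frame clip, which matches the "Got 8 and 7 in dimension 1" message above.

```python
from pathlib import Path

# Assumed frame file extensions; adjust to match your dataset.
FRAME_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_short_videos(root, min_frames=8):
    """Return (folder_name, frame_count) for every video folder under
    `root` that holds fewer than `min_frames` frame images."""
    short = []
    for video_dir in sorted(Path(root).iterdir()):
        if not video_dir.is_dir():
            continue
        # Count only files that look like frames.
        n_frames = sum(
            1
            for f in video_dir.iterdir()
            if f.is_file() and f.suffix.lower() in FRAME_EXTENSIONS
        )
        if n_frames < min_frames:
            short.append((video_dir.name, n_frames))
    return short
```

Calling `find_short_videos("path/to/train_images")` (a placeholder path) before training lists the folders to remove or exclude. Set `min_frames` to the largest sequence length your training schedule reaches, since the crash only shows up once the sequence length exceeds the shortest video.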