airsplay/vokenization

RuntimeError: stack expects each tensor to be equal size, but got [14] at entry 0 and [12] at entry 1

zhanhl316 opened this issue · 3 comments

Training of Epoch 0: GPU 0 will process 591616 data in 2311 iterations.
0%| | 0/2311 [00:31<?, ?it/s]
Traceback (most recent call last):
File "xmatching/main.py", line 313, in <module>
main()
File "xmatching/main.py", line 43, in main
mp.spawn(train, nprocs=args.gpus, args=(args,))
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/home/zhanhaolan/codes/vokenization/xmatching/main.py", line 233, in train
for i, (uid, lang_input, visn_input) in enumerate(tqdm.tqdm(train_loader, disable=(gpu!=0))):
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/tqdm/std.py", line 1167, in __iter__
for obj in iterable:
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
data = self._next_data()
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
return self._process_data(data)
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
data.reraise()
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
data = fetcher.fetch(index)
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 84, in default_collate
return [default_collate(samples) for samples in transposed]
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 84, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 84, in default_collate
return [default_collate(samples) for samples in transposed]
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 84, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "/home/zhanhaolan/anaconda3/envs/torch1.4py37/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [14] at entry 0 and [12] at entry 1
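
The last frame shows PyTorch's `default_collate` calling `torch.stack` on two token-id tensors of length 14 and 12, i.e. the samples in a batch were not padded to a common length before batching. The fix adopted in this thread is pinning the transformers version; an alternative workaround, sketched here under assumptions, is a padding `collate_fn`. `ToyTokenDataset` and `pad_collate` are hypothetical names, each sample is assumed to be `(uid, 1-D token-id tensor)` (the real loader yields `(uid, lang_input, visn_input)`, so it would need adapting), and a pad id of 0 is assumed (check your tokenizer's `pad_token_id`).

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class ToyTokenDataset(Dataset):
    """Hypothetical dataset: each sample is (uid, 1-D tensor of token ids) of varying length."""

    def __init__(self, lengths):
        self.lengths = lengths

    def __len__(self):
        return len(self.lengths)

    def __getitem__(self, idx):
        # Fake token ids 1..length; real data would come from the tokenizer.
        return idx, torch.arange(1, self.lengths[idx] + 1)


def pad_collate(batch, pad_id=0):
    """Pad token tensors to the longest sample in the batch instead of stacking them as-is."""
    uids, token_ids = zip(*batch)
    lengths = torch.tensor([t.size(0) for t in token_ids])
    # pad_id=0 is an assumption; use the tokenizer's pad_token_id in practice.
    padded = pad_sequence(list(token_ids), batch_first=True, padding_value=pad_id)
    # Attention mask: 1 for real tokens, 0 for padding positions.
    mask = (torch.arange(padded.size(1))[None, :] < lengths[:, None]).long()
    return torch.tensor(uids), padded, mask


dataset = ToyTokenDataset([14, 12])  # the two lengths from the error message

# The default collate_fn reproduces the reported error.
try:
    next(iter(DataLoader(dataset, batch_size=2)))
except RuntimeError as err:
    print("default_collate failed:", err)

uids, ids, mask = next(iter(DataLoader(dataset, batch_size=2, collate_fn=pad_collate)))
print(ids.shape, mask.sum(dim=1).tolist())  # torch.Size([2, 14]) [14, 12]
```

Returning a mask alongside the padded ids lets the model ignore pad positions; padding per batch (rather than to a fixed maximum) keeps batches as short as their longest sample.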

Hi, do you have any idea what causes this issue?

I think the problem is caused by a newer version of HuggingFace's transformers library (in particular, its tokenizers). Could you try downgrading to transformers==3.3? That is the version I released the code with and tested the scripts on.

@airsplay Great! I resolved this problem by changing HuggingFace's transformers to version 3.3.0. I suggest updating the requirements file, which still lists transformers 2.7.0 (transformers 2.7.0 → 3.3.0).
Best,

Thanks. I will change it.
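
The change agreed on above is to replace the 2.7.0 entry in the requirements file with `transformers==3.3.0`. Because a mismatched version only surfaces later, deep inside a DataLoader worker, a small startup check can report it earlier. This is a minimal sketch: `check_transformers_version` and `major_minor` are hypothetical helpers, not part of vokenization, and comparing only major.minor is an assumption based on the maintainer recommending "3.3".

```python
import re
import warnings

# transformers version the maintainer reports releasing and testing the code with.
TESTED_TRANSFORMERS = "3.3.0"


def major_minor(version):
    """Extract (major, minor) from a version string such as '3.3.0' or '4.0.0rc1'."""
    numbers = re.findall(r"\d+", version)
    return tuple(int(n) for n in numbers[:2])


def check_transformers_version(installed, tested=TESTED_TRANSFORMERS):
    """Return True if major.minor matches the tested version; otherwise warn and return False."""
    if major_minor(installed) == major_minor(tested):
        return True
    warnings.warn(
        f"transformers {installed} is installed, but this code was tested with "
        f"{tested}; tokenizer behavior may differ.",
        RuntimeWarning,
    )
    return False
```

At startup one would pass `transformers.__version__` to it; where to place that call (for example near the top of `xmatching/main.py`) is a hypothetical choice. It warns instead of raising so that deliberate upgrades are still possible.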