facebookresearch/grid-feats-vqa

Grid Feature Size? (2048, 26, 19) vs (2048, 25, 19)

CCYChongyanChen opened this issue · 1 comments

Hi, I am trying to run the grid+MCAN via MMF.
I extracted the grid features and stored them as .pth files; each .pth holds a tensor of size [2048, 26, 19].
When I run the code, I get a RuntimeError: The expanded size of the tensor (25) must match the existing size (26) at non-singleton dimension 1. Target sizes: [2048, 25, 19]. Tensor sizes: [2048, 26, 19]
Could you help me with that? Thank you!

The full traceback is attached.

Traceback (most recent call last):
File "/home/cc67459/MMF2/bin/mmf_run", line 33, in <module>
sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf_cli/run.py", line 118, in run
nprocs=config.distributed.world_size,
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf_cli/run.py", line 66, in distributed_main
main(configuration, init_distributed=True, predict=predict)
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf_cli/run.py", line 56, in main
trainer.train()
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/trainers/mmf_trainer.py", line 108, in train
self.training_loop()
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/trainers/core/training_loop.py", line 36, in training_loop
self.run_training_epoch()
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/trainers/core/training_loop.py", line 67, in run_training_epoch
for batch in self.train_loader:
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/common/batch_collator.py", line 24, in __call__
sample_list = SampleList(batch)
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/common/sample.py", line 129, in __init__
self[field][idx] = self._get_data_copy(sample[field])
RuntimeError: The expanded size of the tensor (25) must match the existing size (26) at non-singleton dimension 1. Target sizes: [2048, 25, 19]. Tensor sizes: [2048, 26, 19]
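
For context, the last frame suggests the batch is built by copying each sample into a slot of a preallocated tensor. The sketch below assumes the slot shape comes from the first sample. That is an assumption about the collation step, not MMF's actual code, and `stack_like_first` is a hypothetical helper. With that assumption, two features that differ in height by one row trigger the same kind of RuntimeError. The channel count is reduced from 2048 to 8 to keep the example small.

```python
import torch


def stack_like_first(samples):
    """Copy samples into a buffer shaped like the first one (hypothetical helper)."""
    first = samples[0]
    batch = first.new_empty((len(samples), *first.shape))
    for idx, sample in enumerate(samples):
        # Raises RuntimeError when sample.shape differs from first.shape.
        batch[idx] = sample
    return batch


# Features with identical spatial size batch without trouble.
same = [torch.zeros(8, 25, 19), torch.zeros(8, 25, 19)]
ok = stack_like_first(same)

# One feature is 26 rows high instead of 25, as in the report above.
mixed = [torch.zeros(8, 25, 19), torch.zeros(8, 26, 19)]
try:
    stack_like_first(mixed)
    mismatch_error = None
except RuntimeError as err:
    mismatch_error = str(err)

print(mismatch_error)
```

If this is indeed what happens, the extracted .pth files do not all share one spatial size, even if the ones inspected by hand were all [2048, 26, 19].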

Please raise MMF-related concerns in the MMF repo. That said, I suspect this error is caused by features with different spatial sizes being batched together, which could be fixed by taking the maximum possible batch size.
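
One possible reading of this suggestion is to pad every feature map to the largest height and width in the batch before stacking. The sketch below illustrates that idea in plain PyTorch; `pad_grid_features` is a hypothetical helper, not part of MMF or grid-feats-vqa, and zero-padding with a validity mask is an assumption about what the downstream model can accept.

```python
import torch
import torch.nn.functional as F


def pad_grid_features(features):
    """Zero-pad (C, H, W) tensors to the largest H and W, then stack them.

    Returns a (N, C, H_max, W_max) batch and a (N, H_max, W_max) boolean
    mask that is True at positions holding real (unpadded) features.
    """
    max_h = max(f.shape[1] for f in features)
    max_w = max(f.shape[2] for f in features)
    padded, masks = [], []
    for f in features:
        _, h, w = f.shape
        # F.pad takes pads for the last dimension first:
        # (left, right) for W, then (top, bottom) for H.
        padded.append(F.pad(f, (0, max_w - w, 0, max_h - h)))
        mask = torch.zeros(max_h, max_w, dtype=torch.bool)
        mask[:h, :w] = True
        masks.append(mask)
    return torch.stack(padded), torch.stack(masks)


# Two features that differ by one row, with 8 channels instead of 2048.
feats = [torch.ones(8, 25, 19), torch.ones(8, 26, 19)]
batch, mask = pad_grid_features(feats)
print(batch.shape, mask.shape)
```

The mask lets an attention-based model ignore the padded grid cells. Another option that avoids mixing sizes altogether, at the cost of throughput, is to train with a batch size of 1 per worker.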