vmurahari3/visdial-bert

Error when training the base model

yangzhch6 opened this issue · 5 comments

When I train the base model (no finetuning on dense annotations), I get the following error. The model checkpoints were downloaded using your scripts.

Traceback (most recent call last):
File "train.py", line 292, in
pretrained_dict = torch.load(params['start_path'])
File "/home/visdial-bert/anaconda3/envs/visdial-bert/lib/python3.6/site-packages/torch/serialization.py", line 426, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/home/visdial-bert/anaconda3/envs/visdial-bert/lib/python3.6/site-packages/torch/serialization.py", line 613, in _load
result = unpickler.load()
File "/home/visdial-bert/anaconda3/envs/visdial-bert/lib/python3.6/site-packages/torch/_utils.py", line 135, in _rebuild_tensor_v2
tensor = _rebuild_tensor(storage, storage_offset, size, stride)
File "/home/visdial-bert/anaconda3/envs/visdial-bert/lib/python3.6/site-packages/torch/_utils.py", line 130, in _rebuild_tensor
t = torch.tensor([], dtype=storage.dtype, device=storage.device)
AttributeError: 'tuple' object has no attribute 'dtype'

Have you seen this bug before? I would appreciate it if you could help me.
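In case it helps, a minimal load test like the one below should hit the same line in serialization.py if the checkpoint file itself is the problem (the path is just a placeholder for wherever the download script put the base model):

import torch

# Placeholder path -- substitute the location the download script actually used.
checkpoint_path = 'checkpoints-release/basemodel'

# Deserialize on CPU only, so GPU setup is not a factor.
pretrained_dict = torch.load(checkpoint_path, map_location='cpu')
print(type(pretrained_dict))
if isinstance(pretrained_dict, dict):
    print(list(pretrained_dict.keys())[:5])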

Was wondering if you created the anaconda environment with the newer version of the env.yml file?

Thank you for your response. I have already created the anaconda environment from the newer version of the env.yml file.

I unfortunately have never seen this issue before. Will get back to you on this.

Thank you; after re-downloading your model, the code works now.
But I still have some confusion. The -batch_size parameter seems to be unused, and the batch size of your dataloader is fixed at 1, which I assume is the reason for the memory limit. If I understand correctly, the -batch_multiply parameter can be treated as the batch size now.

The batch_size parameter is being used, and batch_multiply is not the batch size. The effective batch size is still batch_size x batch_multiply.
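A self-contained sketch of that batch_size x batch_multiply relationship, assuming batch_multiply acts as a gradient-accumulation factor (the tiny linear model and random data below are stand-ins, not the actual objects in train.py):

import torch
from torch import nn

batch_size = 8        # samples per forward pass
batch_multiply = 4    # accumulation steps
# effective batch size per optimizer update = batch_size * batch_multiply = 32

model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = torch.randn(batch_size * batch_multiply, 16)
target = torch.randn(batch_size * batch_multiply, 1)

optimizer.zero_grad()
for step in range(batch_multiply):
    lo, hi = step * batch_size, (step + 1) * batch_size
    loss = nn.functional.mse_loss(model(data[lo:hi]), target[lo:hi])
    (loss / batch_multiply).backward()   # scale so accumulated gradients average out
# one optimizer update for the whole effective batch of 32 samples
optimizer.step()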