vmurahari3/visdial-bert

Error when training the base model

yangzhch6 opened this issue · 5 comments

When I train the base model (no finetuning on dense annotations), I get the following error. The model checkpoints were downloaded using your scripts.

Traceback (most recent call last):
File "train.py", line 292, in
pretrained_dict = torch.load(params['start_path'])
File "/home/visdial-bert/anaconda3/envs/visdial-bert/lib/python3.6/site-packages/torch/serialization.py", line 426, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/home/visdial-bert/anaconda3/envs/visdial-bert/lib/python3.6/site-packages/torch/serialization.py", line 613, in _load
result = unpickler.load()
File "/home/visdial-bert/anaconda3/envs/visdial-bert/lib/python3.6/site-packages/torch/_utils.py", line 135, in _rebuild_tensor_v2
tensor = _rebuild_tensor(storage, storage_offset, size, stride)
File "/home/visdial-bert/anaconda3/envs/visdial-bert/lib/python3.6/site-packages/torch/_utils.py", line 130, in _rebuild_tensor
t = torch.tensor([], dtype=storage.dtype, device=storage.device)
AttributeError: 'tuple' object has no attribute 'dtype'

Have you seen this bug before? I would appreciate it if you could help me.
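In case it helps, a minimal load test like the one below should hit the same line in serialization.py if the checkpoint file itself is the problem (the path is just a placeholder for wherever the download script put the base model):

import torch

# Placeholder path -- substitute the location the download script actually used.
checkpoint_path = 'checkpoints-release/basemodel'

# Deserialize on CPU only, so GPU setup is not a factor.
pretrained_dict = torch.load(checkpoint_path, map_location='cpu')
print(type(pretrained_dict))
if isinstance(pretrained_dict, dict):
    print(list(pretrained_dict.keys())[:5])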

Was wondering if you created the anaconda environment with the newer version of the env.yml file?

Thank you for your response. I have already created the anaconda environment from the newer version of the env.yml file.

I unfortunately have never seen this issue before. Will get back to you on this.

Thank you; after re-downloading your model, the code works now.
But I still have some confusion. The -batch_size parameter seems to be unused, and the batch size of your dataloader is fixed at 1, which I assume is the reason for the memory limit. If I understand correctly, the -batch_multiply parameter can be treated as the batch size now.

The batch_size parameter is being used, and batch_multiply is not the batch size. The effective batch size is still batch_size x batch_multiply.
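A self-contained sketch of that batch_size x batch_multiply relationship, assuming batch_multiply acts as a gradient-accumulation factor (the tiny linear model and random data below are stand-ins, not the actual objects in train.py):

import torch
from torch import nn

batch_size = 8        # samples per forward pass
batch_multiply = 4    # accumulation steps
# effective batch size per optimizer update = batch_size * batch_multiply = 32

model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = torch.randn(batch_size * batch_multiply, 16)
target = torch.randn(batch_size * batch_multiply, 1)

optimizer.zero_grad()
for step in range(batch_multiply):
    lo, hi = step * batch_size, (step + 1) * batch_size
    loss = nn.functional.mse_loss(model(data[lo:hi]), target[lo:hi])
    (loss / batch_multiply).backward()   # scale so accumulated gradients average out
# one optimizer update for the whole effective batch of 32 samples
optimizer.step()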