_pickle.UnpicklingError: invalid load key, '-'.
liulijie-2020 opened this issue · 7 comments
Thanks for your great code.
I am trying to train on the VCR task to see the results.
When I ran
python vcr/val.py \
  --a-cfg ./cfgs/vcr/base_q2a_4x16G_fp32.yaml --r-cfg ./cfgs/vcr/base_qa2r_4x16G_fp32.yaml \
  --a-ckpt ./output/base_q2a_4x16G_fp32.yaml --r-ckpt ./output/base_qa2r_4x16G_fp32.yaml \
  --gpus 0 1 \
  --result-path ./results/ --result-name eval_vcr
the following error occurred:
warnings.warn('miss keys: {}'.format(miss_keys))
Warnings: Unexpected keys: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.gamma', 'cls.predictions.transform.LayerNorm.beta', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias'].
Traceback (most recent call last):
  File "vcr/val.py", line 214, in <module>
    main()
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 43, in decorate_no_grad
    return func(*args, **kwargs)
  File "vcr/val.py", line 114, in main
    a_ckpt = torch.load(args.a_ckpt, map_location=lambda storage, loc: storage)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/serialization.py", line 564, in _load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '-'.
I hope to get some help solving this problem.
Thanks a lot.
Please follow the instructions in readme to fine-tune VL-BERT on VCR first, then you can do evaluation on it. Thank you!
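A note on the error itself: the command above passes `.yaml` paths to `--a-ckpt` and `--r-ckpt`, and `torch.load` reports `invalid load key, '-'` when the first byte of the file is `-`, which is what a text file beginning with a dash (for example a YAML document marker) would produce. The sketch below is one way to check a path before loading it. It is a heuristic, not part of VL-BERT, and the function name `looks_like_torch_checkpoint` is hypothetical. It assumes checkpoints are either zip archives (the default `torch.save` format in newer PyTorch releases) or binary pickles (the legacy format).

```python
import zipfile


def looks_like_torch_checkpoint(path):
    """Heuristic check: does this file start like a torch.save() output?

    Hypothetical helper, not part of VL-BERT or PyTorch.
    """
    # Newer PyTorch releases write checkpoints as zip archives by default.
    if zipfile.is_zipfile(path):
        return True
    with open(path, "rb") as f:
        first_byte = f.read(1)
    # Binary pickles (protocol 2 and later) begin with the PROTO opcode 0x80.
    # A text file such as a YAML config starts with a printable character.
    return first_byte == b"\x80"
```

If the check returns False for the path given to `--a-ckpt`, the argument probably points at a config or other text file rather than at a model file produced by fine-tuning.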
Thank you for your reply. By fine-tuning, do you mean the training step in the readme?
Yes.
Thank you for your kind help. I have completed this part and got some parameter files from it.
But then I got this error:
PROGRESS: 0.00%
Traceback (most recent call last):
  File "vcr/train_end2end.py", line 59, in <module>
    main()
  File "vcr/train_end2end.py", line 53, in main
    rank, model = train_net(args, config)
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../vcr/function/train.py", line 337, in train_net
    gradient_accumulate_steps=config.TRAIN.GRAD_ACCUMULATE_STEPS)
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/trainer.py", line 115, in train
    outputs, loss = net(*batch)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/module.py", line 22, in forward
    return self.train_forward(*inputs, **kwargs)
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../vcr/modules/resnet_vlbert_for_vcr.py", line 261, in train_forward
    segms=segms)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/fast_rcnn.py", line 149, in forward
    roi_align_res = self.roi_align(img_feats['body4'], rois).type(images.dtype)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/lib/roi_pooling/roi_align.py", line 69, in forward
    input.float(), rois.float(), self.output_size, self.spatial_scale, self.sampling_ratio
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/lib/roi_pooling/roi_align.py", line 20, in forward
    input, rois, spatial_scale, output_size[0], output_size[1], sampling_ratio
RuntimeError: Not compiled with GPU support (ROIAlign_forward at /home/songzijie/project/VLbert/VL-BERT-master/common/lib/roi_pooling/ROIAlign.h:21)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f5c43697dc5 in /home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: ROIAlign_forward(at::Tensor const&, at::Tensor const&, float, int, int, int) + 0xf6 (0x7f5c2bda8396 in /home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/lib/roi_pooling/C_ROIPooling.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x13f74 (0x7f5c2bdb3f74 in /home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/lib/roi_pooling/C_ROIPooling.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x13ffe (0x7f5c2bdb3ffe in /home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/lib/roi_pooling/C_ROIPooling.cpython-36m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x1138c (0x7f5c2bdb138c in /home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/lib/roi_pooling/C_ROIPooling.cpython-36m-x86_64-linux-gnu.so)
<omitting python frames>
frame #11: THPFunction_apply(_object*, _object*) + 0x691 (0x7f5c696e7081 in /home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
By the way, I checked my CUDA setup:
python -c 'import torch; from torch.utils.cpp_extension import CUDA_HOME; print(torch.cuda.is_available(), CUDA_HOME)'
True /home/share/cuda/cuda-9.0
@liulijie-2020 Did you run the init.sh to compile the operators?
Yes, I did.
running build_ext copying build/lib.linux-x86_64-3.6/C_ROIPooling.cpython-36m-x86_64-linux-gnu.so ->
Thanks for your help. I've solved the problem. The cause was the scipy version (scipy==1.5.1). After I changed it to scipy==1.4.1 and restarted, the program ran normally.
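For anyone hitting the same thing, a quick way to confirm which scipy version an environment actually has is sketched below. The helper names (`parse_version`, `installed_version`, `matches_pin`) are hypothetical, and the 1.4.1 pin is simply the version reported to work in this thread. The fallback to `pkg_resources` is there because `importlib.metadata` is not available on Python 3.6.

```python
import re


def parse_version(text):
    """Turn '1.4.1' into (1, 4, 1); suffixes such as 'rc1' are ignored."""
    parts = []
    for part in text.split(".")[:3]:
        match = re.match(r"\d+", part)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python < 3.8, e.g. the 3.6 environment in this thread
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(installed, pinned):
    """True only if a version is installed and equals the pinned one."""
    return installed is not None and parse_version(installed) == parse_version(pinned)
```

Calling `matches_pin(installed_version("scipy"), "1.4.1")` in the `vlbert` environment then shows whether the downgrade took effect.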
Thanks for the information!