Missing file and unable to load pretrained model

Question

darthgera123 opened this issue 4 years ago · 7 comments

darthgera123 commented 4 years ago

While trying to get the code running, I faced 2 issues:

1. Unable to find datasets/refcoco+_unc/annotations/cache/refcoco+_val_20_36.pkl. The link mentioned didn't contain the cache directory, so I'm currently trying to run with an empty file in its place.
2. When I try to load pytorch_model_9.bin, it expects one of the pretrained model names listed in the dictionary in volta/encoders.py, e.g. bert-base-uncased, roberta, etc.
Please help @elliottd @e-bug

Answer 1 · 2021-02-19T10:57:10.000Z

The cache file is generated the first time you run a model. Make sure you update datasets/ with your data directory.
Have you checked the examples yet (e.g. for ViLBERT)?
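
For readers unfamiliar with the "generated on first run" behaviour, here is a minimal sketch of that general pattern. It is not volta's actual code: `load_or_build_cache`, `build_fn`, and the demo file name are hypothetical names used only for illustration.

```python
import os
import pickle
import tempfile


def load_or_build_cache(cache_path, build_fn):
    """Return cached entries, building and saving them on the first call."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    entries = build_fn()  # expensive preprocessing happens only once
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(entries, f)
    return entries


# Hypothetical usage: a temporary directory stands in for datasets/.
demo_dir = tempfile.mkdtemp()
demo_cache = os.path.join(demo_dir, "annotations", "cache", "demo_val.pkl")
first = load_or_build_cache(demo_cache, lambda: [{"caption": "a dog", "image_id": 1}])
second = load_or_build_cache(demo_cache, lambda: [])  # read back from the cache file
```

Under a pattern like this, an empty placeholder file would count as an existing cache and fail to unpickle, so deleting the placeholder is likely safer than leaving it in place.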

Answer 2 · 2021-02-19T11:12:39.000Z

Thanks for responding. I was running the ViLBERT example itself. This is the error I am getting:

ERROR - volta.utils -   Model name 'checkpoints/conceptual_captions/ctrl_vilbert/ctrl_vilbert_base/pytorch_model_9.bin' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, roberta-base, roberta-large, roberta-large-mnli). We assumed 'checkpoints/conceptual_captions/ctrl_vilbert/ctrl_vilbert_base/pytorch_model_9.bin' was a path or url but couldn't find any file associated to this path or URL.

This is why I raised issue 2.
Please help, @e-bug.

Answer 3 · 2021-02-19T11:18:21.000Z

Are you sure you have the checkpoint exactly at checkpoints/conceptual_captions/ctrl_vilbert/ctrl_vilbert_base/pytorch_model_9.bin?
Try passing an absolute path and see if that solves your error.
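
A quick way to rule out a path problem before launching training is to resolve the checkpoint path and check that it exists. This is only a sketch; `resolve_checkpoint` is a hypothetical helper, not part of volta.

```python
from pathlib import Path


def resolve_checkpoint(path_str):
    """Return the absolute path of a checkpoint file, or raise if it is missing."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"No checkpoint at {path} (current directory: {Path.cwd()})"
        )
    return str(path)


# Relative paths are resolved against the current working directory,
# which is a common reason a seemingly correct path is not found.
try:
    print(resolve_checkpoint(
        "checkpoints/conceptual_captions/ctrl_vilbert/ctrl_vilbert_base/pytorch_model_9.bin"
    ))
except FileNotFoundError as err:
    print(err)
```

The printed absolute path can then be passed to the script in place of the relative one.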

Answer 4 · 2021-02-19T11:26:18.000Z

Thanks for responding. The path was wrong, and now the checkpoint loads. The latest error is this:

THCudaCheck FAIL file=/pytorch/aten/src/ATen/native/cuda/Dropout.cu line=147 error=209 : no kernel image is available for execution on the device

Any idea, @e-bug?

Answer 5 · 2021-02-19T11:29:34.000Z

It might be the GPU device itself: I got that error when running the code on an older GPU.
I have no idea how to fix it. If you find a way, share it :)
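
One common cause of "no kernel image is available" is a PyTorch binary that was not compiled for the GPU's compute capability. The sketch below compares the two; it assumes a PyTorch version recent enough to provide `torch.cuda.get_arch_list()`, and `arch_supported` is a hypothetical helper. It may not explain the error in this thread.

```python
def arch_supported(capability, arch_list):
    """Check whether a (major, minor) compute capability has a matching sm_XY entry."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


try:
    import torch
except ImportError:  # lets the helper be used without PyTorch installed
    torch = None

if torch is not None and torch.cuda.is_available():
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print("GPU:", torch.cuda.get_device_name(0), "capability:", capability)
    print("PyTorch built for:", arch_list)
    print("Exact kernel match:", arch_supported(capability, arch_list))
else:
    print("PyTorch with CUDA is not available in this environment.")
```

A missing exact match is not conclusive, since the list can also contain `compute_XY` entries that allow kernels to be compiled for the device at run time. If neither kind of entry covers the device, installing a PyTorch build that targets it is the usual remedy.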

Answer 6 · 2021-02-19T13:18:11.000Z

Is there a way to change batch sizes? Also, what's the CUDA version on your GPU? I'm trying to get this running on a 2080 Ti with CUDA 10.2, and I am still getting that "no kernel image" error.

Answer 7 · 2021-02-19T13:31:45.000Z

Of course, please check the configuration files for the tasks (e.g. this).
As for the CUDA version: 10.1 or 10.2.