e-bug/volta

Missing file and unable to load pretrained model

darthgera123 opened this issue · 7 comments

While trying to get the code running, I faced 2 issues:

  1. Unable to find datasets/refcoco+_unc/annotations/cache/refcoco+_val_20_36.pkl. The link mentioned didn't have the cache repo, and currently I'm trying to run it using an empty file.
  2. When I try to load pytorch_model_9.bin, it expects one of the pretrained models listed in the dictionary in volta/encoders.py, e.g. bert-base-uncased, roberta, etc.
Please help @elliottd @e-bug

e-bug commented

  1. The cache file is generated the first time you run a model. Make sure you update datasets/ with your data directory.
  2. Have you checked the examples yet (e.g. for ViLBERT)?

Thanks for responding. I was only running the ViLBERT example. This is the error that I am getting:

ERROR - volta.utils -   Model name 'checkpoints/conceptual_captions/ctrl_vilbert/ctrl_vilbert_base/pytorch_model_9.bin' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, roberta-base, roberta-large, roberta-large-mnli). We assumed 'checkpoints/conceptual_captions/ctrl_vilbert/ctrl_vilbert_base/pytorch_model_9.bin' was a path or url but couldn't find any file associated to this path or URL.

This is why I raised issue 2.
Please help @e-bug

e-bug commented

Are you sure you have the checkpoint exactly at checkpoints/conceptual_captions/ctrl_vilbert/ctrl_vilbert_base/pytorch_model_9.bin?
Try passing an absolute path and see if that solves your error.
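
A relative path is resolved against the current working directory, so the same string can point to different places depending on where the script is launched from. A minimal sketch for checking this before loading the model is given here; the helper name `describe_checkpoint_path` is made up for illustration and is not part of volta:

```python
import os


def describe_checkpoint_path(path):
    """Return the absolute form of `path` and whether a regular file exists there.

    Hypothetical helper for debugging; not part of volta.
    """
    absolute = os.path.abspath(os.path.expanduser(path))
    return absolute, os.path.isfile(absolute)


if __name__ == "__main__":
    # Path taken from the error message in this thread.
    ckpt = "checkpoints/conceptual_captions/ctrl_vilbert/ctrl_vilbert_base/pytorch_model_9.bin"
    absolute, exists = describe_checkpoint_path(ckpt)
    print("working directory:", os.getcwd())
    print("resolved path:    ", absolute)
    print("file exists:      ", exists)
```

If `file exists` prints `False`, the resolved path shows where the loader is actually looking, and passing that corrected absolute path should rule out working-directory problems.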

Thanks for responding. The path was wrong and now it loads. The latest error it shows is this:

THCudaCheck FAIL file=/pytorch/aten/src/ATen/native/cuda/Dropout.cu line=147 error=209 : no kernel image is available for execution on the device

Any idea, @e-bug?

e-bug commented

It might be the GPU device itself: I got that error when running the code on an older GPU.
I have no idea how to fix it. If you find a way, share it :)
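
For anyone hitting the same message: "no kernel image is available for execution on the device" generally means the installed PyTorch binary contains no kernels that can run on the GPU's compute capability. The sketch below compares the two. It assumes a PyTorch version that provides `torch.cuda.get_arch_list()` (older releases may not have it, hence the `hasattr` guard), and the helper names `parse_arch` and `build_covers_device` are hypothetical, written only for this illustration. It also ignores architecture suffixes such as `sm_90a`, so treat the result as a hint rather than a definitive answer.

```python
import re


def parse_arch(arch):
    """Split a string like 'sm_75' or 'compute_70' into (kind, major, minor)."""
    m = re.match(r"(sm|compute)_(\d+)", arch)
    if m is None or len(m.group(2)) < 2:
        return None
    digits = m.group(2)
    return m.group(1), int(digits[:-1]), int(digits[-1])


def build_covers_device(capability, arch_list):
    """Rough check of whether a build's arch list can run on a device.

    Simplified rules: 'sm_XY' binaries run on devices with the same major
    version and an equal or higher minor version; 'compute_XY' (PTX) can be
    JIT-compiled for any device of equal or higher capability.
    """
    major, minor = capability
    for arch in arch_list:
        parsed = parse_arch(arch)
        if parsed is None:
            continue
        kind, a_major, a_minor = parsed
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is None or not torch.cuda.is_available():
        print("PyTorch with a visible CUDA device is needed for the live check.")
    else:
        cap = torch.cuda.get_device_capability(0)
        # get_arch_list() may be missing in older PyTorch releases.
        archs = torch.cuda.get_arch_list() if hasattr(torch.cuda, "get_arch_list") else []
        print("device:", torch.cuda.get_device_name(0), "capability:", cap)
        print("torch:", torch.__version__, "built for CUDA:", torch.version.cuda)
        if archs:
            print("build arch list:", archs)
            print("build covers device:", build_covers_device(cap, archs))
        else:
            print("This PyTorch version does not report its arch list.")
```

If the build does not cover the device, installing a PyTorch binary built for a CUDA version and architecture list that includes the GPU is the usual thing to try.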

Is there a way to change batch sizes? Also, what's the CUDA version on your GPU? I'm trying to get this running on a 2080 Ti with CUDA 10.2, and I am still getting that "no kernel image" error.

e-bug commented

  1. Of course. Please check the configuration files for the tasks (e.g. this)
  2. CUDA 10.1 or 10.2