kohjingyu/fromage

=> no checkpoint found at '=/home/...

Closed this issue · 0 comments

Hello,
First of all, I would like to thank you for your research work.

I have my own dataset, so I tried to fine-tune on it starting from your pre-trained checkpoint.
When I run main.py with the script below, I get the error message in the title, which means fromage did not recognize the pre-trained checkpoint file and went to the else branch. Although I went through your README in detail, I was not able to find any clue to this problem. Maybe I misunderstood something. I am working under limited resource conditions, but an A100 is available.
Here is the script I use to run main.py. Please advise. Thanks.

python -u main.py \
    --dist-url "tcp://127.0.0.1:${randport}" --dist-backend 'nccl' \
    --multiprocessing-distributed --world-size 1 --rank 0 \
    --dataset= xxx --val-dataset=xxx \
    --opt-version='facebook/opt-125m' --visual-model='openai/clip-vit-large-patch14' \
    --exp_name='xxxx' --image-dir='/datasets/xxxx' --log-base-dir='runs/' \
    --batch-size=4 --val-batch-size=100 --learning-rate=0.0003 --precision='fp32' --print-freq=100 \
    --resume ='/home/.../pretrained_ckpt.pth.tar' \
    --epochs=4 \
    --dataset_dir='xxxxxx'
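
One possible cause, judging from the leading `=` in the path printed in the error message: the command has a space between `--resume` and `=`, so the shell passes `--resume` and `='/home/.../pretrained_ckpt.pth.tar'` as two separate tokens, and the value that reaches the script starts with `=`. The sketch below reproduces this with a stand-in parser. It assumes main.py defines `--resume` as a plain string option via `argparse` and checks the path with `os.path.isfile`. I have not confirmed that against the repository, and the `resolve` helper is hypothetical.

```python
import argparse
import os

# Hypothetical stand-in for the parser in main.py; only --resume is modeled.
parser = argparse.ArgumentParser()
parser.add_argument('--resume', default='', type=str)


def resolve(argv):
    """Return the value argparse assigns to --resume for the given tokens."""
    return parser.parse_args(argv).resume


# Tokens the shell produces for: --resume ='/home/.../pretrained_ckpt.pth.tar'
with_space = resolve(['--resume', '=/home/.../pretrained_ckpt.pth.tar'])

# Tokens the shell produces for: --resume='/home/.../pretrained_ckpt.pth.tar'
without_space = resolve(['--resume=/home/.../pretrained_ckpt.pth.tar'])

print(repr(with_space))     # the value keeps the stray leading '='
print(repr(without_space))  # the value is the intended path

# A path beginning with '=' does not point at a file, so a check like this
# would fall through to the "no checkpoint found" branch.
print(os.path.isfile(with_space))
```

If that is what is happening here, removing the space (`--resume='/home/.../pretrained_ckpt.pth.tar'`) should let the path through unchanged. The same spacing pattern appears in `--dataset= xxx` and may be worth checking as well.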