mlvlab/Flipped-VQA

Cannot reproduce the result

Closed this issue · 2 comments

When I tried to reproduce the experimental results on the STAR dataset on my machine (a single RTX 4090 GPU) using the checkpoint provided on Google Drive, the accuracy was only around 25%. And when I tried to train the model with the following parameters, the loss quickly went to NaN. I'm not sure what the problem is.
```shell
python3 train.py --qav --vaq --max_seq_len 128 --batch_size 1 \
  --epochs 10 --bias 3 --tau 100. --max_feats 10 --warmup_epochs 2 \
  --dataset star --blr 9e-2 --weight_decay 0.16 --output_dir ./checkpoint/star \
  --accum_iter 8 --model llama-2-7b --llama_model_path ./pretrained/ \
  --num_workers 8
```
The batch_size is set to 1 due to VRAM constraints.
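For what it's worth, `--accum_iter 8` should partially compensate for the small per-step batch, since gradients are accumulated over 8 forward passes before each optimizer update. A minimal sketch of the arithmetic (the function name is illustrative, not part of the repo):

```python
def effective_batch_size(batch_size: int, accum_iter: int, num_gpus: int = 1) -> int:
    """Effective batch size seen by each optimizer step when using
    gradient accumulation: per-GPU batch * accumulation steps * GPUs."""
    return batch_size * accum_iter * num_gpus

# With the flags above (--batch_size 1 --accum_iter 8) on a single GPU:
print(effective_batch_size(1, 8, 1))  # -> 8
```

So the run above trains with an effective batch size of 8, which is still much smaller than multi-GPU setups and may interact with the base learning rate scaling (`--blr`).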

Hello,
I am facing the same issue.
Could you share how you solved the problem?

LLaMA-2 does not work with this framework. Just download the LLaMA-1 model from this issue (the Hugging Face link), and the model will run properly.