How to train VQA on my custom data?
xiaoqiang-lu opened this issue · 11 comments
Hello! I am trying to finetune OFA-large on VQA with a custom dataset, following the finetuning instructions in the repo. I have checked my .tsv and .pkl files several times and they match the samples you provided. But after running the command "bash train_vqa_distributed.sh", the terminal just prints:
total_num_updates 40000
warmup_updates 1000
lr 5e-5
patch_image_size 480
The GPU usage rises to a certain value, then suddenly drops back to zero, and the program ends. I am training on a single server with 2 GPUs. Looking forward to your reply, and thanks for sharing this work!
Hi, could you please provide the exact script you run on your machine and your GPU card type? I will check against my environment.
Moreover, for fine-tuning on custom VQA-formatted data, please also refer to this recent issue for more information: #76.
Thanks for your reply! At first I was using two 3080 Ti cards; I have now replaced them with four V100s, but the same problem occurs. The script on my machine:
GPUS_PER_NODE=4
WORKER_CNT=1
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=8214
export RANK=0
The rest is unchanged. I also made my own ans2label.pkl file.
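For anyone else preparing a custom ans2label.pkl: it is typically just a pickled dict mapping each answer string to a contiguous integer label. A minimal sketch (the answer list here is made up; collect yours from the training data):

```python
import pickle

# Hypothetical answer vocabulary collected from a custom dataset
answers = ["yes", "no", "2", "blue"]

# ans2label maps each answer string to a contiguous integer id
ans2label = {ans: idx for idx, ans in enumerate(answers)}

# Serialize; in practice this blob would be written to the
# ans2label.pkl file referenced by the training script
blob = pickle.dumps(ans2label)
restored = pickle.loads(blob)
```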
Here is a part of my .tsv file without imgbase64.
Here is a part of my .pkl file.
Hi, have you checked the path of $log_file defined in your training script? The running log is saved to this file rather than printed to stdout. The program may have ended for other reasons, which may be recorded in the log. Please share more information once you find this log file.
Thanks! It seems a problem with my images was causing this; I am using the code you shared in issue #56 for imgbase64.
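For others hitting the same image problem: a common way to produce the base64 string for the imgbase64 TSV column is to read the raw image bytes and base64-encode them. This is a generic sketch, not necessarily identical to the snippet from issue #56:

```python
import base64

def image_bytes_to_base64(img_bytes):
    # Encode raw image bytes as a base64 string; this string goes
    # into the imgbase64 column and is decoded back into an image
    # by the data loader
    return base64.b64encode(img_bytes).decode("utf-8")

def tsv_image_field(path):
    # Read an image file from disk and return its base64 field
    with open(path, "rb") as f:
        return image_bytes_to_base64(f.read())
```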
Hi, please check whether the fields of the input data line that caused this error correspond to the specified selected_cols. By default, selected_cols is specified as 0,5,2,3,4 in the script, which sequentially fetches the 0th (uniq_id), 5th (image), 2nd (question), 3rd (answer info), and 4th (predict_objects) fields from each input TSV line. If any of the fields mismatch, errors may occur.
Hi, I think there is a misunderstanding of how each data line is organized. As mentioned in the readme, in each line of the TSV file the fields follow the exact order question-id, image-id, question, answer (with confidence), predicted object labels, and image base64 string, so there are 6 fields in total (and the image-id field is not used). By specifying selected_cols=0,5,2,3,4, the program sequentially fetches the 0th (question-id), 5th (image), 2nd (question), 3rd (answer info), and 4th (predict_objects) fields from each input TSV line, producing a sample to be further processed in the __getitem__ method of VqaGenDataset.
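The column selection described above can be illustrated with a small sketch (the sample line and its field values are made up for illustration):

```python
selected_cols = "0,5,2,3,4"
col_ids = [int(c) for c in selected_cols.split(",")]

# A made-up TSV line with the 6 fields in order: question-id,
# image-id, question, answer (with confidence), predicted object
# labels, image base64 string
line = "42\timg_001\twhat color is the car?\t1.0|!+red\tcar&&road\tBASE64DATA"
fields = line.rstrip("\n").split("\t")

# Sequentially fetch fields 0, 5, 2, 3, 4: question-id, image,
# question, answer info, predict_objects (image-id is skipped)
uniq_id, image, question, refs, objects = [fields[i] for i in col_ids]
```

If your TSV lines have fewer fields, or the fields are in a different order, this indexing fetches the wrong columns, which matches the kind of mismatch error described above.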
By the way, when preparing the dataset TSV file, I would also recommend expanding an original training sample that has more than one golden answer into multiple samples, each containing only one of the answers. This takes full advantage of the supervision from the ground-truth answers of the training samples; otherwise, only the golden answer with the highest confidence score is used as supervision.
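That expansion could be implemented roughly as below. The answer-field format here ("conf|!+answer" entries joined by "&&") is assumed from the repo's VQA TSV convention; double-check it against your own data:

```python
def expand_answers(row):
    """Split one training row with several golden answers into one
    row per answer. Assumed answer field format: 'conf|!+ans&&...'."""
    qid, img_id, question, refs, objects, img64 = row
    expanded = []
    for item in refs.split("&&"):
        conf, ans = item.split("|!+")
        expanded.append((qid, img_id, question,
                         f"{conf}|!+{ans}", objects, img64))
    return expanded

# Made-up example row with two golden answers
rows = expand_answers(("7", "img_7", "what animal is this?",
                       "0.6|!+dog&&0.4|!+puppy", "dog&&grass",
                       "BASE64DATA"))
```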
How did you resolve this problem? I'm having the same problem. Thanks!