OFA-Sys/OFA

OFA on customised task e.g. OK-VQA

chenxwh opened this issue · 6 comments

Hi, thanks for the awesome work!

I'd like to fine-tune OFA on OK-VQA. I have been trying to follow the instructions for VQA, assuming they are similar, but I have raw image input (and questions). How do I convert these into what OFA understands? Do I need to follow the format of the example tsv file? Is trainval_ans2label.pkl required (and if so, how do I generate it)?
What are the steps to take to extend OFA on OK-VQA?

Thank you in advance for your help!

For organizing the TSV file, please refer to the reply in issue #56 (see the 2nd & 3rd paragraphs). As for trainval_ans2label.pkl, it is just a pickled Python dict which maps each candidate answer text to a label id (for VQA-V2 there are 3,129 candidate answers; refer to issue #59 for more information). You can replace the data and ans2label_file with your custom files and run the scripts. Other hyper-parameters, such as the sequence length and the number of fine-tuning steps, may need to be adjusted for your specific task.
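
Roughly, the pickle file can be created like this (a minimal sketch; the answer list below is just a hypothetical placeholder, use the candidates collected from your own dataset):

```python
import pickle

# Hypothetical candidate answers; replace with the answers from your dataset.
answers = ["yes", "no", "2", "blue", "pizza"]

# Map each candidate answer string to an integer label id.
ans2label = {ans: idx for idx, ans in enumerate(answers)}

with open("trainval_ans2label.pkl", "wb") as f:
    pickle.dump(ans2label, f)
```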

Hi @yangapku,

Thank you for your reply! Are the answer candidates for trainval_ans2label.pkl gathered from all the possible answers in the train and val splits of VQAv2 (regardless of the confidence score)? And are the label ids (0 to the number of distinct answers minus 1) randomly assigned to each candidate answer?

Hi, in fact the 3,129 answer candidates do not cover all the answers that appear in the original VQAv2 dataset. It is a common practice in the VQAv2 challenge to restrict the candidates to these relatively frequent answers. We only keep the training and validation samples whose answers fall within this candidate set. The label ids are assigned randomly, without specific rules.
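
As a rough illustration of this practice, one typically counts answer frequencies and keeps only answers above a threshold, then drops samples outside the candidate set. The sample structure and threshold below are only illustrative, not OFA code:

```python
from collections import Counter

# Hypothetical (question, answer) pairs from your own annotations.
samples = [
    {"question": "What color is the sky?", "answer": "blue"},
    {"question": "How many dogs are there?", "answer": "2"},
    # ...
]

min_count = 9  # illustrative threshold; choose based on your dataset

# Keep only answers that occur frequently enough.
counts = Counter(s["answer"] for s in samples)
candidates = [ans for ans, c in counts.items() if c >= min_count]

# Build the answer-to-label mapping and filter the samples accordingly.
ans2label = {ans: idx for idx, ans in enumerate(candidates)}
kept = [s for s in samples if s["answer"] in ans2label]
```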

Is there a rule for choosing the 'frequent answer candidates' for a customised dataset if not all appearing answers are included? Also, at testing/inference time, are the generated answers restricted to only the answers in trainval_ans2label.pkl?

Hi, I would recommend referring to previous works on OK-VQA for how to determine a proper size of the frequent-candidate set. In our work, we exactly followed the practice of previous works on VQAv2. During inference, the generated answers are restricted to the answers in trainval_ans2label.pkl. In detail, there are two types of inference for VQA, as mentioned in the readme. For all-candidate evaluation, the candidate answers are explicitly enumerated from trainval_ans2label.pkl during inference. For beam-search evaluation, although there is no such restriction during inference (trainval_ans2label.pkl is not used), since the model is fine-tuned within the candidate answer set, it is unlikely to generate out-of-set answers either.
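
Conceptually, all-candidate evaluation looks like the following sketch, where score_answer is a hypothetical stand-in for the model's actual scoring routine (not the real OFA API):

```python
import pickle

with open("trainval_ans2label.pkl", "rb") as f:
    ans2label = pickle.load(f)

def score_answer(image, question, answer):
    # Hypothetical placeholder: return the model's score (e.g. log-probability)
    # of `answer` given the image and question.
    raise NotImplementedError

def predict(image, question):
    # Enumerate every candidate answer and return the highest-scoring one.
    scores = {ans: score_answer(image, question, ans) for ans in ans2label}
    return max(scores, key=scores.get)
```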

Thank you for the clarification!