OFA-Sys/OFA

Additional issues trying to finetune on custom (VQA-like) dataset (VizWiz)

Velcorn opened this issue · 10 comments

Hello, first I'd like to thank you for your amazing work and especially all the detailed answers to issues.

I've been following the different issues on finetuning with a custom dataset (VizWiz) and produced the .tsv files according to your format. You stated in issue #76 that the trainval_ans2label.pkl file is not used when using beam-search evaluation - is this correct?

I've skipped its creation, and training runs for the first epoch. However, during validation on the valid subset, I get an assertion error in sequence_generator.py. I've tracked it down and can "fix" it by removing the one extra step reserved for the EOS marker, but my understanding of how to fix it properly is limited.

To give some more information on how the .tsv files look, I have attached an image of the train and val subsets.

Thank you very much for any kind of input in advance!

Hi, I'm afraid there is a misunderstanding. The trainval_ans2label.pkl is not used during inference in beam-search evaluation, but it is actually needed during training. Please prepare it according to your dataset and try again.

Thank you for the very quick response. I figured that might be the case, thanks for the clarification. Is there maybe a specific resource on how to prepare the file for a custom dataset?

Another minor question: In my created .tsv files, the index starts at 0 for each subset. Is that fine or do I have to add an offset for the other subsets?

The trainval_ans2label.pkl is a pickled Python dict that maps each candidate answer text to a label id. It should conform to your dataset; otherwise, the overflow problem mentioned in #59 will arise during training whenever an answer not present in trainval_ans2label.pkl is encountered. You can open our provided trainval_ans2label.pkl with pickle for reference. The label ids (0 to number of distinct answers - 1) can be assigned arbitrarily to the candidate answers.
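
The described file can be produced with a few lines of pickle code. This is a minimal sketch; the answer strings below are placeholders, not the real VizWiz candidate set.

```python
import pickle

# Placeholder candidate answers; replace with the answers
# collected from your own train/val annotations.
candidate_answers = ["yes", "no", "unanswerable", "white", "coffee"]

# Assign each distinct answer an arbitrary label id in [0, num_answers - 1].
ans2label = {ans: idx for idx, ans in enumerate(candidate_answers)}

with open("trainval_ans2label.pkl", "wb") as f:
    pickle.dump(ans2label, f)

# The provided reference file can be inspected the same way:
# with open("trainval_ans2label.pkl", "rb") as f:
#     print(pickle.load(f))
```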

The sample indices of the different subsets may overlap. The most important thing is to keep the indices of the test samples consistent with the original dataset, so that you get the correct evaluation score when submitting to the official evaluation server.

Thanks again for the extensive answer. So, the contents of the pickle file are the most frequent answers from both the train and validation subset (combined), right? Is there a guideline on how many frequent answers to include?

(Small correction for anyone reading this in the future: the overflow problem is mentioned in issue #59)

Thanks a lot for noticing the wrong issue reference! ❤️ I've corrected my comment above. On VQA, our choice of candidate answer set mainly follows the common practice of previous works (VLMo, UNITER, etc.), which employ a fixed set of 3,129 frequent answers. I would recommend referring to previous works on VizWiz for how to determine the proper size of the frequent-candidate set. Please check that all answers in the pre-processed (possibly filtered) training samples are covered by the pickled dict, to avoid the overflow issue during training.
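
The "top-k frequent answers plus coverage check" advice above can be sketched with `collections.Counter`. The answer list and the small `k` here are illustrative placeholders (VQA work such as UNITER/VLMo uses k = 3129).

```python
from collections import Counter

# Placeholder per-sample ground-truth answers; replace with the
# answers parsed from your VizWiz training annotations.
train_answers = ["yes", "no", "yes", "unanswerable", "yes", "no", "white"]

# Keep the k most frequent answers as the candidate set.
k = 3
counts = Counter(train_answers)
candidates = {ans for ans, _ in counts.most_common(k)}

# Any training sample whose answer falls outside the candidate set must be
# filtered out (or the set enlarged); otherwise the label lookup overflows.
uncovered = [ans for ans in train_answers if ans not in candidates]
print(f"{len(uncovered)} training answers not covered by the candidate set")
```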

Many thanks for all the help. It seems to be running pretty well now, given the small size of the dataset.

Hi Velcorn,
Could you kindly share the code or the .tsv files for the VizWiz-VQA dataset? Our VizWiz group will release a new dataset (VizWiz-Therapy) recently and would like to benchmark this algorithm. Thank you so much in advance!

Hey, sorry for the late answer, I've written this script to generate the .pkl file and .tsv files from the VizWiz-VQA dataset: https://github.com/Velcorn/OFA/blob/main/dataset/preprocess_vizwiz.py

Thank you so much Velcorn for sharing! It helps a lot!

You're welcome. Let me know if you have any questions!