microsoft/TAP

TAP Pretraining on TextCaps, TextVQA, ST-VQA on TextVQA down-stream dataset

XIRZC opened this issue · 0 comments

XIRZC commented

From the README sample I can see how to pretrain on the TextVQA dataset alone, but I have no idea how to pretrain with the extra ST-VQA and TextCaps data for the TextVQA downstream task.

By reading the latest mmf documentation, I found examples of training with an extra ST-VQA config file, and I can now successfully pretrain on the combination of TextVQA and ST-VQA. I created an m4c_combo folder under configs/vqa, added new m4c_combo_pretrain.yml and m4c_combo_refine.yml files that include m4c_base_pretrain.yml and m4c_base_refine.yml respectively, and then inserted extra ST-VQA entries into the image_features and imdb_files fields.
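For reference, the combo pretraining config I describe looks roughly like this. This is only a sketch: the include path, dataset key, and feature/imdb paths below are illustrative placeholders, and the exact field names must match whatever m4c_base_pretrain.yml defines in this repo.

```yaml
# m4c_combo_pretrain.yml -- illustrative sketch, not a verified config
includes:
- configs/vqa/m4c_base/m4c_base_pretrain.yml   # assumed base config path

dataset_attributes:
  m4c_textvqa:                                 # assumed dataset key
    image_features:
      train:
      - textvqa/defaults/features/train        # placeholder TextVQA features
      - stvqa/defaults/features/train          # extra ST-VQA entry (placeholder)
    imdb_files:
      train:
      - imdb/textvqa/imdb_train.npy            # placeholder TextVQA imdb
      - imdb/stvqa/imdb_train.npy              # extra ST-VQA entry (placeholder)
```

m4c_combo_refine.yml follows the same pattern, including m4c_base_refine.yml instead.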

But when I try to add an extra TextCaps dataset config, I get a missing 'question_id' KeyError on a dictionary during the dataset-loading stage. I suspect the reason is that the TextCaps imdb files only contain a 'caption' field and lack the 'question_id' field that the loader expects.
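One workaround I am considering is to patch the TextCaps imdb files offline so every entry carries a synthetic 'question_id'. The helper below is only a sketch under my assumptions: that the imdb is an mmf-style list whose first element is a header/metadata dict and whose remaining elements are per-sample dicts, and that the start_id offset avoids colliding with real TextVQA/ST-VQA ids (my own choice, not a project convention).

```python
# Hypothetical helper: fill in 'question_id' for TextCaps imdb entries
# so the combined loader does not raise a KeyError.
# Assumes mmf-style imdb layout: imdb[0] is a header dict, imdb[1:] are
# per-sample annotation dicts.

def add_question_ids(imdb, start_id=3000000):
    """Assign a unique synthetic 'question_id' to each entry missing one.

    start_id is an arbitrary offset chosen to stay clear of real
    TextVQA/ST-VQA question ids (an assumption on my part).
    """
    next_id = start_id
    for entry in imdb[1:]:  # skip the header record at index 0
        if "question_id" not in entry:
            entry["question_id"] = next_id
            next_id += 1
    return imdb
```

The patched list could then be saved back with numpy (e.g. `np.save(path, np.array(imdb, dtype=object))`) before pretraining, though I am not sure this is the intended fix.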

So could you provide multi-dataset pretraining examples or configuration files for us? I would really appreciate your help, and I believe this would be useful to the future TextVQA research community as well.