Split different task data with different settings
Dear authors,
Thank you for providing this GitHub repo and your work.
I have a question about the "Number of samples in different splits of our settings" given in the published paper. In the original CrisisMMD dataset, the distribution of samples for every task differs from the one in your paper: the number of samples is much higher than the numbers reported in the paper. To reproduce the distribution from your paper, could you please tell me which code file generates it, or could you provide the data?
Thanks again for your great work.
Hi Nandini @nandini211995,
Thanks for reaching out. To be clear, I'm not one of the authors of the paper. I only re-implemented their paper for my own research, so it would be best to email the original authors for a fuller explanation.
As I understand it, the paper discards the samples whose image and text labels differ (see issue #4 and section 4.2 of the paper). You can set the consistent-only parameter to true to achieve that.
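If it helps, here is a rough sketch of what that kind of filtering amounts to. It is not the code from this repo or from the paper. The column names `label_text` and `label_image`, the helper `keep_consistent`, and the tiny in-memory TSV are all assumptions made up for illustration, so please check them against the actual annotation files before relying on it.

```python
import csv
import io


def keep_consistent(rows, text_col="label_text", image_col="label_image"):
    """Return only the rows whose text label equals the image label.

    The column names are assumptions for this sketch; adjust them to
    match the header of the annotation file you are reading.
    """
    return [row for row in rows if row[text_col] == row[image_col]]


# Made-up TSV standing in for one split file (not real dataset rows).
SAMPLE_TSV = (
    "tweet_id\tlabel_text\tlabel_image\n"
    "1\tinformative\tinformative\n"
    "2\tinformative\tnot_informative\n"
    "3\tnot_informative\tnot_informative\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
consistent = keep_consistent(rows)

# Row 2 is dropped because its text and image labels disagree.
print(len(rows), "->", len(consistent))
```

Dropping the disagreeing rows is why the per-split counts come out lower than in the original dataset. Whether the result matches the paper's numbers exactly is something only the original authors can confirm.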