Not all examples have positive image IDs?
Closed this issue · 6 comments
shamanez commented
Hi, I went through WebQA_train_val.json and found that out of 41,739 examples only 21,465 have positive image IDs. Is this normal, or did I make a mistake during preprocessing?
WebQnA commented
Hi Shamane:
Thank you for your interest in WebQA!
Yes, this is normal. Note that our data contains two kinds of queries: image-based queries and text-based queries. Text-based queries therefore don't have positive image IDs.
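For instance, here is a quick way to check the split (a rough sketch; field names such as img_posFacts / txt_posFacts are assumed from the released JSON and may differ in your copy):

```python
import json

# Load the released training/validation file.
with open("WebQA_train_val.json") as f:
    data = json.load(f)  # assumed: a dict keyed by question id

# A query is image-based iff it has positive image facts.
image_based = [qid for qid, ex in data.items() if ex.get("img_posFacts")]
text_based = [qid for qid, ex in data.items()
              if ex.get("txt_posFacts") and not ex.get("img_posFacts")]

print(f"{len(image_based)} image-based and {len(text_based)} text-based "
      f"queries out of {len(data)} total")
```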
I’m happy to address any other concerns.
Wishing you all the best,
Yingshan
shamanez commented
So how could we train a multimodal system with this data? For example, when the image features are null, do we still need to input negative examples?
WebQnA commented
Hi Shamane:
Our challenge for the community is to build a reasoning model that doesn't distinguish between the source modalities, instead treating all of them as "knowledge embedded in a unified space". Thus, we require a model that seamlessly handles both cases. When image features are null, the model should still be able to reason from the text snippets. I think the key is to seamlessly transition back and forth between images and text; in the most ideal case, the model does not distinguish between the two in its internal representation.
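As a rough sketch of what "seamless" can mean in code (purely illustrative, not our actual architecture):

```python
from typing import Optional

import torch

def build_knowledge_sequence(text_emb: torch.Tensor,
                             img_feats: Optional[torch.Tensor]) -> torch.Tensor:
    """Treat text and (optional) image regions as one unified sequence.

    text_emb:  (batch, text_len, hidden) snippet embeddings
    img_feats: (batch, n_regions, hidden) region features, or None
               for text-based queries with no positive images.
    """
    if img_feats is None:
        # Text-based query: reason from the snippets alone.
        return text_emb
    # Image-based query: concatenate both modalities into one
    # sequence so downstream layers need not distinguish them.
    return torch.cat([img_feats, text_emb], dim=1)
```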
Hope that answers your question.
Thanks!
Yingshan
shamanez commented
Amazing. One more thing: I read the paper as well, but it wasn't clear to me how we feed the positive examples together with the negative examples as input. Is it by concatenating everything?
WebQnA commented
Hi Shamane:
In our own implementation, we classify each source individually through binary classification, because BERT can't handle sequences longer than 512 tokens. Concretely, during the retrieval stage, we pass [ <CLS>, context, <SEP>, Q ] into the transformer and treat the last hidden state of the CLS token as the logit for the input context being predicted as "positive".
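Roughly, in code (a minimal sketch using Hugging Face transformers, not our exact training script):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
scorer = torch.nn.Linear(encoder.config.hidden_size, 1)  # CLS state -> logit

def source_logit(context: str, question: str) -> torch.Tensor:
    # Builds [CLS] context [SEP] question [SEP], truncated to 512 tokens.
    inputs = tokenizer(context, question, truncation=True,
                       max_length=512, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden)
    return scorer(hidden[:, 0]).squeeze(-1)       # logit that source is positive

# Each candidate source (positive or negative) is scored independently;
# training pairs this logit with a 0/1 relevance label, e.g. via
# BCEWithLogitsLoss.
```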
Concatenating all sources opens up more opportunities for improvement, because it allows for cross-reasoning among multiple sources. This is left for smarter future models that deal with longer input context windows.
Hope that answers your question.
Best,
Yingshan
shamanez commented
Thanks a lot.