Not all examples have positive image IDs?
Closed this issue · 6 comments
shamanez commented
Hi, I went through WebQA_train_val.json and found that out of 41,739 examples only 21,465 have positive image IDs. Is this normal, or did I make a mistake during preprocessing?
WebQnA commented
Hi Shamane:
Thank you for your interest in WebQA!
Yes, this is normal. Note that our data contains two kinds of queries: image-based queries and text-based queries. Text-based queries therefore don't have positive image IDs.
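For instance, here is a quick way to check the split (a rough sketch; field names such as img_posFacts / txt_posFacts are assumed from the released JSON and may differ in your copy):

```python
import json

# Load the released training/validation file.
with open("WebQA_train_val.json") as f:
    data = json.load(f)  # assumed: a dict keyed by question id

# A query is image-based iff it has positive image facts.
image_based = [qid for qid, ex in data.items() if ex.get("img_posFacts")]
text_based = [qid for qid, ex in data.items()
              if ex.get("txt_posFacts") and not ex.get("img_posFacts")]

print(f"{len(image_based)} image-based and {len(text_based)} text-based "
      f"queries out of {len(data)} total")
```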
I’m happy to address any other concerns.
Wishing you all the best,
Yingshan
shamanez commented
So how could we train a multimodal system with this data? For example, when the image features are null, do we still need to input negative examples?
WebQnA commented
Hi Shamane:
Our challenge for the community is to build a reasoning model that doesn't distinguish between the source modalities, instead treating all of them as "knowledge embedded in a unified space". Thus, we require a model that seamlessly handles both cases. When image features are null, the model should still be able to reason from the text snippets. I think the key is to seamlessly transition back and forth between images and text; in the most ideal case, the model does not distinguish between the two in its internal representation.
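As a rough sketch of what "seamless" can mean in code (purely illustrative, not our actual architecture):

```python
from typing import Optional

import torch

def build_knowledge_sequence(text_emb: torch.Tensor,
                             img_feats: Optional[torch.Tensor]) -> torch.Tensor:
    """Treat text and (optional) image regions as one unified sequence.

    text_emb:  (batch, text_len, hidden) snippet embeddings
    img_feats: (batch, n_regions, hidden) region features, or None
               for text-based queries with no positive images.
    """
    if img_feats is None:
        # Text-based query: reason from the snippets alone.
        return text_emb
    # Image-based query: concatenate both modalities into one
    # sequence so downstream layers need not distinguish them.
    return torch.cat([img_feats, text_emb], dim=1)
```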
Hope that answers your question.
Thanks!
Yingshan
shamanez commented
Amazing. One more thing: I read the paper as well, but it wasn't clear to me how we feed the positive examples together with the negative examples as input. Is it by concatenating everything?
WebQnA commented
Hi Shamane:
In our own implementation, we classify each source individually through binary classification, because BERT can't handle sequences longer than 512 tokens. Concretely, during the retrieval stage, we pass [ <CLS>, context, <SEP>, Q ] into the transformer and treat the last hidden state of the CLS token as the logit for the input context being predicted as "positive".
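Roughly, in code (a minimal sketch using Hugging Face transformers, not our exact training script):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
scorer = torch.nn.Linear(encoder.config.hidden_size, 1)  # CLS state -> logit

def source_logit(context: str, question: str) -> torch.Tensor:
    # Builds [CLS] context [SEP] question [SEP], truncated to 512 tokens.
    inputs = tokenizer(context, question, truncation=True,
                       max_length=512, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden)
    return scorer(hidden[:, 0]).squeeze(-1)       # logit that source is positive

# Each candidate source (positive or negative) is scored independently;
# training pairs this logit with a 0/1 relevance label, e.g. via
# BCEWithLogitsLoss.
```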
Concatenating all sources opens up more opportunities for improvement, because it allows for cross-reasoning among multiple sources. This is left for smarter future models that deal with longer input context windows.
Hope that answers your question.
Best,
Yingshan
shamanez commented
Thanks a lot.