Vocabulary and single image-question pair prediction

Question

foxm79 opened this issue 4 years ago · 2 comments

Is the vocabulary that maps the words of a question to 'input_ids' available?
Is there a function that does this for an input question?
Is there code that takes a single image-question pair and predicts the answer?

Answer 1 · 2020-11-24T08:37:10.000Z

Answer 2 · 2020-11-24T12:58:44.000Z

Yes, that is what I eventually followed. Thanks for replying!