what is the meaning of frame_num and answer_num?
aixiaodewugege opened this issue · 6 comments
Thanks for your brilliant work!
I can't find explanations for these two configuration options: frame_num and answer_num. Could you please help me?
Thanks for your interest in our work! Here are explanations for those parameters:
- model.frame_num: number of selected keyframes
- datasets.nextqa.vis_processor.train.n_frms: number of frames sampled as candidates for selection
- model.answer_num: number of multiple-choice options (e.g. NeXT-QA has 5 options for each QA, STAR has 4 options for each QA)
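To make the relationship between the three parameters concrete, here is a minimal sketch. It is not SeViLA's code: the values in `config`, the uniform sampling, the random scores, and the helper names `sample_candidate_indices` and `select_keyframes` are all assumptions made for illustration. It only shows that `n_frms` candidate frames are drawn first and `frame_num` of them are kept as keyframes.

```python
import random

# Illustrative values only; check the repo's config files for the real ones.
config = {
    "n_frms": 32,     # datasets.nextqa.vis_processor.train.n_frms: candidate frames per video
    "frame_num": 4,   # model.frame_num: keyframes kept out of the candidates
    "answer_num": 5,  # model.answer_num: options per question (5 for NeXT-QA, 4 for STAR)
}

def sample_candidate_indices(total_frames, n_frms):
    """Pick n_frms frame indices spread uniformly over the video (assumed strategy)."""
    step = total_frames / n_frms
    return [int(step * i + step / 2) for i in range(n_frms)]

def select_keyframes(candidates, scores, frame_num):
    """Keep the frame_num highest-scoring candidates, returned in temporal order."""
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return sorted(candidates[i] for i in ranked[:frame_num])

candidates = sample_candidate_indices(total_frames=320, n_frms=config["n_frms"])
rng = random.Random(0)
scores = [rng.random() for _ in candidates]  # stand-in for a learned relevance score
keyframes = select_keyframes(candidates, scores, config["frame_num"])
print(len(candidates), len(keyframes))
```

In this sketch `frame_num` can never usefully exceed `n_frms`, since keyframes are chosen from the sampled candidates.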
We have instructions for running the Gradio demo locally and running the evaluation in this repo.
SeViLA requires at least 12 GB of memory to load the model and run an inference with batch size 1.
Sorry, I phrased that poorly. I already have it running locally with Gradio. What I meant is: does it support a model.predict_answers() function like BLIP2 for inference, so that I can test on a dataset and see the output?
Also, could you please give me some help with using SeViLA without setting options? Should I change sevila.generate_demo to sevila.generate or sevila.predict_answers?
Yes, you can check and use the generate() function to test on multiple-choice QA datasets.
For open-ended answer generation, you can pass only the question as input and decode the FlanT5 output (check here).
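The difference between the two input modes can be sketched as follows. This is an assumption-laden illustration, not SeViLA's actual prompt template: `build_prompt` is a hypothetical helper and the wording of the prompt is invented. It only shows that a multiple-choice input carries `answer_num` options while an open-ended input carries the question alone.

```python
import string

def build_prompt(question, options=None):
    """Format a text prompt (hypothetical wording, not SeViLA's real template)."""
    lines = [f"Question: {question}"]
    if options:
        # Multiple-choice mode: one lettered line per option.
        for letter, option in zip(string.ascii_uppercase, options):
            lines.append(f"Option {letter}: {option}")
        lines.append("Answer with the letter of the correct option.")
    else:
        # Open-ended mode: the question alone, leaving the answer to be generated.
        lines.append("Answer:")
    return "\n".join(lines)

question = "What is the man doing?"
mc_prompt = build_prompt(question, ["running", "cooking", "reading", "swimming", "singing"])
open_prompt = build_prompt(question)
print(mc_prompt)
print(open_prompt)
```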