cvlab-columbia/viper

"process_guesses" function in Listing 4. OK-VQA example

lifan-yuan opened this issue · 3 comments

Hi, thanks for sharing the code.

When reproducing your results on OK-VQA, I found that the in-context example you used for OK-VQA contains a function named process_guesses, which does not exist in the repo. Could you please provide its implementation?

Thanks a lot!

My assumption is that the process_guesses function works much like multiple-choice question answering:

final_answer = LLM(question + ' Which of the following choices is the most likely answer? ' + ', '.join([guess1, guess2, ...]))
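To make that guess concrete, here is a minimal Python sketch of what I have in mind. It is only my assumption, not the authors' implementation: `build_choice_prompt`, `process_guesses_sketch`, and the `llm` callable (any function mapping a prompt string to a completion string) are hypothetical names I made up for illustration.

```python
from typing import Callable, Sequence


def build_choice_prompt(question: str, guesses: Sequence[str]) -> str:
    """Format a question and candidate guesses as a multiple-choice prompt."""
    # Drop empty guesses and exact duplicates while preserving their order.
    unique = list(dict.fromkeys(g.strip() for g in guesses if g and g.strip()))
    options = "\n".join(f"{i + 1}. {g}" for i, g in enumerate(unique))
    return (
        f"Question: {question}\n"
        "Which of the following choices is the most likely answer?\n"
        f"{options}\n"
        "Answer:"
    )


def process_guesses_sketch(
    question: str, guesses: Sequence[str], llm: Callable[[str], str]
) -> str:
    """Hypothetical stand-in for process_guesses (an assumption, not the repo's code).

    `llm` is any callable that takes a prompt and returns the model's text.
    """
    return llm(build_choice_prompt(question, guesses)).strip()
```

With a real model, `llm` would wrap whatever completion call is in use; for a quick check, a stub such as `lambda prompt: "dog"` is enough to exercise the prompt formatting.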

Though it would be nice if the authors could shed some light on this :)

Actually, I now see that there is another function similar to what I described that is also missing from the API listings (and from the rest of this repository): VideoSegment.select_answers. Either these two functions implement the same functionality, or my intuition was wrong.

Hi, apologies for the late reply. Probably not useful anymore, but just in case: yes, they are similar, but they are used for different benchmarks. select_answers is used in NextQA, and it selects the correct answer from the options provided by the dataset. process_guesses takes as input the different guesses made by the model and chooses the best one given the available information.

We have updated the repository to include the benchmark code, including the implementation of these functions.