Hello, author. How does this FLMR differ from the official Retrieval-Augmented-Visual-Question-Answering repository? Does FLMR not use BLIP?

Question

Hello, author. How does this FLMR differ from the official Retrieval-Augmented-Visual-Question-Answering repository? Does FLMR not use BLIP?

Closed this issue 6 months ago · 2 comments

zzk2021 commented 6 months ago

Answer 1 · 2024-06-16T10:32:29.000Z

This repository is also official. The full environment for the original RA-VQA codebase is difficult to set up, so we provide an implementation based on Hugging Face Transformers. It supports inference with FLMR and PreFLMR and includes fine-tuning and evaluation scripts, so you can run inference easily.
If you want to fine-tune a BLIP model on the documents retrieved by PreFLMR, you can run inference, collect the retrieved documents, and write your own code to fine-tune the BLIP model.
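
As a rough illustration of the "collect the retrieved documents" step, here is a minimal sketch in plain Python. It does not call the FLMR or BLIP APIs. It assumes you have already run retrieval and saved the results yourself. The record layout (`id`, `image_path`, `question`, `answer`), the `retrieved` mapping, and the function names `build_finetune_examples` and `to_jsonl` are all hypothetical and only serve the example.

```python
import json
from typing import Dict, List


def build_finetune_examples(
    questions: List[Dict[str, str]],
    retrieved: Dict[str, List[str]],
    top_k: int = 2,
) -> List[Dict[str, str]]:
    """Pair each question with its top-k retrieved passages.

    Assumed (hypothetical) layout:
      questions: dicts with "id", "image_path", "question", "answer"
      retrieved: question id -> passage texts, best match first
    """
    examples = []
    for q in questions:
        # Questions with no retrieval results contribute no examples.
        passages = retrieved.get(q["id"], [])[:top_k]
        for passage in passages:
            examples.append(
                {
                    "image_path": q["image_path"],
                    "prompt": f"Question: {q['question']} Context: {passage}",
                    "target": q["answer"],
                }
            )
    return examples


def to_jsonl(examples: List[Dict[str, str]]) -> str:
    """Serialize examples as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)


if __name__ == "__main__":
    questions = [
        {
            "id": "q1",
            "image_path": "images/q1.jpg",
            "question": "What is this building?",
            "answer": "a lighthouse",
        }
    ]
    retrieved = {"q1": ["passage A", "passage B", "passage C"]}
    print(to_jsonl(build_finetune_examples(questions, retrieved, top_k=2)))
```

The resulting JSON Lines file could then be read by whatever BLIP fine-tuning loop you write. The prompt template is only one possible choice.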

Answer 2 · 2024-06-20T01:58:54.000Z

Thanks for your reply!