luong1409/vqa_thesis
This project is my thesis work. Its architecture combines the mPLUG and SimVLM models, with two additional modifications: Text-Guided Attention and Image-Guided Attention.