luong1409/vqa_thesis

This project is my thesis work. Its architecture combines the mPLUG and SimVLM models, with two additional modifications: Text-Guided Attention and Image-Guided Attention.
