/FGA

Feature Guidance Attack (FGA) for vision-language pre-training (VLP) models. It covers the ALBEF, TCL, CLIP, and BEiT3 models and the following tasks: VE (Visual Entailment), VG (Visual Grounding), VR (Visual Reasoning), VQA (Visual Question Answering), ZC (Zero-shot Classification), and ITR (Image-Text Retrieval).

MIT License
