Pinned Repositories
TMM
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
opencv-python
Automated CI toolchain to produce precompiled opencv-python, opencv-python-headless, opencv-contrib-python and opencv-contrib-python-headless packages.
X-VLM
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
VQA
UAP_VLP
Universal Adversarial Perturbations for Vision-Language Pre-trained Models
TMM
VQAttack
This is the official repository of "VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models" (AAAI 2024).
MLLM-Grounding-Robustness
[ICLR 2024 Workshop on Reliable and Responsible Foundation Models] Adversarial Robustness for Visual Grounding of Multimodal Large Language Models
FGA
Feature Guidance Attack (FGA) for VLP models. The approach covers the ALBEF, TCL, CLIP, and BEiT3 models and the following tasks: VE (Visual Entailment), VG (Visual Grounding), VR (Visual Reasoning), VQA (Visual Question Answering), ZC (Zero-shot Classification), and ITR (Image-Text Retrieval).