
Reading and reviewing papers in the Vision and Vision-Language fields.

Paper-Review

Junyeong Son (junyeong_son@korea.ac.kr), M.S. student in the DSBA Lab (advisor: Prof. Pilsung Kang), Department of Industrial & Management Engineering, Korea University

  • During my master's program, I read and summarized papers of interest in artificial intelligence and deep learning.
  • If anything in a review needs correcting, or if you have questions, please contact me by email.
  • The [GITHUB] links may not point to the official code.
  • The [YOUTUBE] links are to videos of my reviews on the DSBA Lab YouTube channel.

Research Interests

  • Vision-Language Pretrained Models (VLM)
  • Lightweight Image Captioning Models
  • Parameter-Efficient Fine-Tuning (PEFT) using Adapters

① Vision-Language Pretrained Models

  • Align before Fuse: Vision and Language Representation Learning with Momentum Distillation (NeurIPS 2021 Spotlight) [PAPER] [GITHUB] [REVIEW]
  • SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (ICLR 2022) [PAPER]
  • OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework (ICML 2022) [PAPER] [GITHUB]
  • CoCa: Contrastive Captioners are Image-Text Foundation Models (TMLR 2022) [PAPER] [REVIEW] [YOUTUBE]
  • mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections (EMNLP 2022) [PAPER] [GITHUB]
  • BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (ICML 2022) [PAPER] [GITHUB] [REVIEW]
  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (ICML 2023) [PAPER] [GITHUB]
  • Flamingo: a Visual Language Model for Few-Shot Learning (NeurIPS 2022) [PAPER] [REVIEW] [GITHUB]
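
Most of the papers in this section share an image-text contrastive (ITC) objective that aligns the two encoders before any fusion, the "align before fuse" idea in ALBEF that also appears in CoCa and BLIP. Below is a minimal sketch of that objective; the function name, tensor shapes, and temperature value are illustrative assumptions, not any paper's official code.

```python
# Minimal sketch of a symmetric image-text contrastive (ITC) loss.
# Names and the temperature are illustrative, not from any official repo.
import torch
import torch.nn.functional as F

def itc_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired (image, text) embeddings.

    image_emb, text_emb: (batch, dim) projections from each encoder;
    matched pairs sit at the same batch index.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # diagonal = positives
    # Cross-entropy in both directions: image-to-text and text-to-image.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```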

② Lightweight Image Captioning Models

  • SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation (CVPR 2023) [PAPER] [REVIEW] [GITHUB]
  • EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension (CVPR 2024) [PAPER] [REVIEW] [GITHUB]
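
Both papers condition a lightweight decoder on items retrieved from an external memory for visually similar images. The sketch below shows the general idea of turning retrieved captions into a decoder prompt; the template wording and function name are simplified assumptions, not either paper's exact implementation.

```python
# Sketch of SmallCap-style retrieval-augmented prompting: captions retrieved
# for visually similar images become in-context examples that the caption
# decoder completes. The template wording here is an assumption.
def build_prompt(retrieved_captions: list[str]) -> str:
    """Turn retrieved captions into a prefix for the caption decoder."""
    examples = "\n".join(retrieved_captions)
    return f"Similar images show:\n{examples}\nThis image shows:"

# Example: the decoder would be asked to continue this prefix.
print(build_prompt(["a dog running on a beach", "a puppy playing in sand"]))
```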

③ Parameter-Efficient Fine-Tuning (Adapter)

  • VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks (CVPR 2022) [PAPER] [REVIEW]
  • MAGMA: Multimodal Augmentation of Generative Models through Adapter-based Finetuning (EMNLP 2022) [PAPER] [GITHUB]
  • LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention (ICLR 2024) [PAPER] [REVIEW] [GITHUB]
  • LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model (2023) [PAPER] [REVIEW] [GITHUB]
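
The common building block in these papers is a small bottleneck adapter inserted into a frozen backbone, so that only the adapter weights are trained. Below is a minimal PyTorch sketch; the dimensions, placement, and the zero-initialized up-projection (in the spirit of LLaMA-Adapter's zero-init idea) are illustrative assumptions rather than any single paper's configuration.

```python
# Minimal sketch of a bottleneck adapter. Dimensions and placement vary per
# paper; this shows only the shared down-project / up-project pattern.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, with a residual connection.

    Only these small matrices are trained; the backbone stays frozen.
    """
    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)  # near-identity behavior at initialization
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual keeps the frozen model's computation path intact.
        return x + self.up(self.act(self.down(x)))
```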

④ Survey

  • Deep Industrial Image Anomaly Detection: A Survey (2023) [PAPER] [GITHUB]
  • Self-Supervised Anomaly Detection: A Survey and Outlook (2022) [PAPER] [REVIEW]
  • Vision-Language Models for Vision Tasks: A Survey (2023) [PAPER]
  • A survey of efficient fine-tuning methods for Vision-Language Models - Prompt and Adapter [PAPER]