
Reading and reviewing papers in the Vision and Vision-Language fields.

Paper-Review

Junyeong Son (junyeong_son@korea.ac.kr), M.S. student in the DSBA Lab (advisor: Prof. Pilsung Kang), Department of Industrial & Management Engineering, Korea University

  • During my master's program, I read and summarized papers of interest in artificial intelligence and deep learning.
  • If anything in a review needs correcting, or if you have questions, please contact me by email.
  • The [GITHUB] links may not point to the official code.
  • The [YOUTUBE] links are to videos of my reviews on the DSBA Lab YouTube channel.

Research Interests

  • Vision-Language Pretrained Models (VLM)
  • Lightweight Image Captioning Models
  • Parameter-Efficient Fine-Tuning (PEFT) using Adapters

① Vision-Language Pretrained Models

  • Align before Fuse: Vision and Language Representation Learning with Momentum Distillation (NeurIPS 2021 Spotlight) [PAPER] [GITHUB] [REVIEW]
  • SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (ICLR 2022) [PAPER]
  • OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework (ICML 2022) [PAPER] [GITHUB]
  • CoCa: Contrastive Captioners are Image-Text Foundation Models (TMLR 2022) [PAPER] [REVIEW] [YOUTUBE]
  • mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections (EMNLP 2022) [PAPER] [GITHUB]
  • BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (ICML 2022) [PAPER] [GITHUB] [REVIEW]
  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (ICML 2023) [PAPER] [GITHUB]
  • Flamingo: a Visual Language Model for Few-Shot Learning (NeurIPS 2022) [PAPER] [REVIEW] [GITHUB]
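
Most of the papers in this section share an image-text contrastive (ITC) objective that aligns the two encoders before any fusion, the "align before fuse" idea in ALBEF that also appears in CoCa and BLIP. Below is a minimal sketch of that objective; the function name, tensor shapes, and temperature value are illustrative assumptions, not any paper's official code.

```python
# Minimal sketch of a symmetric image-text contrastive (ITC) loss.
# Names and the temperature are illustrative, not from any official repo.
import torch
import torch.nn.functional as F

def itc_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired (image, text) embeddings.

    image_emb, text_emb: (batch, dim) projections from each encoder;
    matched pairs sit at the same batch index.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # diagonal = positives
    # Cross-entropy in both directions: image-to-text and text-to-image.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```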

② Lightweight Image Captioning Models

  • SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation (CVPR 2023) [PAPER] [REVIEW] [GITHUB]
  • EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension (CVPR 2024) [PAPER] [REVIEW] [GITHUB]
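
Both papers condition a lightweight decoder on items retrieved from an external memory for visually similar images. The sketch below shows the general idea of turning retrieved captions into a decoder prompt; the template wording and function name are simplified assumptions, not either paper's exact implementation.

```python
# Sketch of SmallCap-style retrieval-augmented prompting: captions retrieved
# for visually similar images become in-context examples that the caption
# decoder completes. The template wording here is an assumption.
def build_prompt(retrieved_captions: list[str]) -> str:
    """Turn retrieved captions into a prefix for the caption decoder."""
    examples = "\n".join(retrieved_captions)
    return f"Similar images show:\n{examples}\nThis image shows:"

# Example: the decoder would be asked to continue this prefix.
print(build_prompt(["a dog running on a beach", "a puppy playing in sand"]))
```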

③ Parameter-Efficient Fine-Tuning (Adapter)

  • VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks (CVPR 2022) [PAPER] [REVIEW]
  • MAGMA: Multimodal Augmentation of Generative Models through Adapter-based Finetuning (EMNLP 2022) [PAPER] [GITHUB]
  • LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention (ICLR 2024) [PAPER] [REVIEW] [GITHUB]
  • LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model (2023) [PAPER] [REVIEW] [GITHUB]
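
The common building block in these papers is a small bottleneck adapter inserted into a frozen backbone, so that only the adapter weights are trained. Below is a minimal PyTorch sketch; the dimensions, placement, and the zero-initialized up-projection (in the spirit of LLaMA-Adapter's zero-init idea) are illustrative assumptions rather than any single paper's configuration.

```python
# Minimal sketch of a bottleneck adapter. Dimensions and placement vary per
# paper; this shows only the shared down-project / up-project pattern.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, with a residual connection.

    Only these small matrices are trained; the backbone stays frozen.
    """
    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)  # near-identity behavior at initialization
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual keeps the frozen model's computation path intact.
        return x + self.up(self.act(self.down(x)))
```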

④ Survey

  • Deep Industrial Image Anomaly Detection: A Survey (2023) [PAPER] [GITHUB]
  • Self-Supervised Anomaly Detection: A Survey and Outlook (2022) [PAPER] [REVIEW]
  • Vision-Language Models for Vision Tasks: A Survey (2023) [PAPER]
  • A survey of efficient fine-tuning methods for Vision-Language Models - Prompt and Adapter [PAPER]