Junyeong Son (junyeong_son@korea.ac.kr), M.S. student, DSBA Lab (Advisor: Prof. Pilsung Kang), Department of Industrial and Management Engineering, Korea University
- During my master's program, I read and summarized papers in my areas of interest within artificial intelligence and deep learning.
- If anything in a review needs correction, or if you have any questions, please contact me by email.
- The GitHub links may not point to the official code.
- [YOUTUBE] links point to my review videos on the DSBA Lab's YouTube channel.
Vision-Language Pretrained Model(VLM)
Lightweight Image Captioning Model
Parameter-Efficient Fine-Tuning(PEFT) using Adapter
- Align before Fuse: Vision and Language Representation Learning with Momentum Distillation(NeurIPS 2021 Spotlight) [PAPER] [GITHUB] [REVIEW]
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision(ICLR 2022) [PAPER]
- OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework(ICML 2022) [PAPER] [GITHUB]
- CoCa: Contrastive Captioners are Image-Text Foundation Models(TMLR 2022) [PAPER] [REVIEW] [YOUTUBE]
- mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections(EMNLP 2022) [PAPER] [GITHUB]
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation(ICML 2022) [PAPER] [GITHUB] [REVIEW]
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models(ICML 2023) [PAPER] [GITHUB]
- Flamingo: a Visual Language Model for Few-Shot Learning(NeurIPS 2022) [PAPER] [REVIEW] [GITHUB]
- SmallCAP: Lightweight Image Captioning Prompted with Retrieval Augmentation(CVPR 2023) [PAPER] [REVIEW] [GITHUB]
- EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension(CVPR 2024) [PAPER] [REVIEW] [GITHUB]
- VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks(CVPR 2022) [PAPER] [REVIEW]
- MAGMA: Multimodal Augmentation of Generative Models through Adapter-based Finetuning(EMNLP 2022) [PAPER] [GITHUB]
- LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention(ICLR 2024) [PAPER] [REVIEW] [GITHUB]
- LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model [PAPER] [REVIEW] [GITHUB]
- Deep Industrial Image Anomaly Detection: A Survey(2023) [PAPER] [GITHUB]
- Self-Supervised Anomaly Detection: A Survey and Outlook(2022) [PAPER] [REVIEW]
- Vision-Language Models for Vision Tasks: A Survey(2023) [PAPER]
- A survey of efficient fine-tuning methods for Vision-Language Models — Prompt and Adapter [PAPER]