jwkanggist/SSL-narratives-VLM-1

Reading SSL Backwards, Season 3 - VLM

[Study] Reading Self-Supervised Learning Backwards, Season 3: Visual Language Models

Sign-up: Google form (applications close 2/18)
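This season, Reading SSL Backwards moves on to the VLM field! :)
We focus on Visual Language Model papers, a field that has advanced rapidly since 2021, and review the papers that have proved most significant.
For each paper, we will have a lively discussion of the characteristics of the method it proposes and why it is regarded as historically important.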
Paper list Google sheet

Schedule (tentative)

2023 3/4 ~ 6/3 (14 weeks)

Papers and presentation order

Week	Type	Paper title	Affiliation	arXiv date	Speaker	Youtube
Week 1	VLM benchmarks and metrics	Introduction to VLM-related benchmarks and metrics			강재욱	youtube1, youtube2
Week 2	Vision transformer	ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale	Google	2020 Oct	이인규	youtube
Week 3	Dual encoder	CLIP: Learning Transferable Visual Models From Natural Language Supervision	OpenAI	2021 Feb	김희은	youtube
Week 4	Image-text matching	Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers	MS	2020 Apr	신성호	youtube
Week 5	Image-text contrastive learning	ALBEF: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation	Salesforce	2021 Jul	이유경	youtube
Week 6	Masked image modeling	BEiT: BERT Pre-Training of Image Transformers	MS	2021 Jun	박민지
Week 7	Masked VLM	Masked Vision and Language Modeling for Multi-modal Representation Learning	Amazon	2022 Aug	김강민	youtube
Week 8	Multimodal fusion by MoE	VLMO: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts	MS	2021 Nov	백혜림	youtube
Week 9	Multimodal fusion by merged attention	SimVLM: Simple Visual Language Model Pretraining with Weak Supervision	Google	2021 Aug	정윤성	youtube
Week 10	Multimodal fusion by co-attention	CoCa: Contrastive Captioners are Image-Text Foundation Models	Google	2022 May	김승우	youtube
Week 11	Few-shot learning in VLM	Flamingo: a Visual Language Model for Few-Shot Learning	DeepMind	2022 Apr	조성국	youtube
Week 12	Model scaling for VLM 1	GIT: A Generative Image-to-text Transformer for Vision and Language	MS	2022 May	김기범	youtube
Week 13	Model scaling for VLM 2	PaLI: A Jointly-Scaled Multilingual Language-Image Model	Google	2022 Sep	이영수
Week 14	Wrap-up	Recap of the overall flow			강재욱

Related links