Awesome-Self-Supervised-Papers

Collecting papers about Self-Supervised Learning, Representation Learning.

Last Update : 2021. 01. 31.

Update important papers (SwAV, Colorization, Jigsaw, ...)
Update timetable
Any contributions and comments are welcome.

Computer Vision (CV)

Pretraining / Feature / Representation

Contrastive Learning

Conference / Journal	Paper	ImageNet Acc (Top 1)
CVPR 2006	Dimensionality Reduction by Learning an Invariant Mapping	-
arXiv:1807.03748	Representation learning with contrastive predictive coding (CPC)	-
arXiv:1911.05722	Momentum Contrast for Unsupervised Visual Representation Learning (MoCo)	60.6 %
arXiv:1905.09272	Data-Efficient Image Recognition with Contrastive Predictive Coding (CPC v2)	63.8 %
arXiv:1906.05849	Contrastive Multiview Coding (CMC)	66.2 %
arXiv:2002.05709	A Simple Framework for Contrastive Learning of Visual Representations (SimCLR)	69.3 %
arXiv:2003.04297	Improved Baselines with Momentum Contrastive Learning (MoCo v2)	71.1 %
arXiv:2003.05438	Rethinking Image Mixture for Unsupervised Visual Representation Learning	65.9 %
arXiv:2004.05554	Feature Lenses: Plug-and-play Neural Modules for Transformation-Invariant Visual Representations	-
arXiv:2006.10029	Big Self-Supervised Models are Strong Semi-Supervised Learners (SimCLRv2)	77.5 % (10% label)
arXiv:2006.07733	Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (BYOL)	74.3 %
arXiv:2006.09882	Unsupervised Learning of Visual Features by Contrasting Cluster Assignments (SwAV)	75.3 %
arXiv:2008.05659	What Should Not Be Contrastive in Contrastive Learning	80.2 % (ImageNet-100)
arXiv:2007.00224	Debiased Contrastive Learning	74.6 % (ImageNet-100)
arXiv:2009.00104	A Framework For Contrastive Self-Supervised Learning And Designing A New Approach	-
ICLR 2021 under review	Self-Supervised Representation Learning via Adaptive Hard-Positive Mining	72.3 % (ResNet-50(4x): 77.3 %)
IEEE Access	Contrastive Representation Learning: A Framework and Review	review paper
arXiv:2010.01929	EqCo: Equivalent Rules for Self-Supervised Contrastive Learning	68.5 % (Proposed) / 66.6 % (SimCLR) / 200 epochs
arXiv:2010.01028	Hard Negative Mixing for Contrastive Learning	68.0 % / 200 epochs
arXiv:2011.10566	Exploring Simple Siamese Representation Learning (SimSiam)	68.1 % / 100 epochs / 256 batch
arXiv:2010.06682	Are all negatives created equal in contrastive instance discrimination?	-
arXiv:2101.05224	Big Self-Supervised Models Advance Medical Image Classification	AUC: 0.7729 (SimCLR / ImageNet --> CheXpert / ResNet-152(2x))

Dense Contrastive Learning

Conference / Journal	Paper	Downstream
NeurIPS 2020	Unsupervised Learning of Dense Visual Representations	AP: 39.2 (COCO, BBOX)
arXiv:2011.10043	Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning	AP: 60.2 (VOC, BBOX)

Image Transformation

Conference / Journal	Paper	ImageNet Acc (Top 1)
ECCV 2016	Colorful Image Colorization (Colorization)	39.6 %
ECCV 2016	Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles	45.7 %
CVPR 2018	Unsupervised Feature Learning via Non-Parametric Instance Discrimination (NPID, NPID++)	NPID: 54.0%, NPID++: 59.0%
CVPR 2018	Boosting Self-Supervised Learning via Knowledge Transfer (Jigsaw++)	-
CVPR 2020	Self-Supervised Learning of Pretext-Invariant Representations (PIRL)	63.6 %
CVPR 2020	Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics	-
arXiv:2003.04298	Multi-modal Self-Supervision from Generalized Data Transformations	-

Others (in Pretraining / Feature / Representation)

Conference / Journal	Paper	Method
ICML 2018	Mutual Information Neural Estimation	Mutual Information
NeurIPS 2019	Wasserstein Dependency Measure for Representation Learning	Mutual Information
ICLR 2019	Learning Deep Representations by Mutual Information Estimation and Maximization	Mutual Information
arXiv:1903.12355	Local Aggregation for Unsupervised Learning of Visual Embeddings	Local Aggregation
arXiv:1906.00910	Learning Representations by Maximizing Mutual Information Across Views	Mutual Information
arXiv:1907.02544	Large Scale Adversarial Representation Learning (BigBiGAN)	Adversarial Training
ICLR 2020	On Mutual Information Maximization for Representation Learning	Mutual Information
CVPR 2020	How Useful is Self-Supervised Pretraining for Visual Tasks?	-
CVPR 2020	Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning	Adversarial Training
ICLR 2020	Self-Labeling via Simultaneous Clustering and Representation Learning	Information
arXiv:1912.11370	Big Transfer (BiT): General Visual Representation Learning	pre-training
arXiv:2009.07724	Evaluating Self-Supervised Pretraining Without Using Labels	pre-training
arXiv:2010.00578	Understanding Self-Supervised Learning with Dual Deep Networks	Dual Deep Network
ICLR 2021 under review	Representation Learning via Invariant Causal Mechanisms	Causal mechanism
arXiv:2006.06882	Rethinking Pre-training and Self-training	Rethinking

Identification / Verification / Classification / Recognition

Conference / Journal	Paper	Datasets	Performance
CVPR 2020	Real-world Person Re-Identification via Degradation Invariance Learning	MLR-CUHK03	Acc : 85.7 (R@1)
CVPR 2020	Spatially Attentive Output Layer for Image Classification	ImageNet	Acc : 81.01 (Top-1)
CVPR 2020	Look-into-Object: Self-supervised Structure Modeling for Object Recognition	ImageNet	Top-1 err : 22.87

Segmentation / Depth Estimation

Conference / Journal	Paper	Datasets	Performance
CVPR 2020	Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation	VOC 2012	mIoU : 64.5
CVPR 2020	Towards Better Generalization: Joint Depth-Pose Learning without PoseNet	KITTI 2015	F1 : 18.05 %
IROS 2020	Monocular Depth Estimation with Self-supervised Instance Adaptation	KITTI 2015	Abs Rel : 0.074
CVPR 2020	Novel View Synthesis of Dynamic Scenes with Globally Coherent Depths from a Monocular Camera	-	-
CVPR 2020	Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision	GTA5->Cityscape	mIoU : 46.3
CVPR 2020	D3VO : Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry	-	-
CVPR 2020	Self-Supervised Human Depth Estimation from Monocular Videos	-	-
arXiv:2009.07714	Calibrating Self-supervised Monocular Depth Estimation	KITTI	Abs Rel : 0.113

Detection / Localization

Conference / Journal	Paper	Datasets	Performance
CVPR 2020	Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection	VOC 2012	AP(50) : 67.0

Generation

Conference / Journal	Paper	Task
CVPR 2020	StyleRig: Rigging StyleGAN for 3D Control over Portrait Images	Portrait Images
ICLR 2020	From Inference to Generation: End-to-End Fully Self-Supervised Generation of Human Face from Speech	Generate human face from speech
ACM MM 2020	Neutral Face Game Character Auto-Creation via PokerFace-GAN	-
ICLR 2021 under review	Self-Supervised Variational Auto-Encoders	FID: 34.71 (CIFAR-10)

Video

Conference / Journal	Paper	Task	Datasets	Performance
TPAMI	A Review on Deep Learning Techniques for Video Prediction	Video prediction review	-	-
CVPR 2020	Distilled Semantics for Comprehensive Scene Understanding from Videos	Scene Understanding	KITTI 2015	Sq Rel : 0.748
CVPR 2020	Self-Supervised Learning of Video-Induced Visual Invariances	Representation Learning	-	-
ECCV 2020	Video Representation Learning by Recognizing Temporal Transformations	Representation Learning	UCF101	26.1 % (Video Retrieval Top-1)
arXiv:2008.02531	Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework	Representation Learning	UCF101	42.4 % (Video Retrieval Top-1)

Others

Conference / Journal	Paper	Task	Performance
CVPR 2020	Flow2Stereo: Effective Self-Supervised Learning of Optical Flow and Stereo Matching	Optical Flow	F1 : 7.63% (KITTI 2012)
CVPR 2020	Self-Supervised Viewpoint Learning From Image Collections	Viewpoint learning	MAE : 4.0 (BIWI)
CVPR 2020	Self-Supervised Scene De-occlusion	Remove occlusion	mAP : 29.3 % (KINS)
CVPR 2020	Distilled Semantics for Comprehensive Scene Understanding from Videos	Scene Understanding	-
CVPR 2020	Learning by Analogy : Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation	Optical Flow	F1 : 11.79% (KITTI 2015)
CVPR 2020	D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features	3D Local Features	-
CVPR 2020	SpeedNet: Learning the Speediness in Videos	predict the "speediness"	-
CVPR 2020	Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation	Action Segmentation	F1@10 : 83.0 (GTEA)
CVPR 2020	MVP: Unified Motion and Visual Self-Supervised Learning for Large-Scale Robotic Navigation	Robotic Navigation	-
arXiv:2003.06734	Active Perception and Representation for Robotic Manipulation	Robot manipulation	-
arXiv:2005.01655	Words aren’t enough, their order matters: On the Robustness of Grounding Visual Referring Expressions	Visual Referring Expressions	-
arXiv:2004.11362	Supervised Contrastive Learning	Supervised Contrastive Learning	ImageNet Acc: 80.8 (Top-1)
arXiv:2007.14449	Learning from Scale-Invariant Examples for Domain Adaptation in Semantic Segmentation	Domain Adaptation	GTA5 to Cityscape : 47.5 (mIoU)
arXiv:2007.12360	On the Effectiveness of Image Rotation for Open Set Domain Adaptation	Domain Adaptation	-
arXiv:2003.12283	LIMP: Learning Latent Shape Representations with Metric Preservation Priors	Generative models	-
arXiv:2004.04312	Learning to Scale Multilingual Representations for Vision-Language Tasks	Vision-Language	MSCOCO: 81.5
arXiv:2003.08934	NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis	View Synthesis	-
arXiv:2001.01536	Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification	Knowledge Distillation, Long-tail classification	-
arXiv:2006.07114	Knowledge Distillation Meets Self-Supervision	Knowledge Distillation	Res50 --> MobileNetv2 Acc: 72.57 (Top-1)
AAAI2020	Fast and Robust Face-to-Parameter Translation for Game Character Auto-Creation	Game Character Auto-Creation	-
arXiv:2009.07719	Domain-invariant Similarity Activation Map Metric Learning for Retrieval-based Long-term Visual Localization	Similarity Activation Map	-
arXiv:2008.10312	Self-Supervised Learning for Large-Scale Unsupervised Image Clustering	Image Clustering	ImageNet Acc: 38.60 (cluster assignment)
ICLR 2021 under review	SSD: A Unified Framework for Self-Supervised Outlier Detection	Outlier Detection	CIFAR10/CIFAR100 : 94.1% (in/out)

Natural Language Processing (NLP)

Conference / Journal	Paper	Datasets	Performance
arXiv:2004.03808	Improving BERT with Self-Supervised Attention	GLUE	Avg : 79.3 (BERT-SSA-H)
arXiv:2004.07159	PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation	MARCO	0.498 (Rouge-L)
ACL 2020	TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition	-	-
arXiv:1909.11942	ALBERT: A Lite BERT For Self-Supervised Learning of Language Representations	GLUE	Avg : 89.4
AAAI 2020	Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models	-	-
ACL 2020	Contrastive Self-Supervised Learning for Commonsense Reasoning	PDP-60	90.0%

Speech

Conference / Journal	Paper	Datasets	Performance
arXiv:1910.05453v3	vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations	nov92	WER : 2.34
arXiv:1911.03912v2	Effectiveness of Self-Supervised Pre-training for Speech Recognition	Librispeech	WER : 4.0
ICASSP 2020	Generative Pre-Training for Speech with Autoregressive Predictive Coding	-	-
Interspeech 2020	Jointly Fine-Tuning “BERT-like” Self Supervised Models to Improve Multimodal Speech Emotion Recognition	IEMOCAP	Emotion Acc : 75.458 %

Graph

Conference / Journal	Paper	Datasets	Performance
arXiv:2009.05923	Contrastive Self-supervised Learning for Graph Classification	PROTEINS	A3-specific : 85.80

Reinforcement Learning

Conference / Journal	Paper	Performance
arXiv:2009.05923	Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning	BiC-catch: 821±17 (Random Initialization / DrQ+PSEs)
