Awesome-Self-Supervised-Papers

Collecting papers about Self-Supervised Learning, Representation Learning.

Last Update : 2021. 01. 31.

  • Update important papers (SwAV, Colorization, Jigsaw, ...)
  • Update timetable
  • Contributions and comments are welcome.

Computer Vision (CV)

Pretraining / Feature / Representation

Contrastive Learning

Conference / Journal Paper ImageNet Acc (Top 1)
CVPR 2006 Dimensionality Reduction by Learning an Invariant Mapping -
arXiv:1807.03748 Representation learning with contrastive predictive coding (CPC) -
arXiv:1911.05722 Momentum Contrast for Unsupervised Visual Representation Learning (MoCo) 60.6 %
arXiv:1905.09272 Data-Efficient Image Recognition with Contrastive Predictive Coding (CPC v2) 63.8 %
arXiv:1906.05849 Contrastive Multiview Coding (CMC) 66.2 %
arXiv:2002.05709 A Simple Framework for Contrastive Learning of Visual Representations (SimCLR) 69.3 %
arXiv:2003.12338 Improved Baselines with Momentum Contrastive Learning (MoCo v2) 71.1 %
arXiv:2003.05438 Rethinking Image Mixture for Unsupervised Visual Representation Learning 65.9 %
arXiv:2004.05554 Feature Lenses: Plug-and-play Neural Modules for Transformation-Invariant Visual Representations
arXiv:2006.10029 Big Self-Supervised Models are Strong Semi-Supervised Learners (SimCLRv2) 77.5 % (10% label)
arXiv:2006.07733 Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (BYOL) 74.3 %
arXiv:2006.09882 Unsupervised Learning of Visual Features by Contrasting Cluster Assignments (SwAV) 75.3 %
arXiv:2008.05659 What Should Not Be Contrastive in Contrastive Learning 80.2 % (ImageNet-100)
arXiv:2007.00224 Debiased Contrastive Learning 74.6 % (ImageNet-100)
arXiv:2009.00104 A Framework For Contrastive Self-Supervised Learning And Designing A New Approach -
ICLR 2021 under review Self-Supervised Representation Learning via Adaptive Hard-Positive Mining 72.3 % (ResNet-50 (4x): 77.3 %)
IEEE Access Contrastive Representation Learning: A Framework and Review review paper
arXiv:2010.01929 EqCo: Equivalent Rules for Self-Supervised Contrastive Learning 68.5 % (Proposed) / 66.6 % (SimCLR) / 200 epochs
arXiv:2010.01028 Hard Negative Mixing for Contrastive Learning 68.0 % / 200 epochs
arXiv:2011.10566 Exploring Simple Siamese Representation Learning (SimSiam) 68.1 % / 100 epochs / 256 batch
arXiv:2010.06682 Are all negatives created equal in contrastive instance discrimination? -
arXiv:2101.05224 Big Self-Supervised Models Advance Medical Image Classification AUC: 0.7729 (SimCLR / ImageNet --> CheXpert / ResNet-152 (2x))

Dense Contrastive Learning

Conference / Journal Paper Downstream
NeurIPS 2020 Unsupervised Learning of Dense Visual Representations AP: 39.2 (COCO, BBOX)
arXiv:2011.10043 Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning AP: 60.2 (VOC, BBOX)

Image Transformation

Conference / Journal Paper ImageNet Acc (Top 1)
ECCV 2016 Colorful Image Colorization (Colorization) 39.6 %
ECCV 2016 Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles 45.7 %
CVPR 2018 Unsupervised Feature Learning via Non-Parametric Instance Discrimination (NPID, NPID++) NPID: 54.0%, NPID++: 59.0%
CVPR 2018 Boosting Self-Supervised Learning via Knowledge Transfer (Jigsaw++) -
CVPR 2020 Self-Supervised Learning of Pretext-Invariant Representations (PIRL) 63.6 %
CVPR 2020 Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics -
arXiv:2003.04298 Multi-modal Self-Supervision from Generalized Data Transformations -

Others (in Pretraining / Feature / Representation)

Conference / Journal Paper Method
ICML 2018 Mutual Information Neural Estimation Mutual Information
NeurIPS 2019 Wasserstein Dependency Measure for Representation Learning Mutual Information
ICLR 2019 Learning Deep Representations by Mutual Information Estimation and Maximization Mutual Information
arXiv:1903.12355 Local Aggregation for Unsupervised Learning of Visual Embeddings Local Aggregation
arXiv:1906.00910 Learning Representations by Maximizing Mutual Information Across Views Mutual Information
arXiv:1907.02544 Large Scale Adversarial Representation Learning(BigBiGAN) Adversarial Training
ICLR 2020 On Mutual Information Maximization for Representation Learning Mutual Information
CVPR 2020 How Useful is Self-Supervised Pretraining for Visual Tasks? -
CVPR 2020 Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning Adversarial Training
ICLR 2020 Self-Labeling via Simultaneous Clustering and Representation Learning Information
arXiv:1912.11370 Big Transfer (BiT): General Visual Representation Learning pre-training
arXiv:2009.07724 Evaluating Self-Supervised Pretraining Without Using Labels pre-training
arXiv:2010.00578 Understanding Self-Supervised Learning with Dual Deep Networks Dual Deep Network
ICLR 2021 under review Representation Learning via Invariant Causal Mechanisms Causal mechanism
arXiv:2006.06882 Rethinking Pre-training and Self-training Rethinking

Identification / Verification / Classification / Recognition

Conference / Journal Paper Datasets Performance
CVPR 2020 Real-world Person Re-Identification via Degradation Invariance Learning MLR-CUHK03 Acc : 85.7 (R@1)
CVPR 2020 Spatially Attentive Output Layer for Image Classification ImageNet Acc : 81.01 (Top-1)
CVPR 2020 Look-into-Object: Self-supervised Structure Modeling for Object Recognition ImageNet Top-1 err : 22.87

Segmentation / Depth Estimation

Conference / Journal Paper Datasets Performance
CVPR 2020 Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation VOC 2012 mIoU : 64.5
CVPR 2020 Towards Better Generalization: Joint Depth-Pose Learning without PoseNet KITTI 2015 F1 : 18.05 %
IROS 2020 Monocular Depth Estimation with Self-supervised Instance Adaptation KITTI 2015 Abs Rel : 0.074
CVPR 2020 Novel View Synthesis of Dynamic Scenes with Globally Coherent Depths from a Monocular Camera - -
CVPR 2020 Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision GTA5 -> Cityscapes mIoU : 46.3
CVPR 2020 D3VO : Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry - -
CVPR 2020 Self-Supervised Human Depth Estimation from Monocular Videos - -
arXiv:2009.07714 Calibrating Self-supervised Monocular Depth Estimation KITTI Abs Rel : 0.113

Detection / Localization

Conference / Journal Paper Datasets Performance
CVPR 2020 Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection VOC 2012 AP(50) : 67.0

Generation

Conference / Journal Paper Task
CVPR 2020 StyleRig: Rigging StyleGAN for 3D Control over Portrait Images Portrait Images
ICLR 2020 From Inference to Generation: End-to-End Fully Self-Supervised Generation of Human Face from Speech Generate human face from speech
ACM MM 2020 Neutral Face Game Character Auto-Creation via PokerFace-GAN
ICLR 2021 under review Self-Supervised Variational Auto-Encoders FID: 34.71 (CIFAR-10)

Video

Conference / Journal Paper Task Datasets Performance
TPAMI A Review on Deep Learning Techniques for Video Prediction Video prediction review - -
CVPR 2020 Distilled Semantics for Comprehensive Scene Understanding from Videos Scene Understanding KITTI 2015 Sq Rel : 0.748
CVPR 2020 Self-Supervised Learning of Video-Induced Visual Invariances Representation Learning - -
ECCV 2020 Video Representation Learning by Recognizing Temporal Transformations Representation Learning UCF101 26.1 % (Video Retrieval Top-1)
arXiv:2008.02531 Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework Representation Learning UCF101 42.4 % (Video Retrieval Top-1)

Others

Conference / Journal Paper Task Performance
CVPR 2020 Flow2Stereo: Effective Self-Supervised Learning of Optical Flow and Stereo Matching Optical Flow F1 : 7.63% (KITTI 2012)
CVPR 2020 Self-Supervised Viewpoint Learning From Image Collections Viewpoint learning MAE : 4.0 (BIWI)
CVPR 2020 Self-Supervised Scene De-occlusion Remove occlusion mAP : 29.3 % (KINS)
CVPR 2020 Distilled Semantics for Comprehensive Scene Understanding from Videos Scene Understanding -
CVPR 2020 Learning by Analogy : Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation Optical Flow F1 : 11.79% (KITTI 2015)
CVPR 2020 D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features 3D Local Features -
CVPR 2020 SpeedNet: Learning the Speediness in Videos predict the "speediness" -
CVPR 2020 Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation Action Segmentation F1@10 : 83.0 (GTEA)
CVPR 2020 MVP: Unified Motion and Visual Self-Supervised Learning for Large-Scale Robotic Navigation Robotic Navigation -
arXiv:2003.06734 Active Perception and Representation for Robotic Manipulation Robot manipulation -
arXiv:2005.01655 Words aren’t enough, their order matters: On the Robustness of Grounding Visual Referring Expressions Visual Referring Expressions -
arXiv:2004.11362 Supervised Contrastive Learning Supervised Contrastive Learning ImageNet Acc: 80.8 (Top-1)
arXiv:2007.14449 Learning from Scale-Invariant Examples for Domain Adaptation in Semantic Segmentation Domain Adaptation GTA5 to Cityscapes : 47.5 (mIoU)
arXiv:2007.12360 On the Effectiveness of Image Rotation for Open Set Domain Adaptation Domain Adaptation -
arXiv:2003.12283 LIMP: Learning Latent Shape Representations with Metric Preservation Priors Generative models -
arXiv:2004.04312 Learning to Scale Multilingual Representations for Vision-Language Tasks Vision-Language MSCOCO: 81.5
arXiv:2003.08934 NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis View Synthesis -
arXiv:2001.01536 Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification Knowledge Distillation, Long-tail classification -
arXiv:2006.07114 Knowledge Distillation Meets Self-Supervision Knowledge Distillation Res50 --> MobileNetv2 Acc: 72.57 (Top-1)
AAAI 2020 Fast and Robust Face-to-Parameter Translation for Game Character Auto-Creation Game Character Auto-Creation -
arXiv:2009.07719 Domain-invariant Similarity Activation Map Metric Learning for Retrieval-based Long-term Visual Localization Similarity Activation Map -
arXiv:2008.10312 Self-Supervised Learning for Large-Scale Unsupervised Image Clustering Image Clustering ImageNet Acc: 38.60 (cluster assignment)
ICLR 2021 under review SSD: A Unified Framework for Self-Supervised Outlier Detection Outlier Detection CIFAR10/CIFAR100 : 94.1 % (in/out)

Natural Language Processing (NLP)

Conference / Journal Paper Datasets Performance
arXiv:2004.03808 Improving BERT with Self-Supervised Attention GLUE Avg : 79.3 (BERT-SSA-H)
arXiv:2004.07159 PALM: Pre-training an Autoencoding & Autoregressive Language Model for Context-conditioned Generation MARCO 0.498 (Rouge-L)
ACL 2020 TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition - -
arXiv:1909.11942 ALBERT: A Lite BERT For Self-Supervised Learning of Language Representations GLUE Avg : 89.4
AAAI 2020 Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models - -
ACL 2020 Contrastive Self-Supervised Learning for Commonsense Reasoning PDP-60 90.0%

Speech

Conference / Journal Paper Datasets Performance
arXiv:1910.05453v3 vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations nov92 WER : 2.34
arXiv:1911.03912v2 Effectiveness of Self-Supervised Pre-Training for Speech Recognition Librispeech WER : 4.0
ICASSP 2020 Generative Pre-Training for Speech with Autoregressive Predictive Coding - -
Interspeech 2020 Jointly Fine-Tuning “BERT-like” Self Supervised Models to Improve Multimodal Speech Emotion Recognition IEMOCAP Emotion Acc : 75.458 %

Graph

Conference / Journal Paper Datasets Performance
arXiv:2009.05923 Contrastive Self-supervised Learning for Graph Classification PROTEINS A3-specific : 85.80

Reinforcement Learning

Conference / Journal Paper Performance
arXiv:2009.05923 Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning BiC-catch: 821±17 (Random Initialization / DrQ+PSEs)