A prominent technique for self-supervised representation learning has been to contrast semantically similar and dissimilar pairs of samples. Without access to labels, dissimilar (negative) points are typically taken to be randomly sampled datapoints, implicitly accepting that these points may in reality have the same label. Perhaps unsurprisingly, we observe that sampling negative examples from truly different labels improves performance in a synthetic setting where labels are available. Motivated by this observation, we develop a debiased contrastive objective that corrects for the sampling of same-label datapoints, even without knowledge of the true labels.
Debiased Contrastive Learning NeurIPS 2020 [paper]
Ching-Yao Chuang,
Joshua Robinson,
Lin Yen-Chen,
Antonio Torralba, and
Stefanie Jegelka
- Python 3.7
- PyTorch 1.3.1
- PIL
- OpenCV
We can train the standard (biased) or debiased version (M=1) of SimCLR with main.py on the STL10 dataset.
flags:
- --debiased: use debiased objective (True) or standard objective (False)
- --tau_plus: specify class probability
- --batch_size: batch size for SimCLR
For instance, run the following command to train a debiased encoder.
python main.py --tau_plus 0.1
*Due to the implementation of nn.DataParallel(), training with at most 2 GPUs gives the best result.
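To make the roles of --debiased and --tau_plus more concrete, here is a minimal sketch of the debiased loss for a single anchor with one positive sample (M=1). It is an illustration in plain Python rather than the batched PyTorch code in main.py. The function name `debiased_loss`, its signature, and the default temperature of 0.5 are assumptions made for this example. Similarities are assumed to be cosine similarities in [-1, 1].

```python
import math


def debiased_loss(pos_sim, neg_sims, tau_plus=0.1, temperature=0.5, debiased=True):
    """Contrastive loss for one anchor (hypothetical helper, not from main.py).

    pos_sim  : cosine similarity between the anchor and its positive view
    neg_sims : cosine similarities between the anchor and N sampled "negatives"
    tau_plus : assumed probability that a sampled negative shares the anchor's class
    """
    pos = math.exp(pos_sim / temperature)
    neg = [math.exp(s / temperature) for s in neg_sims]
    n = len(neg)

    if debiased:
        # Remove the estimated same-class contribution from the negative term,
        # then rescale by the probability of drawing a true negative.
        ng = (sum(neg) - tau_plus * n * pos) / (1.0 - tau_plus)
        # Clamp at the smallest value the term can take for similarities >= -1.
        ng = max(ng, n * math.exp(-1.0 / temperature))
    else:
        # Standard (biased) objective: treat every sampled point as a negative.
        ng = sum(neg)

    return -math.log(pos / (pos + ng))


standard = debiased_loss(0.8, [0.1, -0.2, 0.3], debiased=False)
corrected = debiased_loss(0.8, [0.1, -0.2, 0.3], tau_plus=0.1)
print(f"standard: {standard:.4f}  debiased: {corrected:.4f}")
```

In this sketch, setting tau_plus to 0.0 recovers the standard objective, which matches the "Biased" row of the results table. When the positive is more similar to the anchor than the average sampled negative, the correction shrinks the negative term and the loss is lower than the standard one.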
The model is evaluated by training a linear classifier after fixing the learned embedding.
path flags:
- --model_path: specify the path to the saved model
python linear.py --model_path results/model_400.pth
Objective | tau_plus | Arch | Latent Dim | Batch Size | Accuracy (%) | Download |
---|---|---|---|---|---|---|
Biased | tau_plus = 0.0 | ResNet50 | 128 | 256 | 80.15 | model |
Debiased | tau_plus = 0.05 | ResNet50 | 128 | 256 | 81.85 | model |
Debiased | tau_plus = 0.1 | ResNet50 | 128 | 256 | 84.26 | model |
If you find this repo useful for your research, please consider citing the paper:
@article{chuang2020debiased,
title={Debiased contrastive learning},
author={Chuang, Ching-Yao and Robinson, Joshua and Lin, Yen-Chen and Torralba, Antonio and Jegelka, Stefanie},
journal={Advances in Neural Information Processing Systems},
volume={33},
year={2020}
}
For any questions, please contact Ching-Yao Chuang (cychuang@mit.edu).
Part of this code is inspired by leftthomas/SimCLR.