
Momentum Contrast (MoCo) with Alignment and Uniformity Losses

This directory contains a PyTorch implementation of a MoCo variant using the Alignment and Uniformity losses proposed in the paper Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere:

@inproceedings{wang2020hypersphere,
  title={Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere},
  author={Wang, Tongzhou and Isola, Phillip},
  booktitle={International Conference on Machine Learning},
  organization={PMLR},
  pages={9929--9939},
  year={2020}
}

More code for this paper can be found at this repository.
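
For reference, both losses have short PyTorch forms. The sketch below follows the formulas in the paper (the function names are illustrative; the paper's companion repository ships its own implementations):

import torch

def align_loss(x, y, alpha=2):
    # Alignment: mean powered distance between features of positive pairs,
    # E[ ||f(x) - f(y)||_2^alpha ], for L2-normalized feature batches x, y.
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    # Uniformity: log of the mean Gaussian potential over all feature pairs,
    # log E[ exp(-t * ||f(x) - f(y)||_2^2) ].
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()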


NOTE

Under GitHub's new dark mode theme, the equations in this README may not be readable. If you run into this problem, please instead see this README file, which has identical content and lightly colored equations. Unfortunately, as of February 2021, GitHub does not yet provide a way to detect the user's theme.


Requirements

Python >= 3.6
torch >= 1.5.0
torchvision

Datasets

ImageNet

The full ImageNet, in the folder layout expected by PyTorch, can be obtained online, e.g., by following the instructions in the official PyTorch ImageNet training code.

ImageNet-100 Subset

The ImageNet-100 subset contains 100 randomly sampled classes of the full ImageNet (1000 classes). The list of the 100 classes used in our experiments is provided in scripts/imagenet100_classes.txt. This subset is identical to the one used in Contrastive Multiview Coding (CMC).

We provide a script that constructs proper symlinks to form the subset from the full ImageNet. You may invoke it as follows:

python scripts/create_imagenet_subset.py [PATH_TO_EXISTING_IMAGENET] [PATH_TO_CREATE_SUBSET]

Optionally, you may add argument --subset [PATH_TO_CLASS_SUBSET_FILE] to specify a custom subset file, which should follow the same format as scripts/imagenet100_classes.txt. See scripts/create_imagenet_subset.py for more options.
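
At its core, forming the subset only requires per-class symlinks. Below is a minimal sketch of that idea, assuming the standard train/ and val/ class-folder layout (the actual scripts/create_imagenet_subset.py supports more options):

import os
import sys

src, dst = sys.argv[1], sys.argv[2]   # existing ImageNet root, subset root to create
with open('scripts/imagenet100_classes.txt') as f:
    classes = f.read().split()        # the 100 WordNet IDs, one per class

for split in ('train', 'val'):
    os.makedirs(os.path.join(dst, split), exist_ok=True)
    for cls in classes:
        # Symlink each selected class folder from the full ImageNet into the subset.
        os.symlink(os.path.join(src, split, cls), os.path.join(dst, split, cls))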

Getting Started

Unsupervised Training

This implementation only supports multi-GPU DistributedDataParallel training, which is faster and simpler; single-GPU or DataParallel training is not supported.

  • To train a ResNet-50 encoder with the default alignment + uniformity loss on a 4-GPU machine, run:

    python main_moco.py \
        -a resnet50 \
        --lr 0.03 --batch-size 128 \
        --gpus 0 1 2 3 \
        --multiprocessing-distributed --world-size 1 --rank 0 \
        [PATH_TO_DATASET]
  • The following arguments control the loss form:

    Command-line Arguments                         Loss Term      Default Values
    --moco-align-w AW --moco-align-alpha AALPHA    alignment      AW=3, AALPHA=2
    --moco-unif-w UW --moco-unif-t UT              uniformity     UW=1, UT=3
    --moco-contr-w CW --moco-contr-tau CTAU        contrastive    CW=0, CTAU=0.07

    Note: By default, the uniformity loss uses the "intra-batch" version, where the negative pair distances include both the distances between samples in each batch and features in the queue, and the pairwise distances within each batch (Equation 18). The command-line flag --moco-unif-no-intra-batch switches to the form that does not use pairwise distances within the batch (Equation 17). A sketch of how these pieces fit together follows this list.

  • This repository also includes several techniques added in MoCo v2. To enable them, set --aug-plus --mlp --cos, which turn on stronger augmentation, the MLP projection head, and cosine learning rate scheduling, respectively.

  • For the ImageNet-100 subset, we recommend following the linear lr scaling recipe: --lr 0.03 per --batch-size 128 (i.e., scale the learning rate linearly with the batch size, lr = 0.03 × batch_size / 128). For other datasets (e.g., the full ImageNet), you may need other learning rate and batch size settings.
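
To make the loss flags above concrete, here is a minimal sketch of how the weighted total loss and the intra-batch uniformity toggle could fit together (illustrative names, not the actual main_moco.py internals):

import torch

def uniformity_loss(q, queue, t=3.0, intra_batch=True):
    # Squared distances between batch features q (N x D) and queue features (K x D);
    # these alone give the form without intra-batch distances (Equation 17).
    sq_dists = torch.cdist(q, queue, p=2).pow(2).flatten()
    if intra_batch:
        # Default intra-batch form (Equation 18): also include pairwise
        # squared distances within the batch itself.
        sq_dists = torch.cat([sq_dists, torch.pdist(q, p=2).pow(2)])
    return sq_dists.mul(-t).exp().mean().log()

# The total loss is then a weighted sum matching the flags above
# (defaults AW=3, UW=1, CW=0, i.e., alignment + uniformity only):
#   loss = AW * align_loss + UW * uniformity_loss + CW * contrastive_loss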

Linear Classification

To evaluate an encoder by fitting a supervised linear classifier on frozen features, run:

python main_lincls.py \
    -a resnet50 \
    --lr 30.0 \
    --batch-size 256 \
    --pretrained [PATH_TO_CHECKPOINT] \
    --multiprocessing-distributed --world-size 1 --rank 0 \
    [PATH_TO_DATASET]
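
Conceptually, this freezes the pretrained encoder and fits only a new final linear layer on top of it. Below is a minimal sketch of that idea, assuming the standard torchvision ResNet-50 (main_lincls.py additionally handles loading the MoCo checkpoint, distributed training, etc.):

import torch
import torchvision

model = torchvision.models.resnet50()
for name, param in model.named_parameters():
    if not name.startswith('fc.'):
        param.requires_grad = False        # freeze all encoder weights
model.fc = torch.nn.Linear(2048, 100)      # fresh linear head (100 classes for ImageNet-100)
# Only the linear classifier is optimized, with the large lr used above.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=30.0, momentum=0.9)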

Reference Validation Accuracy

ImageNet-100

Settings                             Batch Size   Initial LR   Top-1 Accuracy by Loss Formula
Normal                               128          0.03         73.12% (MoCo)   75.54%   75.44%   75.62%   74.52%
Normal                               256          0.03         68.18% (MoCo)   69.3%    68.28%   69.66%   69.46%
Normal                               256          0.06         71.08% (MoCo)   73.52%   73.34%   73.36%   73.18%
Strong Aug. + MLP Head + Cosine LR   128          0.03         73.92%   77.54% (MoCo v2)   77.4%    77.66%   76.7%
Strong Aug. + MLP Head + Cosine LR   256          0.03         69.64%   67.52% (MoCo v2)   66.92%   67.44%   71.42%
Strong Aug. + MLP Head + Cosine LR   256          0.06         73.36%   76.32% (MoCo v2)   75.5%    75.74%   73.84%

ImageNet

Settings                             Batch Size   Initial LR   Top-1 Accuracy by Loss Formula
Strong Aug. + MLP Head + Cosine LR   256          0.03         67.5%±0.1% (MoCo v2, from here)   67.694%

Note: Some of the alignment + uniformity numbers above are computed without setting --moco-unif-no-intra-batch (i.e., with the default intra-batch uniformity form).

Trained ImageNet Checkpoints

We provide the ResNet-50 encoder checkpoint trained on the full ImageNet with the alignment and uniformity losses. The encoder is the one achieving 67.694% ImageNet validation top-1 accuracy in the table above.

With PyTorch Hub, you may load the encoder without even downloading this repository or the checkpoint:

encoder = torch.hub.load('SsnL/moco_align_uniform:align_uniform', 'imagenet_resnet50_encoder')

To load the encoder with the trained linear classifier, use:

encoder = torch.hub.load('SsnL/moco_align_uniform:align_uniform', 'imagenet_resnet50_encoder',
                         with_linear_clf=True)

See here for more details.
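
As a quick usage example (the dummy input below just illustrates the standard 224x224 ImageNet input shape):

import torch

encoder = torch.hub.load('SsnL/moco_align_uniform:align_uniform', 'imagenet_resnet50_encoder')
encoder.eval()
with torch.no_grad():
    features = encoder(torch.randn(8, 3, 224, 224))  # batch of eight 224x224 RGB images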

Additionally, you may download the saved checkpoints with more information from here.

Acknowledgements and Disclaimer

The code is modified from the official MoCo repository.

The ImageNet-100 results included in our paper were not computed using this code, since the official MoCo repository was not released at the time of our analysis. Instead, we used a modified version of the official Contrastive Multiview Coding (CMC) repository, which contains an unofficial implementation of MoCo. There are subtle differences in batch size, learning rate scheduling, queue initialization, etc. In our experience, the code provided in this directory can achieve accuracies comparable to the numbers reported in our paper. We encourage readers looking for the exact detailed differences to refer to the appendix of our paper and the CMC repository.

We thank the authors of the MoCo repository and the CMC repository for kindly open-sourcing their codebases and promoting open research.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.