MV-MR: multi-views and multi-representations for self-supervised learning and knowledge distillation

This repo contains the official PyTorch implementation of the paper: MV–MR: Multi-Views and Multi-Representations for Self-Supervised Learning and Knowledge Distillation

Paper

Contents

  1. Introduction
  2. Installation
  3. Training
  4. Evaluation
  5. Results
  6. Citation

Introduction

We present a new method of self-supervised learning and knowledge distillation based on multi-views and multi-representations (MV–MR). MV–MR is based on the maximization of dependence between learnable embeddings from augmented and non-augmented views, jointly with the maximization of dependence between learnable embeddings from the augmented view and multiple non-learnable representations from the non-augmented view. We show that the proposed method can be used for efficient self-supervised classification and model-agnostic knowledge distillation. Unlike other self-supervised techniques, our approach does not use any contrastive learning, clustering, or stop gradients. MV–MR is a generic framework that allows incorporating constraints on the learnable embeddings by using image multi-representations as regularizers. The proposed method is also used for knowledge distillation. MV–MR provides state-of-the-art self-supervised performance on the STL10 and CIFAR20 datasets in a linear evaluation setup. We also show that a low-complexity ResNet50 model, pretrained with the proposed knowledge distillation from a CLIP ViT model, achieves state-of-the-art performance on the STL10 and CIFAR100 datasets.
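For intuition, here is a minimal sketch of the objective structure, using (squared) distance correlation as an example dependence measure between the learnable embedding of the augmented view, the learnable embedding of the non-augmented view, and a set of non-learnable representations. The exact loss terms, weights, and representations follow the paper and the configs in this repo; this is only an illustration.

```python
import torch

def distance_correlation_sq(x, y, eps=1e-9):
    """Squared distance correlation between two batches of vectors (n, d_x) and (n, d_y)."""
    a = torch.cdist(x, x)                                                  # pairwise distances in x
    b = torch.cdist(y, y)                                                  # pairwise distances in y
    A = a - a.mean(0, keepdim=True) - a.mean(1, keepdim=True) + a.mean()   # double centering
    B = b - b.mean(0, keepdim=True) - b.mean(1, keepdim=True) + b.mean()
    dcov_xy = (A * B).mean()
    dcov_xx = (A * A).mean()
    dcov_yy = (B * B).mean()
    return dcov_xy / (torch.sqrt(dcov_xx * dcov_yy) + eps)

def mvmr_style_loss(z_aug, z_clean, handcrafted):
    """z_aug / z_clean: learnable embeddings of the augmented / non-augmented view;
    handcrafted: list of non-learnable representations of the non-augmented view."""
    loss = -distance_correlation_sq(z_aug, z_clean)                        # maximize view dependence
    for h in handcrafted:                                                  # e.g. hand-crafted descriptors
        loss = loss - distance_correlation_sq(z_aug, h.flatten(1))         # maximize dependence on each one
    return loss
```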

Installation

Conda installation

conda env create -f environment.yml

Training

Training of the self-supervised model

To run the training of the self-supervised model, first fill in the config file. Example config files: configs/stl10_self_supervised.yaml, configs/imagenet_self_supervised.yaml.

Then run

python main_self_supervised.py --config <path to config file>

If you want to automatically select the batch size, add the --auto_bs flag. If you want to automatically select the learning rate, add the --auto_lr flag.
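The --auto_bs and --auto_lr flags suggest the training scripts rely on PyTorch Lightning's built-in tuner. Below is a hypothetical sketch of how such flags can be wired up (Lightning < 2.0 API; the repo's actual argument handling may differ):

```python
import argparse
import pytorch_lightning as pl

parser = argparse.ArgumentParser()
parser.add_argument("--config", required=True, help="path to the YAML config")
parser.add_argument("--auto_bs", action="store_true", help="auto-select batch size")
parser.add_argument("--auto_lr", action="store_true", help="auto-select learning rate")
args = parser.parse_args()

trainer = pl.Trainer(
    auto_scale_batch_size="power" if args.auto_bs else False,  # grows batch size until OOM
    auto_lr_find=args.auto_lr,                                  # LR range test before training
)
# model = build_model_from_config(args.config)  # hypothetical helper
# trainer.tune(model)                           # runs the enabled tuners
# trainer.fit(model)
```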

Training of the semi-supervised model

To run the training of the semi-supervised model (fine-tuning the pretrained self-supervised model), first fill in the config file. Example config files: configs/stl10_semi_supervised.yaml, configs/imagenet_semi_supervised.yaml.

Then run

python main_semi_supervised.py \
--config <path to semi-supervised config> \
--path_ckpt <path to pretrained self-supervised model>

If you want to automatically select the batch size, add the --auto_bs flag. If you want to automatically select the learning rate, add the --auto_lr flag.
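Conceptually, fine-tuning reuses the pretrained self-supervised backbone and trains it together with a classification head on the labeled subset. A rough sketch under an assumed checkpoint layout (the prefix names, dimensions, and hyperparameters are illustrative, not the repo's exact code):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

encoder = resnet50()
encoder.fc = nn.Identity()                       # expose the 2048-d features

ckpt = torch.load("pretrained_self_supervised.ckpt", map_location="cpu")
state = ckpt.get("state_dict", ckpt)
# keep only backbone weights; "encoder." is an assumed key prefix
backbone = {k.replace("encoder.", "", 1): v for k, v in state.items() if k.startswith("encoder.")}
encoder.load_state_dict(backbone, strict=False)

model = nn.Sequential(encoder, nn.Linear(2048, 1000))   # 1000 classes for ImageNet-1K
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()
# then train on the 1% / 10% labeled subset defined by the semi-supervised config
```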

Training of the distillation of CLIP into ResNet50

To run the self-supervised distillation of CLIP into ResNet50, first fill in the config file. Example config file: configs/imagenet_clip_self_supervised.yaml.

Then run

python main_clip_self_supervised.py --config <path to the config>

If you want to automatically select the batch size, add the --auto_bs flag. If you want to automatically select the learning rate, add the --auto_lr flag.
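For intuition, distillation trains the ResNet50 student so that its embedding of an image stays tied to the frozen CLIP ViT teacher's embedding of the same image. The sketch below is only illustrative: it assumes the OpenAI clip package with ViT-B/32 and uses a plain cosine-similarity term as a stand-in for the paper's dependence-based loss.

```python
import torch
import torch.nn as nn
import clip                                    # OpenAI CLIP package (assumed installed)
from torchvision.models import resnet50

device = "cuda" if torch.cuda.is_available() else "cpu"
teacher, _ = clip.load("ViT-B/32", device=device)   # frozen teacher
teacher.eval()

student = resnet50()
student.fc = nn.Linear(2048, 512)              # project to the teacher embedding size
student = student.to(device)

def distill_loss(images):
    """images: preprocessed batch (B, 3, 224, 224) of non-augmented views."""
    with torch.no_grad():
        t = teacher.encode_image(images).float()      # teacher embeddings, no gradients
    s = student(images)                               # student embeddings
    return -nn.functional.cosine_similarity(s, t, dim=-1).mean()
```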

Training of the multiclass classification model on VOC07

Multiclass classification on VOC07 is one of the ways to evaluate the pretrained self-supervised models. The idea is to train a linear model on top of the frozen embeddings from the pretrained encoder.
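Since VOC07 is a multi-label dataset (20 object classes, possibly several per image), the linear model is typically trained with a per-class sigmoid objective on the frozen features. An illustrative sketch (names, dimensions, and hyperparameters are assumptions, not the repo's exact code):

```python
import torch
import torch.nn as nn

feat_dim, num_classes = 2048, 20
linear = nn.Linear(feat_dim, num_classes)
criterion = nn.BCEWithLogitsLoss()             # multi-label loss, one sigmoid per class
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01, momentum=0.9)

def training_step(frozen_features, targets):
    # frozen_features: (B, 2048) embeddings from the frozen pretrained encoder
    # targets: (B, 20) binary multi-hot labels
    logits = linear(frozen_features)
    loss = criterion(logits, targets.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```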

To run the training of the multiclass classification model on the VOC07 dataset, first fill in the config file. Example config file: configs/imagenet_voc.yaml.

Then run

python main_voc.py --config_voc <path to VOC config> \
--config_self <path to self-supervised config> \
--path_self <path to pretrained self-supervised model>

If you want to automatically select the batch size, add the --auto_bs flag. If you want to automatically select the learning rate, add the --auto_lr flag.

Evaluation

Evaluate self-supervised model

Self-supervised model evaluation follows the linear evaluation protocol: a linear classifier is trained on top of frozen embeddings from the pretrained encoder. The script processes the validation set and displays Top-1 and Top-5 accuracies.
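For reference, Top-1 and Top-5 accuracies over a validation set can be computed as follows (a generic sketch, not the repo's evaluation code):

```python
import torch

def topk_accuracy(logits, targets, ks=(1, 5)):
    # logits: (B, num_classes) classifier outputs, targets: (B,) class indices
    maxk = max(ks)
    _, pred = logits.topk(maxk, dim=1)          # (B, maxk) indices of the top predictions
    correct = pred.eq(targets.unsqueeze(1))     # (B, maxk) boolean matches
    return [correct[:, :k].any(dim=1).float().mean().item() for k in ks]
```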

To run self-supervised model evaluation:

python evaluate_self_supervised.py --config <path to self-supervised config> \ 
--ckpt <path to model to evaluate> \ 
--epochs <number of epochs to finetune>

By default, the script evaluates the linear classifier (called the online finetuner) that is trained alongside the self-supervised encoder. If you want to retrain the linear classifier from scratch, add the --retrain flag. Retraining might take some time, but it generally provides higher accuracy.

Evaluate semi-supervised model

Semi-supervised model evaluation simply loads the model and processes the validation set.

To run semi-supervised model evaluation:

python evaluate_semi_supervised.py --config <path to semi-supervised config> --ckpt <path to trained semi-supervised model>

Evaluate distillation

To run the evaluation of the ResNet50 distilled from CLIP, simply run the self-supervised model evaluation:

python evaluate_self_supervised.py --config <path to the config> \ 
--ckpt <path to model to evaluate> \ 
--epochs <number of epochs to finetune>

By default, the script evaluates the linear classifier (called the online finetuner) that is trained alongside the encoder. If you want to retrain the linear classifier from scratch, add the --retrain flag. Retraining might take some time, but it generally provides higher accuracy.

Evaluate multiclass classification on VOC07

To run multiclass classification evaluation on VOC07:

python evaluate_voc.py --config <path to VOC config> --ckpt <path to model trained on VOC>

Convert PyTorch Lightning weights to PyTorch format

See scripts/ folder.
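As a general illustration of what such a conversion does (the actual scripts in scripts/ may use different key prefixes): a PyTorch Lightning checkpoint stores the weights under the "state_dict" key, with parameter names prefixed by the LightningModule attribute that holds the network.

```python
import torch

ckpt = torch.load("model.ckpt", map_location="cpu")      # Lightning checkpoint
state = ckpt["state_dict"]
# "encoder." is an assumed prefix; replace it with the attribute name used in the repo
plain = {k[len("encoder."):]: v for k, v in state.items() if k.startswith("encoder.")}
torch.save(plain, "model.pth")                            # plain PyTorch state dict
```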

Results

Self-supervised models

| Dataset | Top-1 accuracy | Top-5 accuracy | Download link |
|---|---|---|---|
| STL10 | 89.67% | 99.46% | Download |
| ImageNet-1K | 74.5% | 92.1% | Download |
| CIFAR20 | 73.2% | 95.6% | Download |

Semi-supervised models

| Dataset | Top-1 accuracy | Top-5 accuracy | Percentage of labels | Download link |
|---|---|---|---|---|
| ImageNet-1K | 56.1% | 79.4% | 1% | Download |
| ImageNet-1K | 69.9% | 89.5% | 10% | Download |

Transfer learning on VOC

| Pretrain dataset | Finetune dataset | mAP | Download link |
|---|---|---|---|
| ImageNet-1K | VOC2007 | 87.1 | Download |

Distillation

| Dataset | Top-1 accuracy | Download link |
|---|---|---|
| ImageNet-1K | 75.3% | Download |
| STL10 | 95.6% | Download |
| CIFAR100 | 78.6% | Download |

Citation

@article{kinakh2024mv,
  title={MV--MR: Multi-Views and Multi-Representations for Self-Supervised Learning and Knowledge Distillation},
  author={Kinakh, Vitaliy and Drozdova, Mariia and Voloshynovskiy, Slava},
  journal={Entropy},
  volume={26},
  number={6},
  pages={466},
  year={2024},
  publisher={MDPI}
}