In this repo, we provide a collection of scripts for pretraining embedding models for few-shot learning; some of them are used in Label Hallucination for Few-Shot Classification (in Dropbox). Thanks to SKD and Rizve et al. for their original implementations.
I have updated their original code a bit (to support the latest PyTorch and to fix some bugs in SKD) so that it should run freely on your machine following the training instructions below. Newly re-trained models for tiered-ImageNet are here. I will keep updating this as more models are trained.
I have re-trained the SKD and IER models. SKD models were trained with 4x GTX Titan X GPUs, and IER models were trained with 2x NVIDIA A100 GPUs.
- SKD Generation 0
- SKD Generation 1 (`gamma=0.025` works best)
- IER (`bsz=64`)
- IER distill (`bsz=64`)
The data used here was preprocessed by the MetaOptNet repo. Please find the renamed versions of the files, provided by RFS, in the link below. Download and unzip the dataset, then put it under the `data` directory.
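After unzipping, you can sanity-check the layout with a quick script; the folder names below are assumptions and depend on which datasets you actually downloaded:

```python
import os

# assumed dataset folder names; adjust to match what you unzipped
for name in ["FC100", "tieredImageNet"]:
    path = os.path.join("data", name)
    print(path, "found" if os.path.isdir(path) else "missing")
```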
Note that training with tiered-ImageNet requires at least 64GB of CPU RAM, ideally 128GB.
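If you are unsure whether your machine qualifies, a quick check (assuming `psutil` is installed, e.g. via `pip install psutil`):

```python
import psutil

# available RAM in GiB; tiered-ImageNet is loaded fully into memory,
# so too little free RAM typically means the process gets killed by the OS
avail_gb = psutil.virtual_memory().available / 2**30
if avail_gb < 64:
    print(f"Only {avail_gb:.0f} GiB free; tiered-ImageNet training may fail.")
```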
First, create a conda environment named `IER`:
```
conda create -n IER python=3.6      ### create env
conda activate IER                  ### activate env
pip install -r requirements.txt     ### install IER requirements, except for pytorch
```
For the GPUs we use (NVIDIA A100 or RTX A6000), PyTorch must be installed with CUDA 11 support, so reinstall PyTorch with:
```
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
```
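To verify the installation picked up the right build, a quick check:

```python
import torch

print(torch.__version__)          # the version installed by the command above
print(torch.version.cuda)         # should report 11.3
print(torch.cuda.is_available())  # True if the driver matches the CUDA build
```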
We modify `train.py` mainly to remove wandb usage and to save the model to disk every epoch. Further, to work with the latest PyTorch version, we change the `.view()` calls in `utils` to `.reshape()`.
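The reason for this change is that `.view()` requires contiguous memory, while `.reshape()` falls back to a copy when needed; a minimal illustration:

```python
import torch

x = torch.randn(4, 6)
t = x.t()              # transposing makes the tensor non-contiguous
# t.view(-1)           # would raise a RuntimeError on non-contiguous memory
flat = t.reshape(-1)   # reshape copies when necessary, so it always works
print(t.is_contiguous(), flat.shape)  # False torch.Size([24])
```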
Train the model following the original implementation (e.g., for FC100):
```
python3 train.py --model resnet12 --model_path save --dataset FC100 --data_root data --n_aug_support_samples 5 --n_ways 5 --n_shots 1 --epochs 65 --lr_decay_epochs 60 --gamma 1.0 --contrast_temp 1.0 --mvavg_rate 0.99 --memfeature_size 64 --batch_size 64 --tags FC100,INV_EQ
```
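Each epoch writes a checkpoint under the save directory. To load one back for evaluation, a sketch where the path and checkpoint keys are assumptions about what the script writes:

```python
import torch

# hypothetical path; the actual filename depends on the save directory layout
ckpt = torch.load("save/resnet12_FC100/model_epoch_65.pth", map_location="cpu")
# RFS/IER-style scripts typically store weights under a "model" key;
# adjust if your checkpoint uses a different structure
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
print(len(state_dict), "parameter tensors loaded")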
We use the same environment as IER. To make the code run, we have to make a few modifications and corrections, i.e., in the original `train_distillation.py`:
```python
# use only the 0- and 90-degree copies instead of all four rotations
# inputs_all = torch.cat((x, x_180, x_90, x_270), 0)
inputs_all = torch.cat((x, x_90), 0)

# the forward pass accordingly covers 2*batch_size inputs rather than 4*batch_size
# (_,_,_,_, feat_s_all), (logit_s_all, rot_s_all) = model_s(inputs_all[:4*batch_size], rot=True)
(_,_,_,_, feat_s_all), (logit_s_all, rot_s_all) = model_s(inputs_all[:2*batch_size], rot=True)

# with a single rotated copy, the /3 normalization no longer applies
# loss = loss_div + opt.gamma*loss_a / 3
loss = loss_div + opt.gamma*loss_a
```
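For context, here is a minimal sketch of how the rotated batch feeding these lines might be constructed; the original repo may build `x_90` differently (e.g., via transpose and flip), so treat this as illustrative only:

```python
import torch

batch_size = 8
x = torch.randn(batch_size, 3, 32, 32)     # a batch of images
x_90 = torch.rot90(x, k=1, dims=(2, 3))    # 90-degree rotated copies
inputs_all = torch.cat((x, x_90), 0)       # 2 * batch_size inputs, matching the slice above
assert inputs_all.shape[0] == 2 * batch_size
```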
We found that results vary (sometimes by a lot) across different runs. To obtain models matching the results reported in these papers, it is important to run the initial training (generation 0 in IER and SKD) multiple times and pick the best model as the starting point for distillation (generation 1).
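One way to automate this selection is sketched below; `run_training`, the save-directory layout, and `best_acc.txt` are all hypothetical stand-ins for however you launch runs and log validation accuracy:

```python
import subprocess

def run_training(trial):
    """Hypothetical wrapper: launch one generation-0 run in its own save dir."""
    subprocess.run(
        ["python3", "train.py", "--model", "resnet12",
         "--model_path", f"save/trial_{trial}",
         "--dataset", "FC100", "--data_root", "data"],
        check=True,
    )

def val_accuracy(trial):
    """Hypothetical: read the validation accuracy logged by the run."""
    with open(f"save/trial_{trial}/best_acc.txt") as f:
        return float(f.read())

accs = {}
for trial in range(3):         # a few independent generation-0 runs
    run_training(trial)
    accs[trial] = val_accuracy(trial)

best = max(accs, key=accs.get)
print(f"use save/trial_{best} as the teacher for generation-1 distillation")
```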
Please consider citing the SKD and IER papers:
```
@InProceedings{Rizve_2021_CVPR,
  author    = {Rizve, Mamshad Nayeem and Khan, Salman and Khan, Fahad Shahbaz and Shah, Mubarak},
  title     = {Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2021},
  pages     = {10836-10846}
}

@article{rajasegaran2020self,
  title   = {Self-supervised Knowledge Distillation for Few-shot Learning},
  author  = {Rajasegaran, Jathushan and Khan, Salman and Hayat, Munawar and Khan, Fahad Shahbaz and Shah, Mubarak},
  journal = {arXiv preprint arXiv:2006.09785},
  year    = {2020}
}
```
It would also be nice if you consider reading our latest work on FSL :)
```
@inproceedings{jian2022label,
  title     = {Label hallucination for few-shot classification},
  author    = {Jian, Yiren and Torresani, Lorenzo},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume    = {36},
  number    = {6},
  pages     = {7005--7014},
  year      = {2022}
}
```