PyTorch code and pretrained weights for the UNIC models.

UNIC: Universal Classification Models via Multi-teacher Distillation

Mert Bulent Sariyildiz · Philippe Weinzaepfel · Thomas Lucas · Diane Larlus · Yannis Kalantidis

NAVER LABS Europe

ECCV 2024

[Paper] · [Citation]

Model Diagram

Installation

For training UNIC models on ImageNet-1K (by distilling from the four teachers we used in the paper), you need some Python packages, pretrained weights for the teacher models and the ImageNet-1K dataset.

Conda environment

  • Create a conda environment with all the necessary packages for training and evaluation:
env_name="unic"
conda create -n ${env_name}
conda activate ${env_name}
conda install pytorch=2.1.1 pytorch-cuda=12.1 torchvision \
    timm transformers einops torchmetrics optuna \
    tensorboard matplotlib pandas scikit-learn-intelex omegaconf \
    -c pytorch -c nvidia -c conda-forge
  • Set the path of your conda installation in scripts/setup_env.sh, i.e. update the conda_dir variable. The training and evaluation scripts will then use your environment automatically.

Teacher models

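  • Download the teacher checkpoints with the provided script, passing the directory where they should be stored:
(cd scripts/teachers && ./_prepare_all.sh <path_to_download_directory>)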
  • Once teacher checkpoints are downloaded, update the TEACHER_CFG variable in teachers/config.py to point to the correct paths.

Distillation dataset

  • Download the ImageNet-1K dataset (ILSVRC-2012). Check out the official website for details.

Training UNIC models

  • Use the main_unic.py script to train UNIC models. By default, it distills the following four teachers into a ViT-Base/16 student:
    • DINO (dino_vitbase_16)
    • DeiT-III (deit3_vitbase_16)
    • iBOT (ibot_vitbase_16)
    • dBOT fine-tuned on ImageNet-1K classification (dbotft_vitbase_16)

Make sure the teacher checkpoints are downloaded before training (see the Teacher models section).
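To make the setup more concrete, the sketch below shows one generic way a single student can be trained against several teachers: the student feature goes through one head per teacher, and the per-teacher regression losses are averaged. Everything in it is an illustrative assumption (plain Python lists instead of tensors, toy heads, a cosine plus smooth-L1 loss, equal teacher weights). It is not the loss implemented in main_unic.py.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def cosine_loss(pred: Vector, target: Vector) -> float:
    """1 - cosine similarity; 0 when both vectors point the same way."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / max(norm, 1e-12)  # guard against zero-length vectors


def smooth_l1_loss(pred: Vector, target: Vector, beta: float = 1.0) -> float:
    """Smooth-L1 loss averaged over the feature dimensions."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


def multi_teacher_loss(
    student_feat: Vector,
    heads: Dict[str, Callable[[Vector], Vector]],
    teacher_feats: Dict[str, Vector],
) -> float:
    """Average the per-teacher losses (equal weights are an assumption)."""
    losses = []
    for name, target in teacher_feats.items():
        pred = heads[name](student_feat)  # hypothetical teacher-specific head
        losses.append(cosine_loss(pred, target) + smooth_l1_loss(pred, target))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy 2-d features for two of the four teachers listed above.
    student = [1.0, 0.0]
    heads = {
        "dino_vitbase_16": lambda x: list(x),
        "deit3_vitbase_16": lambda x: [2.0 * v for v in x],
    }
    teachers = {
        "dino_vitbase_16": [0.0, 1.0],
        "deit3_vitbase_16": [2.0, 0.0],
    }
    print(multi_teacher_loss(student, heads, teachers))
```

In this toy run the second head matches its teacher exactly, so only the first teacher contributes to the averaged loss.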

The architecture of the student encoder is compatible with DINOv2.

We trained our UNIC models on 4 GPUs with at least 32GB of memory each. The default batch size is 128 per GPU; adjust it according to your GPU memory (the learning rate will be scaled accordingly).

# - Initialize the conda environment
# - Set ${MASTER_ADDR}, ${MASTER_PORT}, ${N_GPUS} for distributed training
source ./scripts/setup_env.sh

dataset_dir="/path/to/imagenet-1k"
output_dir="/path/to/output_dir"
mkdir -p ${output_dir}

torchrun --rdzv-backend=c10d --rdzv-endpoint=localhost:0 --nnodes=1 --nproc_per_node=${N_GPUS} main_unic.py  \
    --data_dir=${dataset_dir} \
    --output_dir=${output_dir} \
    --seed=${RANDOM}
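The batch-size note above says the learning rate is scaled with the batch size. A common way to do this is the linear scaling rule, sketched below. The function name, the base learning rate, and the reference batch size of 256 are assumptions made for illustration and are not read from main_unic.py, so check the script for the actual rule.

```python
def scaled_lr(
    base_lr: float,
    batch_size_per_gpu: int,
    n_gpus: int,
    reference_batch_size: int = 256,
) -> float:
    """Linear scaling rule: the learning rate grows with the total batch size.

    NOTE: hypothetical helper. The rule and the reference batch size of 256
    are assumptions, not values taken from the training script.
    """
    total_batch_size = batch_size_per_gpu * n_gpus
    return base_lr * total_batch_size / reference_batch_size


if __name__ == "__main__":
    # Default setup from this README: 4 GPUs x 128 images per GPU.
    print(scaled_lr(base_lr=1e-3, batch_size_per_gpu=128, n_gpus=4))
    # Halving the per-GPU batch size halves the effective learning rate.
    print(scaled_lr(base_lr=1e-3, batch_size_per_gpu=64, n_gpus=4))
```

Under this rule, reducing the per-GPU batch size to fit smaller GPUs lowers the learning rate proportionally, which is why no manual retuning should be needed for moderate changes.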

Pretrained models

Distilled from teachers pretrained on ImageNet-1K

We provide a pretrained UNIC model with the ViT-Base/16 architecture, distilled from the four teachers mentioned above.

| Model | Teachers | Distillation Dataset | Distillation Resolution | Student Architecture | ImageNet‑1K Classification | ADE20K Segmentation | Model Checkpoint | Training Arguments |
|---|---|---|---|---|---|---|---|---|
| UNIC | DINO‑B/16, iBOT‑B/16, DeiT‑III‑B/16, dBOT‑ft‑B/16 | ImageNet‑1K | 224 | ViT‑Base/16 | 83.8 | 39.6 (Linear head link) | Link (870MB) | Link |

The relative performance of UNIC over the four teachers is shown below.

Relative performance of UNIC over teachers

Distilled from teachers pretrained on arbitrary datasets

We also provide a pretrained UNIC-L model with the ViT-Large/14 architecture distilled from DINOv2-G/14 and MetaCLIP-H/14 teachers.

| Model | Teachers | Distillation Dataset | Distillation Resolution | Student Architecture | ImageNet‑1K k-NN (k=20) | ImageNet‑1K Zero‑shot | ADE20K Segmentation | Model Checkpoint | Training Arguments |
|---|---|---|---|---|---|---|---|---|---|
| UNIC‑L | DINOv2‑G/14, MetaCLIP‑H/14 | ImageNet‑1K | 224/336 | ViT‑Large/14 | 85.6 | 81.4 | 48.3 (Linear head link) | Link (2.2GB) | Link |

A comparison of UNIC-L with its teachers and the recent AM-RADIO model is shown below.

Relative performance of UNIC-L over teachers

Evaluating UNIC models

Transfer learning tasks

The evaluation protocol for transfer learning tasks involves two steps:

  • Extracting features from the encoder of a pretrained UNIC model
  • Training logistic regression classifiers on top of the extracted features
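The two steps above can be illustrated with a toy, dependency-free example. The "encoder" here is a stand-in that only l2-normalizes its input, and the classifier is a small binary logistic regression trained with gradient descent. All names and hyper-parameters are assumptions made for this sketch; the actual evaluation code lives in eval_transfer and uses scikit-learn or PyTorch classifiers.

```python
import math
from typing import List, Sequence, Tuple


def extract_features(images: Sequence[Sequence[float]]) -> List[List[float]]:
    """Step 1 stand-in for a frozen encoder: l2-normalize each input."""
    features = []
    for image in images:
        norm = math.sqrt(sum(v * v for v in image))
        features.append([v / norm for v in image] if norm > 0 else list(image))
    return features


def train_logreg(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Step 2: binary logistic regression with full-batch gradient descent."""
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(w * v for w, v in zip(weights, x)) + bias
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability - label
            for i, v in enumerate(x):
                grad_w[i] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return class 1 if the logit is positive, else class 0."""
    return int(sum(w * v for w, v in zip(weights, x)) + bias > 0)


if __name__ == "__main__":
    images = [[2.0, 0.1], [1.5, 0.2], [0.1, 2.0], [0.2, 1.0]]  # toy "images"
    labels = [0, 0, 1, 1]
    feats = extract_features(images)      # the encoder is never updated
    weights, bias = train_logreg(feats, labels)
    print([predict(weights, bias, f) for f in feats])
```

The point of the protocol is that only the classifier is trained; the encoder stays frozen, so features can be extracted once and reused.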

We use the implementation from t-ReX, which is available at https://github.com/naver/trex. For convenience, the evaluation code is copied in the eval_transfer folder of this repository.

First, download the transfer datasets following the instructions in the t-ReX repository. Once the download finishes, update the hardcoded dataset paths in the eval_transfer/data/__init__.py file. Then, use the following command to evaluate a pretrained UNIC model on, e.g., the ImageNet-1K dataset (with labels):

source scripts/setup_env.sh

##########
# extract features
dataset="in1k"
image_size=224
pretrained="/path/to/unic/checkpoint.pth"

features_dir=$(dirname "${pretrained}")
features_dir=${features_dir}/transfer/features_${dataset}_${image_size}

if [ ! -f "${features_dir}/features_trainval.pth" ] || [ ! -f "${features_dir}/features_test.pth" ]; then
    echo "Extracting features..."
    python eval_transfer/main_ft_extract.py \
        --output_dir="${features_dir}" \
        --pretrained="${pretrained}" \
        --dataset="${dataset}" \
        --image_size="${image_size}"
fi

##########
# train logreg classifier using extracted features
features_norm="none"
clf_type="logreg_sklearn"
if [[ "${dataset}" == "in1k" ]] || [[ "${dataset}" == cog_* ]] || [[ "${dataset}" == inat* ]]; then
    # for large datasets,
    # we use SGD implemented in PyTorch and l2 normalize features
    features_norm="l2"
    clf_type="logreg_torch"
fi

echo ""
echo "Training classifier ..."
python -m sklearnex eval_transfer/main_clf.py --features_dir="${features_dir}" --features_norm=${features_norm} --clf_type=${clf_type}

See the --dataset argument in main_ft_extract.py for the list of available datasets.
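For readers who prefer to drive the evaluation from Python, the path and classifier selection logic of the shell snippet above can be mirrored as follows. This is a sketch that only restates the shell logic; the helper names are hypothetical and do not exist in the repository.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Tuple

# Patterns copied from the shell snippet: datasets treated as "large".
LARGE_DATASET_PATTERNS = ("in1k", "cog_*", "inat*")


def features_dir_for(pretrained: str, dataset: str, image_size: int) -> Path:
    """Features are stored next to the checkpoint, under transfer/."""
    return Path(pretrained).parent / "transfer" / f"features_{dataset}_{image_size}"


def classifier_settings(dataset: str) -> Tuple[str, str]:
    """Return (features_norm, clf_type) following the shell snippet."""
    if any(fnmatchcase(dataset, pattern) for pattern in LARGE_DATASET_PATTERNS):
        # Large datasets: l2-normalized features and the PyTorch SGD classifier.
        return "l2", "logreg_torch"
    return "none", "logreg_sklearn"


if __name__ == "__main__":
    print(features_dir_for("/path/to/unic/checkpoint.pth", "in1k", 224))
    print(classifier_settings("in1k"))
```

Keeping the features directory next to the checkpoint means that re-running the script skips extraction whenever both feature files already exist.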

Dense prediction tasks

Semantic segmentation on ADE20K

First, download the ADE20K dataset from the official website.

We follow the evaluation protocol from DINOv2, which requires some extra packages like mmcv with specific versions. You can install them using the commands below:

pip install openmim
mim install "mmcv-full==1.7.2"
mim install "mmengine==0.10.1"
pip install "mmsegmentation==0.30.0"
pip install ftfy

If you encounter any mismatch between package versions, we recommend creating a new conda environment as mentioned in the DINOv2 repository.

Then, use the following command to evaluate a pretrained UNIC model on the ADE20K semantic segmentation task (default hyper-parameters are set for 1 GPU):

source ./scripts/setup_env.sh

data_dir=/path/to/ADEChallengeData2016
pretrained="/path/to/unic/checkpoint.pth"

python eval_dense/eval_semseg.py --data_dir=${data_dir} --pretrained=${pretrained}
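Semantic segmentation results on ADE20K are conventionally reported as mean intersection-over-union (mIoU). Assuming that is the metric behind the segmentation numbers in this README, the sketch below shows how mIoU is computed from flattened per-pixel predictions. It is a minimal illustration with hypothetical names, not the code used by eval_dense/eval_semseg.py.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that appear in the prediction or the target."""
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Four pixels, two classes: one pixel of class 1 is mislabeled as class 0.
    predicted = [0, 0, 1, 1]
    ground_truth = [0, 1, 1, 1]
    print(mean_iou(predicted, ground_truth, num_classes=2))
```

Because each class contributes equally to the mean, errors on rare classes weigh as much as errors on frequent ones.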

Citation

If you find this repository useful, please consider citing us:

@inproceedings{sariyildiz2024unic,
    title={{UNIC}: Universal Classification Models via Multi-teacher Distillation},
    author={Sariyildiz, Mert Bulent and Weinzaepfel, Philippe and Lucas, Thomas and Larlus, Diane and Kalantidis, Yannis},
    booktitle={European Conference on Computer Vision (ECCV)},
    year={2024},
}