

  • Operating system: tested on Ubuntu 20.04
  • Python == 3.9
  • PyTorch == 1.12.0

Pretrained Models

The checkpoint and loss logs for the DINO pre-trained model are located here.

Training DINO

  1. Prepare a CSV file with a column named 'file_path' that contains the absolute paths of all images, then update the following line to point to that CSV:

aimed_df = pd.read_csv("/home/abe/kuma-ssl/data/all_df.csv")

  2. Run training:
cd dino
python -m torch.distributed.launch --nproc_per_node=2 --arch vit_small --batch_size_per_gpu 256

If the DINO loss becomes NaN, set fp_16 to False and reduce the gradient clipping value.
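The CSV from step 1 can be generated in several ways. The following is a minimal sketch, not part of this repository: the function name `write_file_path_csv` and the set of image extensions are assumptions chosen for illustration. It walks an image directory recursively and writes a single 'file_path' column of absolute paths, which `pd.read_csv` can then load.

```python
import csv
from pathlib import Path

# Assumption: adjust this set to the image formats in your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def write_file_path_csv(image_root, csv_path):
    """Write a one-column CSV ('file_path') listing absolute image paths.

    Hypothetical helper for illustration; returns the number of images found.
    """
    root = Path(image_root).resolve()
    # Collect every file under image_root whose extension looks like an image.
    paths = sorted(
        str(p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file_path"])  # column name expected in step 1
        writer.writerows([p] for p in paths)
    return len(paths)
```

Call it as `write_file_path_csv("/path/to/images", "/path/to/all_df.csv")`, then point the `pd.read_csv(...)` line at the resulting file.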

Training MAE

  1. Prepare a CSV file with a column named 'file_path' that contains the absolute paths of all images, then update the following line to point to that CSV:

aimed_df = pd.read_csv("/home/abe/kuma-ssl/data/all_df.csv")

  2. Run training:
cd mae
python -m torch.distributed.launch --nproc_per_node=2 --arch vit_small --batch_size_per_gpu 256

Get embeddings from pretrained ViT → t-SNE

ImageNet pretrained


DINO pretrained

dino/ --pretrained_weights [dino_vits_checkpoint.pth]

Visualize attention maps

dino/ --pretrained_weights [dino_vits_checkpoint.pth] --image_path [image path]

Evaluate ViT-S


100% labels

python dino/

10% or 1% labels (5 seeds)

python dino/ --rate [10 or 1]


100% labels

python dino/

10% or 1% labels (5 seeds)

python dino/ --rate [10 or 1]
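How the evaluation scripts draw the reduced label sets is not shown here. As a rough illustration of what a 10% or 1% label evaluation over five seeds involves, the sketch below samples a reproducible subset of training indices per seed. The function name `sample_label_subset` and the plain (non-stratified) sampling are assumptions, not the repository's implementation.

```python
import random


def sample_label_subset(indices, rate_percent, seed):
    """Return a reproducible subset holding rate_percent % of the indices.

    Hypothetical helper: uniform sampling without class stratification.
    """
    indices = list(indices)
    rng = random.Random(seed)  # a separate generator per seed keeps runs independent
    k = max(1, round(len(indices) * rate_percent / 100))
    return sorted(rng.sample(indices, k))


# One labeled subset per seed, e.g. for averaging results over 5 runs.
subsets = {seed: sample_label_subset(range(1000), 10, seed) for seed in range(5)}
```

Reporting the mean and standard deviation over the five subsets gives a more stable estimate than a single draw, which matters most at the 1% rate where each subset is small.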