
SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers


[CVPR 2024 Highlight][paper][arXiv]

SPOT illustration

Contents

Installation
COCO
PASCAL VOC 2012
MOVi-C/E
License
Acknowledgement
Citation

Installation

conda create -n spot python=3.9.16

conda activate spot

pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117

All experiments run on a single GPU.
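
To verify that PyTorch was installed with CUDA support (the requirements above target CUDA 11.7 wheels), a quick optional check is:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"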


COCO

Dataset Preparation

Download the COCO dataset (2017 Train images, 2017 Val images, 2017 Train/Val annotations) from here and place the files following this structure (example download commands are given after the tree):

COCO2017
   ├── annotations
   │    ├── instances_train2017.json
   │    ├── instances_val2017.json
   │    └── ...
   ├── train2017
   │    ├── 000000000009.jpg
   │    ├── ...
   │    └── 000000581929.jpg
   └── val2017
        ├── 000000000139.jpg
        ├── ...
        └── 000000581781.jpg
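
One way to fetch and extract the archives, assuming the standard COCO download links and /path/to/COCO2017 as the target directory, is:

mkdir -p /path/to/COCO2017 && cd /path/to/COCO2017
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip train2017.zip && unzip val2017.zip && unzip annotations_trainval2017.zip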

Training SPOT (two-stage approach)

Stage 1: Train SPOT Teacher for 50 epochs on COCO:

python train_spot.py --dataset coco --data_path /path/to/COCO2017 --epochs 50 --num_slots 7 --train_permutations random --eval_permutations standard --log_path /path/to/logs/spot_teacher_coco

You can monitor the training progress with tensorboard:

tensorboard --logdir /path/to/logs/spot_teacher_coco --port=16000

Stage 2: Train SPOT (student) for 50 epochs on COCO (this produces the final SPOT model):

python train_spot_2.py --dataset coco --data_path /path/to/COCO2017 --epochs 50 --num_slots 7 --train_permutations random --eval_permutations standard --teacher_train_permutations random --teacher_eval_permutations random --teacher_checkpoint_path /path/to/logs/spot_teacher_coco/TIMESTAMP/checkpoint.pt.tar --log_path /path/to/logs/spot_coco

If you are interested in using an MAE encoder, download the pre-trained ViT-Base weights from here and add:

--which_encoder mae_vitb16 --pretrained_encoder_weights mae_pretrain_vit_base.pth --lr_main 0.0002 --lr_min 0.00004
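
The mae_pretrain_vit_base.pth file above refers to the ViT-Base checkpoint released by the official MAE repository; one way to fetch it is:

wget https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth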

Pretrained model

Download the pretrained SPOT model on COCO:

| mBOi | mBOc | Download |
|------|------|----------|
| 34.9 | 44.8 | Checkpoint |
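
To quickly sanity-check a downloaded checkpoint before evaluation, you can load it with torch and list its top-level entries (purely an illustrative inspection; the exact keys depend on how the checkpoint was saved):

python -c "import torch; ckpt = torch.load('checkpoint.pt.tar', map_location='cpu'); print(sorted(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))"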

Evaluation

Evaluate SPOT on COCO aggregating all sequence permutations:

python eval_spot.py --dataset coco --data_path /path/to/COCO2017 --num_slots 7 --eval_permutations all --checkpoint_path /path/to/logs/spot_coco/TIMESTAMP/checkpoint.pt.tar

Training DINOSAUR baseline

You can also train the baseline, i.e. SPOT without self-training and without sequence permutation (essentially a DINOSAUR re-implementation), for 100 epochs on COCO:

python train_spot.py --dataset coco --data_path /path/to/COCO2017 --epochs 100 --num_slots 7 --train_permutations standard --log_path /path/to/logs/spot_wost_wosq_coco

PASCAL VOC 2012

Dataset Preparation

Download the PASCAL VOC 2012 dataset from http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar, extract the files, and copy trainaug.txt into VOCdevkit/VOC2012/ImageSets/Segmentation (example commands follow the tree below). The final structure should be the following:

VOCdevkit
   └── VOC2012
          ├── ImageSets
          │      └── Segmentation
          │             ├── train.txt
          │             ├── trainaug.txt
          │             ├── trainval.txt
          │             └── val.txt
          ├── JPEGImages
          │      ├── 2007_000027.jpg
          │      ├── ...
          │      └── 2012_004331.jpg
          ├── SegmentationClass
          │      ├── 2007_000032.png
          │      ├── ...
          │      └── 2011_003271.png
          └── SegmentationObject
                 ├── 2007_000032.png
                 ├── ...
                 └── 2011_003271.png
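
For reference, downloading and extracting the archive can look like this (the tar file unpacks into VOCdevkit/VOC2012; trainaug.txt must be obtained separately, and /path/to/trainaug.txt is a placeholder for wherever you saved it):

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_11-May-2012.tar
cp /path/to/trainaug.txt VOCdevkit/VOC2012/ImageSets/Segmentation/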

Training SPOT (two-stage approach)

Stage 1: Train SPOT Teacher for 560 epochs on VOC:

python train_spot.py --dataset voc --data_path /path/to/VOCdevkit/VOC2012 --epochs 560 --num_slots 6 --train_permutations random --eval_permutations standard --log_path /path/to/logs/spot_teacher_voc

Stage 2: Train SPOT (student) for 560 epochs on VOC (this produces the final SPOT model):

python train_spot_2.py --dataset voc --data_path /path/to/VOCdevkit/VOC2012 --epochs 560 --num_slots 6 --train_permutations random --eval_permutations standard --teacher_train_permutations random --teacher_eval_permutations random --teacher_checkpoint_path /path/to/logs/spot_teacher_voc/TIMESTAMP/checkpoint.pt.tar --log_path /path/to/logs/spot_voc

Pretrained model

Download the pretrained SPOT model on PASCAL VOC 2012:

| mBOi | mBOc | Download |
|------|------|----------|
| 48.6 | 55.7 | Checkpoint |

Evaluation

Evaluate SPOT on VOC aggregating all sequence permutations:

python eval_spot.py --dataset voc --data_path /path/to/VOCdevkit/VOC2012 --num_slots 6 --eval_permutations all --checkpoint_path /path/to/logs/spot_voc/TIMESTAMP/checkpoint.pt.tar

MOVi-C/E

Dataset Preparation

To download the MOVi-C/E datasets, uncomment the last two lines in requirements.txt to install the tensorflow_datasets package. Then, run the following commands:

python download_movi.py --level c --split train
python download_movi.py --level c --split validation
python download_movi.py --level e --split train
python download_movi.py --level e --split validation

The structure should be the following:

 MOVi
  ├── c
  │    ├── train
  │    │      ├── 00000000
  │    │      │       ├── 00000000_image.png
  │    │      │       └── ...
  │    │      └── ...
  │    └── validation
  │           ├── 00000000
  │           │       ├── 00000000_image.png
  │           │       ├── 00000000_mask_00.png
  │           │       └── ...
  │           └── ...
  └── e
       ├── train
       │      └── ...
       └── validation
              └── ...

Training SPOT (two-stage approach)

Stage 1: Train SPOT Teacher for 65 epochs on MOVi-C:

python train_spot.py --dataset movi --data_path /path/to/MOVi/c --epochs 65 --num_slots 11 --train_permutations random --eval_permutations standard --log_path /path/to/logs/spot_teacher_movic --val_mask_size 128 --lr_main 0.0002 --lr_min 0.00004

Stage 2: Train SPOT (student) for 30 epochs on MOVi-C (this produces the final SPOT model):

python train_spot_2.py --dataset movi --data_path /path/to/MOVi/c --epochs 30 --num_slots 11 --train_permutations random --eval_permutations standard --teacher_train_permutations random --teacher_eval_permutations random --teacher_checkpoint_path /path/to/logs/spot_teacher_movic/TIMESTAMP/checkpoint.pt.tar --log_path /path/to/logs/spot_movic --val_mask_size 128 --lr_main 0.0002 --lr_min 0.00015 --predefined_movi_json_paths train_movi_paths.json

Evaluation

Evaluate SPOT on MOVi-C aggregating all sequence permutations:

python eval_spot.py --dataset movi --data_path /path/to/MOVi/c --num_slots 11 --eval_permutations all --checkpoint_path /path/to/logs/spot_movic/TIMESTAMP/checkpoint.pt.tar

For MOVi-E experiments, use --num_slots 24 and point --data_path to the MOVi-E directory; see the example below.
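
For instance, evaluating a SPOT model trained on MOVi-E could look like the following (the spot_movie log directory is illustrative; use whatever --log_path you trained with):

python eval_spot.py --dataset movi --data_path /path/to/MOVi/e --num_slots 24 --eval_permutations all --checkpoint_path /path/to/logs/spot_movie/TIMESTAMP/checkpoint.pt.tar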


License

This project is licensed under the MIT License.

Acknowledgement

This repository is built using the SLATE and OCLF repositories.

Citation

If you find this repository useful, please consider giving it a star ⭐ and a citation:

@InProceedings{Kakogeorgiou2024SPOT,
    author    = {Kakogeorgiou, Ioannis and Gidaris, Spyros and Karantzalos, Konstantinos and Komodakis, Nikos},
    title     = {SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {22776-22786}
}