FORMULA


Foreground Guidance and Multi-Layer Feature Fusion for Unsupervised Object Discovery with Transformers

This is the official implementation of the WACV 2023 paper "Foreground Guidance and Multi-Layer Feature Fusion for Unsupervised Object Discovery with Transformers".

(Figure: FORMULA pipeline overview)

Preparation

Step 1. Please install PyTorch.

Step 2. To install the other dependencies, please run:

pip install -r requirements.txt
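
To verify the environment before running any experiments, a quick sanity check such as the one below (assuming a standard PyTorch installation; nothing here is specific to FORMULA) can be useful:

# Optional sanity check: confirm PyTorch is installed and a GPU is visible.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))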

Data Preparation

PASCAL-VOC

Please download the PASCAL VOC07 and PASCAL VOC12 datasets (link) and place the data in the datasets folder.

COCO

Please download the COCO dataset and put the data in datasets/COCO. We use COCO20k (a subset of COCO train2014), following previous work.

The structure of the datasets folder should look like this:

├── datasets
│   ├── VOCdevkit
│   │   ├── VOC2007
│   │   │   ├── ImageSets & Annotations & ...
│   │   ├── VOC2012
│   │   │   ├── ImageSets & Annotations & ...
│   ├── COCO
│   │   ├── annotations
│   │   ├── images
│   │   │   ├── train2014 & ...
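
To quickly check that the data is in the expected place, a small script along the following lines (paths taken from the tree above; adjust if your layout differs) can help:

# Optional helper to verify the dataset layout described in this README.
# The paths below mirror the tree shown above and are not part of the repository.
import os

expected = [
    "datasets/VOCdevkit/VOC2007/ImageSets",
    "datasets/VOCdevkit/VOC2007/Annotations",
    "datasets/VOCdevkit/VOC2012/ImageSets",
    "datasets/VOCdevkit/VOC2012/Annotations",
    "datasets/COCO/annotations",
    "datasets/COCO/images/train2014",
]

for path in expected:
    status = "ok" if os.path.isdir(path) else "MISSING"
    print(f"{status:7s} {path}")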

Single Object Discovery

Follow the steps below to reproduce the results presented in the paper.

FORMULA-L

# for voc
python main_formula_LOST.py --dataset VOC07 --set trainval
python main_formula_LOST.py --dataset VOC12 --set trainval

# for coco
python main_formula_LOST.py --dataset COCO20k --set train

FORMULA-TC

# for voc
python main_formula_TokenCut.py --dataset VOC07 --set trainval --arch vit_base
python main_formula_TokenCut.py --dataset VOC12 --set trainval --arch vit_base

# for coco
python main_formula_TokenCut.py --dataset COCO20k --set train --arch vit_base
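
If you want to run all of the above in one go, a simple wrapper such as the following may be convenient. It uses only the script names and flags listed above; the wrapper itself is not part of the repository.

# Hypothetical convenience wrapper: run every single-object-discovery experiment in sequence.
import subprocess

runs = [
    ["python", "main_formula_LOST.py", "--dataset", "VOC07", "--set", "trainval"],
    ["python", "main_formula_LOST.py", "--dataset", "VOC12", "--set", "trainval"],
    ["python", "main_formula_LOST.py", "--dataset", "COCO20k", "--set", "train"],
    ["python", "main_formula_TokenCut.py", "--dataset", "VOC07", "--set", "trainval", "--arch", "vit_base"],
    ["python", "main_formula_TokenCut.py", "--dataset", "VOC12", "--set", "trainval", "--arch", "vit_base"],
    ["python", "main_formula_TokenCut.py", "--dataset", "COCO20k", "--set", "train", "--arch", "vit_base"],
]

for cmd in runs:
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # stop at the first failing run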

Results

The results obtained with this repository:

Method       Arch       VOC07   VOC12   COCO20k
FORMULA-L    ViT-S/16   64.28   67.65   54.04
FORMULA-TC   ViT-B/16   69.13   73.08   59.57
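
These scores follow the CorLoc protocol commonly used in this line of work: an image counts as correctly localized when the single predicted box has IoU >= 0.5 with at least one ground-truth box. A minimal sketch of the metric, for illustration only (this is not the repository's evaluation code):

# Sketch of the CorLoc metric. Boxes are (x1, y1, x2, y2) tuples.
def iou(box_a, box_b):
    # Intersection-over-union of two axis-aligned boxes.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def corloc(predictions, ground_truths):
    # predictions: one box per image; ground_truths: list of boxes per image.
    hits = sum(
        any(iou(pred, gt) >= 0.5 for gt in gts)
        for pred, gts in zip(predictions, ground_truths)
    )
    return 100.0 * hits / len(predictions)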

CAD and OD training

Please follow LOST to conduct the CAD and OD experiments.

License

This project is free for academic research purposes only; commercial use requires authorization. For commercial licensing, please contact wyt@pku.edu.cn.

Citation

If you use our code/model/data, please cite our paper:

@InProceedings{Zhiwei_2023_WACV,
    author    = {Zhiwei Lin and Zengyu Yang and Yongtao Wang},
    title     = {Foreground Guidance and Multi-Layer Feature Fusion for Unsupervised Object Discovery with Transformers},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    year      = {2023}
}

Acknowledgement

FORMULA is built on top of LOST, DINO, and TokenCut. We sincerely thank the authors of these projects for their great work and code.