LOST

PyTorch implementation of the unsupervised object discovery method LOST. More details can be found in the paper:

Localizing Objects with Self-Supervised Transformers and no Labels, BMVC 2021 [arXiv]
by Oriane Siméoni, Gilles Puy, Huy V. Vo, Simon Roburin, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Renaud Marlet and Jean Ponce

We build on the original work, which can be cited as follows:

@inproceedings{LOST,
   title = {Localizing Objects with Self-Supervised Transformers and no Labels},
   author = {Oriane Sim\'eoni and Gilles Puy and Huy V. Vo and Simon Roburin and Spyros Gidaris and Andrei Bursuc and Patrick P\'erez and Renaud Marlet and Jean Ponce},
   booktitle = {Proceedings of the British Machine Vision Conference (BMVC)},
   month = {November},
   year = {2021}
}

Installation of LOST

Repo Initialization

git clone --recursive https://github.com/YuYue525/AI6103_project.git

Dependencies

This code was implemented with Python 3.7, PyTorch 1.7.1 and CUDA 10.2. Please install PyTorch first, then install the additional dependencies with the following command:

pip install -r requirements.txt

Install DINO

This method builds on DINO. The framework can be set up as an importable package using the following commands:

cd dino; touch __init__.py
echo -e "import sys\nfrom os.path import dirname, join\nsys.path.insert(0, join(dirname(__file__), '.'))" >> __init__.py; cd ../;

The code was developed against commit ba9edd1 of the DINO repo (please rebase to that commit if you run into breakage).
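Once this is done, modules from the DINO repo can be imported directly. A minimal sketch, assuming the standard DINO repo layout (vision_transformer.py) at the pinned commit:

```python
# Hedged sketch: using the DINO submodule set up above to build a backbone.
from dino import vision_transformer as vits

model = vits.vit_small(patch_size=16)  # ViT-S/16, the default backbone in LOST
model.eval()
```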

Launching LOST on datasets

The following steps reproduce the LOST results presented in the paper.

PASCAL-VOC

Please download the PASCAL VOC07 and PASCAL VOC12 datasets (link) and put the data in the folder datasets. You should end up with two subfolders: datasets/VOC2007 and datasets/VOC2012. To apply LOST and compute CorLoc results (VOC07 61.9, VOC12 64.0), please launch:

python main.py --dataset VOC07 --set trainval
python main.py --dataset VOC12 --set trainval

COCO

Please download the COCO dataset and put the data in datasets/COCO. Following previous works, results are reported using the 2014 annotations. The following command line produces results on the 20k-image subset of COCO (CorLoc 50.7) used in previous literature. Note that these 20k images are a subset of the train set.

python main.py --dataset COCO20k --set train

Different models

We have tested the method on different setups of the ViT model; CorLoc results are presented in the following table (more can be found in the paper).

| arch | pre-training | VOC07 | VOC12 | COCO20k |
|----------|----------|-------|-------|---------|
| ViT-S/16 | DINO | 61.5 | 64.1 | 50.7 |
| ViT-S/8 | DINO | 55.3 | 57.0 | 49.8 |
| ViT-B/16 | DINO | 60.0 | 63.3 | 50.0 |
| ResNet50 | DINO | 36.8 | 42.7 | 26.5 |
| ResNet50 | ImageNet | 33.8 | 39.1 | 25.5 |
| VGG16 | ImageNet | 41.4 | 47.2 | 30.2 |

However, when measuring the similarity between the obtained features, the original paper directly computes the dot product of feature pairs, without normalization. In our implementation, we also use other measures, such as cosine similarity, to measure patch similarity. The following table shows the results:

| arch | pre-training | VOC07 (dot / cosine) | VOC12 (dot / cosine) | COCO20k (dot / cosine) |
|----------|----------|-------------|-------------|-------------|
| ViT-S/16 | DINO | 61.5 / 61.7 | 64.1 / 64.3 | 50.7 / 50.7 |
| ViT-S/8 | DINO | 55.3 / 55.3 | 57.0 / 57.2 | 49.8 / 49.9 |
| ViT-B/16 | DINO | 60.0 / 60.1 | 63.3 / 63.4 | 50.0 / 50.0 |
| ResNet50 | DINO | 36.8 / 36.5 | 42.7 / 42.5 | 26.5 / 26.4 |
| ResNet50 | ImageNet | 33.8 / 33.6 | 39.1 / 39.0 | 25.5 / 25.4 |
| VGG16 | ImageNet | 41.4 / 41.6 | 47.2 / 47.0 | 30.2 / 30.1 |
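For reference, here is a minimal sketch of the two measures over a hypothetical feats tensor of shape [num_patches, dim] (names are illustrative, not the repo's API):

```python
import torch
import torch.nn.functional as F

def patch_similarities(feats: torch.Tensor, measure: str = "dot") -> torch.Tensor:
    """Pairwise patch similarities for a [num_patches, dim] feature tensor."""
    if measure == "cosine":
        feats = F.normalize(feats, dim=-1)  # unit-norm rows turn dot products into cosines
    return feats @ feats.t()                # [num_patches, num_patches]
```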

In our implementation, we also tried the Pearson product-moment correlation coefficient (PCC) to measure patch similarity. The following table shows the results:

| arch | pre-training | VOC07 (dot / PCC) | VOC12 (dot / PCC) | COCO20k (dot / PCC) |
|----------|----------|-------------|-------------|-------------|
| ViT-S/16 | DINO | 61.5 / 61.6 | 64.1 / 64.1 | 50.7 / 50.6 |
| ViT-S/8 | DINO | 55.3 / 55.0 | 57.0 / 57.1 | 49.8 / 49.8 |
| ViT-B/16 | DINO | 60.0 / 60.3 | 63.3 / 63.4 | 50.0 / 50.3 |
| ResNet50 | DINO | 36.8 / 30.8 | 42.7 / 35.9 | 26.5 / 26.5 |
| ResNet50 | ImageNet | 33.8 / 31.1 | 39.1 / 36.2 | 25.5 / 25.5 |
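PCC can be computed as the cosine similarity of mean-centered features; a sketch under the same assumptions as above:

```python
import torch
import torch.nn.functional as F

def pcc_similarities(feats: torch.Tensor) -> torch.Tensor:
    """Pairwise Pearson correlations: cosine similarity of mean-centered rows."""
    centered = feats - feats.mean(dim=-1, keepdim=True)
    centered = F.normalize(centered, dim=-1)
    return centered @ centered.t()
```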

SoLOST

In our implementation, we also propose an improved method, Similarity-oriented LOST (SoLOST); the following table shows the improvement:

| arch | pre-training | VOC07 (LOST / SoLOST, 60% potentials) | VOC12 (LOST / SoLOST, 60% potentials) | COCO20k (LOST / SoLOST, 50% potentials) |
|----------|----------|-------------|-------------|-------------|
| ViT-S/16 | DINO | 61.5 / 62.2 | 64.1 / 64.8 | 50.7 / 52.2 |
| ViT-S/8 | DINO | 55.3 / 59.8 | 57.0 / 64.3 | 49.8 / 51.1 |
| ViT-B/16 | DINO | 60.0 / 61.8 | 63.3 / 64.4 | 50.0 / 52.0 |
| ResNet50 | DINO | 36.8 / 34.6 | 42.7 / 39.6 | 26.5 / 24.8 |
| ResNet50 | ImageNet | 33.8 / 32.6 | 39.1 / 37.9 | 25.5 / 24.7 |
| VGG16 | ImageNet | 41.4 / 41.9 | 47.2 / 48.9 | 30.2 / 30.7 |
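For context, LOST selects as seed the patch with the fewest positively-similar patches in the similarity matrix, while SoLOST varies how large a pool of similar patches ("potentials") is retained, which the percentages in the table refer to. A rough sketch of the baseline seed-selection step (illustrative, not the repo's exact code):

```python
import torch

def select_seed(sims: torch.Tensor) -> torch.Tensor:
    """LOST seed selection: pick the patch with the fewest positive similarities.

    sims: [num_patches, num_patches] patch-similarity matrix.
    """
    degrees = (sims >= 0).sum(dim=-1)  # how many patches each patch is positively similar to
    return torch.argmin(degrees)       # the least-connected patch is taken as the seed
```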

The above results on the VOC07 dataset can be obtained by launching the following commands. It is possible to visualize the predictions (pred), the maps of Figure 2 in the paper (fms), and the seed expansion (seed_expansion). Box predictions are also stored in the output directory given by the parameter output_dir.

python main.py --dataset VOC07 --set trainval # ViT-S/16
python main.py --dataset VOC07 --set trainval --patch_size 8 # ViT-S/8
python main.py --dataset VOC07 --set trainval --arch vit_base # ViT-B/16
python main.py --dataset VOC07 --set trainval --arch resnet50 # ResNet50/DINO
python main.py --dataset VOC07 --set trainval --arch resnet50_imagenet # ResNet50/ImageNet

SoLOST+CAD

In this work, we additionally use SoLOST predictions to train object detection models without any human supervision. We explore class-agnostic detection (CAD). The next sections present the different steps to reproduce our results.

Installation for CAD training

We use the detectron2 framework to train a Faster R-CNN model with LOST predictions as pseudo-ground-truth. The code was developed with version v0.5 of the framework. In order to reproduce our results, please install detectron2 using the following commands. In case of failure, you can find the installation instructions matching your version of PyTorch/CUDA at https://github.com/facebookresearch/detectron2/releases.

git clone --branch v0.5 https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2

Set global variables for ease of use.

export LOST=$(pwd)
cd detectron2; export D2=$(pwd);

Then please link the LOST-specific files into the detectron2 framework:

ln -s $LOST/tools/*.py $D2/tools/. # Link LOST tools into D2
mkdir $D2/configs/LOST
ln -s $LOST/tools/configs/* $D2/configs/LOST/. # Link LOST configs into D2

Training a Class-Agnostic Detector (CAD) with LOST pseudo-annotations

Before launching a training, the data must be formatted to fit the detectron2 and COCO styles. The following command lines do this formatting for boxes predicted with LOST.

cd $D2;

# Format DINO weights to fit detectron2
wget https://dl.fbaipublicfiles.com/dino/dino_resnet50_pretrain/dino_resnet50_pretrain.pth -P ./data # Download the model from DINO
python tools/convert_pretrained_to_detectron_format.py --input ./data/dino_resnet50_pretrain.pth --output ./data/dino_RN50_pretrain_d2_format.pkl
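The conversion script essentially re-packages the torch checkpoint as the pickled dict detectron2 expects; a hedged sketch of that format (the repo's script is authoritative and may also rename keys):

```python
import pickle

import torch

# detectron2 loads backbone weights from a pickled dict with a "model" entry of
# numpy arrays; "matching_heuristics" lets it remap torchvision-style key names.
ckpt = torch.load("./data/dino_resnet50_pretrain.pth", map_location="cpu")
d2_ckpt = {
    "model": {k: v.numpy() for k, v in ckpt.items()},
    "__author__": "DINO",
    "matching_heuristics": True,
}
with open("./data/dino_RN50_pretrain_d2_format.pkl", "wb") as f:
    pickle.dump(d2_ckpt, f)
```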

# Format pseudo-boxes data to fit detectron2
python tools/prepare_voc_LOST_CAD_pseudo_boxes_in_detectron2_format.py --year 2007 --pboxes $LOST/data/LOST_predictions/LOST_VOC07.pkl

# Format VOC data to fit COCO style
python tools/prepare_voc_data_in_coco_style.py --is_CAD --voc07_dir $LOST/datasets/VOC2007 --voc12_dir $LOST/datasets/VOC2012
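For reference, "COCO style" here means JSON annotations of the usual COCO form, with all objects collapsed into one class for CAD; an illustrative record (field values are made up, the scripts above produce the real files):

```python
# Illustrative COCO-style annotation for class-agnostic detection (CAD):
# every ground-truth object is mapped to a single generic "object" category.
annotation = {
    "id": 1,
    "image_id": 12,
    "category_id": 1,                    # one class-agnostic category for all objects
    "bbox": [48.0, 32.0, 160.0, 120.0],  # COCO boxes are [x, y, width, height]
    "area": 160.0 * 120.0,
    "iscrowd": 0,
}
```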

The next command line launches a CAD training on the VOC2007 dataset (we trained with mini-batches of size 2 on 1 GPU; adjust --num-gpus to your hardware). Please make sure to change the argument value MODEL.WEIGHTS to the correct path of the DINO weights. The same steps apply to VOC2012.

python tools/train_net_for_LOST_CAD.py --num-gpus 4 --config-file ./configs/LOST/RN50_DINO_FRCNN_VOC07_CAD.yaml DATALOADER.NUM_WORKERS 8 OUTPUT_DIR ./outputs/RN50_DINO_FRCNN_VOC07_CAD MODEL.WEIGHTS ./data/dino_RN50_pretrain_d2_format.pkl

Inference results of the model will be stored in $OUTPUT_DIR/inference. In order to produce results on the trainval set, please use the following commands:

python tools/train_net_for_LOST_CAD.py --resume --eval-only --num-gpus 4 --config-file ./configs/LOST/RN50_DINO_FRCNN_VOC07_CAD.yaml DATALOADER.NUM_WORKERS 6 MODEL.WEIGHTS ./outputs/RN50_DINO_FRCNN_VOC07_CAD/model_final.pth OUTPUT_DIR ./outputs/RN50_DINO_FRCNN_VOC07_CAD/ DATASETS.TEST '("voc_2007_trainval_CAD_coco_style", )'
cd $LOST;
python main_corloc_evaluation.py --dataset VOC07 --set trainval --type_pred detectron --pred_file $D2/outputs/RN50_DINO_FRCNN_VOC07_CAD/inference/coco_instances_results.json

Training LOST+CAD on COCO20k dataset

The following command lines train a detector in a class-agnostic fashion on the COCO20k subset of the COCO dataset.

cd $D2;

# Format pseudo-boxes data to fit detectron2
python tools/prepare_coco_LOST_CAD_pseudo_boxes_in_detectron2_format.py --pboxes $LOST/outputs/COCO20k_train/LOST-vit_small16_k/preds.pkl

# Generate COCO20k CAD gt annotations
python tools/prepare_coco_CAD_gt.py --coco_dir $LOST/datasets/COCO

# Train detector (evaluation done on COCO20k CAD training set)
python tools/train_net_for_LOST_CAD.py --num-gpus 4 --config-file ./configs/LOST/RN50_DINO_FRCNN_COCO20k_CAD.yaml DATALOADER.NUM_WORKERS 8 OUTPUT_DIR ./outputs/RN50_DINO_FRCNN_COCO20k_CAD MODEL.WEIGHTS ./data/dino_RN50_pretrain_d2_format.pkl

# Corloc evaluation
python main_corloc_evaluation.py --dataset COCO20k --type_pred detectron --pred_file $D2/outputs/RN50_DINO_FRCNN_COCO20k_CAD/inference/coco_instances_results.json

Evaluating LOST+CAD (corloc results)

We provide predictions of a class-agnostic Faster R-CNN model trained using LOST boxes as pseudo-ground-truth; they are stored in the folder data/CAD_predictions. In order to launch the CorLoc evaluation, please run the following scripts. Note that in this evaluation, only the box with the highest confidence score is kept per image.

python main_corloc_evaluation.py --dataset VOC07 --set trainval --type_pred detectron --pred_file data/CAD_predictions/LOST_plus_CAD_VOC07.json
python main_corloc_evaluation.py --dataset VOC12 --set trainval --type_pred detectron --pred_file data/CAD_predictions/LOST_plus_CAD_VOC12.json
python main_corloc_evaluation.py --dataset COCO20k --set train --type_pred detectron --pred_file data/CAD_predictions/LOST_plus_CAD_COCO20k.json
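For reference, CorLoc is the percentage of images whose retained box overlaps at least one ground-truth box with IoU >= 0.5; an illustrative sketch (not the repo's evaluation code):

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def corloc(best_boxes, gt_boxes_per_image):
    """Fraction of images whose single best box hits any GT box with IoU >= 0.5."""
    hits = [any(iou(p, g) >= 0.5 for g in gts)
            for p, gts in zip(best_boxes, gt_boxes_per_image)]
    return float(np.mean(hits))
```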

The following table presents the obtained CorLoc results.

| method | VOC07 | VOC12 | COCO20k |
|------------|-------|-------|---------|
| LOST+CAD | 60.7 | 67.8 | 53.3 |
| SoLOST+CAD | 61.2 | 67.1 | 54.8 |

The following table presents the obtained AP50 results.

| method | VOC07 | VOC07 | VOC12 | COCO20k |
|--------------|----------|----------|----------|----------|
| Training set (when applicable) | trainval | trainval | trainval | trainval |
| Evaluation set | test | trainval | trainval | trainval |
| LOST + CAD | 23.4 | 23.7 | 30.7 | 8.8 |
| SoLOST + CAD | 25.7 | 25.7 | 31.0 | 9.1 |

Training details

We use the R50-C4 model of detectron2 with a ResNet50 backbone pre-trained with DINO self-supervision.

Details (a config sketch follows the list):

1. mini-batches of size 2 across 1 GPU, using BatchNorm
2. an extra BatchNorm layer for the RoI head after conv5, i.e., the Res5ROIHeadsExtraNorm layer in detectron2
3. the first two convolutional blocks of ResNet-50 are frozen, i.e., conv1 and conv2 in detectron2
4. the learning rate is first warmed up for 100 steps to 0.02 and then reduced by a factor of 10 after 18K and 22K training steps
5. we use 24K training steps in total for all experiments, except when training class-agnostic detectors on the pseudo-boxes of the VOC07 trainval set, in which case we use 10K steps

Further training details and other results for SoLOST+CAD can be found in CAD and LOST_CAD.
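Under these settings, the schedule above could be expressed with detectron2's config API roughly as follows (a hedged sketch; the repo's YAML configs are the authoritative source):

```python
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.MODEL.RESNETS.DEPTH = 50
cfg.MODEL.BACKBONE.FREEZE_AT = 2     # freeze the first two blocks (conv1, conv2)
cfg.SOLVER.IMS_PER_BATCH = 2         # mini-batches of size 2 on 1 GPU
cfg.SOLVER.BASE_LR = 0.02
cfg.SOLVER.WARMUP_ITERS = 100        # warm the LR up for 100 steps
cfg.SOLVER.STEPS = (18000, 22000)    # divide the LR by 10 at 18K and 22K steps
cfg.SOLVER.GAMMA = 0.1
cfg.SOLVER.MAX_ITER = 24000          # 10000 when training on VOC07 pseudo-boxes
```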