OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features

This repo is the implementation of the following paper:

OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features
Anton Osokin, Denis Sumin, Vasily Lomakin
In proceedings of the European Conference on Computer Vision (ECCV), 2020

If you use our ideas, code or data, please cite our paper (available on arXiv).

Citation in bibtex

@inproceedings{osokin20os2d,
    title = {{OS2D}: One-Stage One-Shot Object Detection by Matching Anchor Features},
    author = {Anton Osokin and Denis Sumin and Vasily Lomakin},
    booktitle = {proceedings of the European Conference on Computer Vision (ECCV)},
    year = {2020} }

License

This software is released under the MIT license, which means that you can use the code in any way you want.

Requirements

python >= 3.7
pytorch >= 1.4, torchvision >=0.5
NVIDIA GPU, tested with V100 and GTX 1080 Ti
Installed CUDA, tested with v10.0

See INSTALL.md for the package installation.

Demo

See our demo-notebook for an illustration of our method.

Dataset installation

Grozi-3.2k dataset with our annotation (0.5GB): download from Google Drive or with the magic command and unpack to $OS2D_ROOT/data

cd $OS2D_ROOT
./os2d/utils/wget_gdrive.sh data/grozi.zip 1Fx9lvmjthe3aOqjvKc6MJpMuLF22I1Hp
unzip data/grozi.zip -d data

Extra test sets of retail products (0.1GB): download from Google Drive or with the magic command and unpack to $OS2D_ROOT/data

cd $OS2D_ROOT
./os2d/utils/wget_gdrive.sh data/retail_test_sets.zip 1Vp8sm9zBOdshYvND9EPuYIu0O9Yo346J
unzip data/retail_test_sets.zip -d data

The INSTRE datasets (2.3GB) are re-hosted at the Center for Machine Perception in Prague (thanks to Ahmet Iscen!):

cd $OS2D_ROOT
wget ftp://ftp.irisa.fr/local/texmex/corpus/instre/gnd_instre.mat -P data/instre  # 200KB
wget ftp://ftp.irisa.fr/local/texmex/corpus/instre/instre.tar.gz -P data/instre  # 2.3GB
tar -xzf data/instre/instre.tar.gz -C data/instre

If you want to add your own dataset, create an instance of the DatasetOneShotDetection class and pass it to one of the functions that create dataloaders, build_train_dataloader_from_config or build_eval_dataloaders_from_cfg, from os2d/data/dataloader.py. See os2d/data/dataset.py for docs and examples.

Trained models

We release three pretrained models:

Name	mAP on "grozi-val-new-cl"	link
OS2D V2-train	90.65	Google Drive
OS2D V1-train	88.71	Google Drive
OS2D V2-init	86.07	Google Drive

The results (mAP on "grozi-val-new-cl") can be computed with the commands given below.

You can download the released models with the magic commands:

cd $OS2D_ROOT
./os2d/utils/wget_gdrive.sh models/os2d_v2-train.pth 1l_aanrxHj14d_QkCpein8wFmainNAzo8
./os2d/utils/wget_gdrive.sh models/os2d_v1-train.pth 1ByDRHMt1x5Ghvy7YTYmQjmus9bQkvJ8g
./os2d/utils/wget_gdrive.sh models/os2d_v2-init.pth 1sr9UX45kiEcmBeKHdlX7rZTSA4Mgt0A7
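
Before running the evaluation commands, it can help to confirm that the checkpoints actually landed in $OS2D_ROOT/models. The snippet below is a minimal sketch and not part of the repo: the file names are taken from the download commands above, while the helper name missing_models is hypothetical.

```python
import os

# File names as used in the download commands above.
EXPECTED_MODELS = [
    "os2d_v2-train.pth",
    "os2d_v1-train.pth",
    "os2d_v2-init.pth",
]


def missing_models(os2d_root):
    """Return the expected checkpoint names that are absent from <os2d_root>/models.

    Hypothetical helper for illustration; it only checks that the files exist,
    not that the downloads are complete or uncorrupted.
    """
    models_dir = os.path.join(os2d_root, "models")
    return [
        name
        for name in EXPECTED_MODELS
        if not os.path.isfile(os.path.join(models_dir, name))
    ]


# Example: missing_models(os.environ.get("OS2D_ROOT", "."))
```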

Evaluation

OS2D V2-train (best model)

For a fast evaluation on the validation set, one can use a single image scale with this script (gives 85.58 mAP on the validation set "grozi-val-new-cl"):

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False model.backbone_arch ResNet50 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v2-train.pth eval.scales_of_image_pyramid "[1.0]"
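
The arguments after --config-file come in KEY VALUE pairs, where each key is a dotted path into the configuration. The sketch below illustrates how such pairs could be merged into a nested config. It is an assumption about the general mechanism, not the code main.py actually runs; apply_overrides and the default values in cfg are hypothetical.

```python
import ast


def apply_overrides(cfg, opts):
    """Merge a flat [key1, value1, key2, value2, ...] list into a nested dict."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for dotted_key, raw_value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # KeyError here means an unknown config section
        if leaf not in node:
            raise KeyError(dotted_key)  # refuse to create options silently
        try:
            # "False" -> False, "[1.0]" -> [1.0], '["a"]' -> ["a"]
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            value = raw_value  # plain strings such as ResNet50 or a file path
        node[leaf] = value
    return cfg


# Illustrative defaults only, not the repo's actual values.
cfg = {
    "model": {"backbone_arch": "ResNet101", "use_inverse_geom_model": False},
    "train": {"do_training": True},
    "eval": {"scales_of_image_pyramid": [0.5, 0.625, 0.8, 1.0, 1.2, 1.4, 1.6]},
    "init": {"model": ""},
}
opts = [
    "model.backbone_arch", "ResNet50",
    "train.do_training", "False",
    "eval.scales_of_image_pyramid", "[1.0]",
    "init.model", "models/os2d_v2-train.pth",
]
apply_overrides(cfg, opts)
```

Rejecting unknown keys is a deliberate choice in this sketch: a misspelled option fails loudly instead of being ignored.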

Multiscale evaluation gives better results. The scripts below use the default setting with 7 scales: 0.5, 0.625, 0.8, 1, 1.2, 1.4, 1.6. Note that this evaluation can be slower because of the multiple scales and the large number of classes in the dataset.
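
To get a rough feeling for the slowdown, one can compare the number of pixels processed over the whole pyramid with a single pass at scale 1. This is a back-of-the-envelope sketch that assumes each scale resizes both image sides by the given factor and that the cost grows with the pixel count; the actual runtime depends on the implementation and hardware.

```python
# Default pyramid scales listed above.
PYRAMID_SCALES = [0.5, 0.625, 0.8, 1.0, 1.2, 1.4, 1.6]


def relative_pixel_cost(scales):
    """Pixels processed over all scales, relative to one pass at scale 1.0.

    Assumes a scale factor s shrinks or enlarges both sides, so area scales as s*s.
    """
    return sum(s * s for s in scales)


cost = relative_pixel_cost(PYRAMID_SCALES)
print(f"{cost:.2f}")  # prints 8.24
```

Under these assumptions, the 7-scale pyramid touches about 8 times as many pixels as the single-scale run.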

To evaluate on the validation set with multiple scales, run:

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False model.backbone_arch ResNet50 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v2-train.pth

OS2D V1-train

To evaluate on the validation set, run:

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model False model.use_simplified_affine_model True model.backbone_arch ResNet101 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v1-train.pth

OS2D V2-init

To evaluate on the validation set, run:

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False model.backbone_arch ResNet50 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v2-init.pth

Training

Pretrained models

In this project, we do not train models from scratch but start from some pretrained models. For instructions on how to get them, see models/README.md.

Best models

Our V2-train model on the Grozi-3.2k dataset was trained using this command:

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False train.objective.loc_weight 0.0 train.model.freeze_bn_transform True model.backbone_arch ResNet50 init.model models/imagenet-caffe-resnet50-features-ac468af-renamed.pth init.transform models/weakalign_resnet101_affine_tps.pth.tar train.mining.do_mining True output.path output/os2d_v2-train

Due to hard patch mining, this process is quite slow. Without it, training is faster, but produces slightly worse results:

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False train.objective.loc_weight 0.0 train.model.freeze_bn_transform True model.backbone_arch ResNet50 init.model models/imagenet-caffe-resnet50-features-ac468af-renamed.pth init.transform models/weakalign_resnet101_affine_tps.pth.tar train.mining.do_mining False output.path output/os2d_v2-train-nomining

For the V1-train model, we used this command:

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model False model.use_simplified_affine_model True train.objective.loc_weight 0.2 train.model.freeze_bn_transform False model.backbone_arch ResNet101 init.model models/gl18-tl-resnet101-gem-w-a4d43db-converted.pth train.mining.do_mining False output.path output/os2d_v1-train

Note that these runs need a lot of RAM because the whole training set is cached. If this does not work for you, you can set the parameter train.cache_images False, which loads images on the fly but can be slow. Also note that the first several iterations of training can be slow because of "warming up", i.e., computing the grids of anchors in Os2dBoxCoder. Those computations are cached, so everything will eventually run faster.
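
The "warming up" effect is typical for memoized computations: the first call with a given input pays the full cost, and repeated calls reuse the stored result. The sketch below shows the general pattern with functools.lru_cache. It is not the Os2dBoxCoder implementation; the function anchor_grid and its anchor layout are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def anchor_grid(feature_h, feature_w, stride):
    """Anchor centers (x, y) for a feature map of the given size.

    Hypothetical layout: one anchor at the center of each feature-map cell.
    The result is cached per (feature_h, feature_w, stride) triple.
    """
    return tuple(
        (stride * (x + 0.5), stride * (y + 0.5))
        for y in range(feature_h)
        for x in range(feature_w)
    )


first = anchor_grid(2, 3, 16)   # computed on the first call
second = anchor_grid(2, 3, 16)  # served from the cache
```

Returning an immutable tuple keeps the cached value safe from accidental in-place modification by callers.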

For the rest of the training scripts see below.

Rerunning experiments

All the experiments of this project were run with our job helper. For each experiment, one programs an experiment structure (in Python) and calls several technical functions provided by the launcher. See, e.g., this file for an example.

The launch happens as follows:

# add OS2D_ROOT to the python path - can be done, e.g., as follows
export PYTHONPATH=$OS2D_ROOT:$PYTHONPATH
# call the experiment script
python ./experiments/launcher_exp1.py LIST_OF_LAUNCHER_FLAGS

Extra parameters in LIST_OF_LAUNCHER_FLAGS are parsed by the launcher and contain some useful options about the launch:

--no-launch prepares all the scripts of the experiment without actually launching them.
--slurm prepares SLURM jobs and launches them with sbatch (unless --no-launch is given).
--stdout-file and --stderr-file set the files where stdout and stderr are saved, respectively (relative to the log_path defined in the experiment description).
For many SLURM related parameters, see the launcher.
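
The way these flags combine can be sketched with argparse. This is only an illustration of the behavior described above, not the launcher's actual code: the functions build_launcher_parser and launch_plan are hypothetical, and launching directly when --slurm is absent is an assumption.

```python
import argparse


def build_launcher_parser():
    """Parser for the launcher flags listed above (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Sketch of the launcher flags")
    parser.add_argument("--no-launch", dest="no_launch", action="store_true",
                        help="prepare the experiment scripts without launching")
    parser.add_argument("--slurm", action="store_true",
                        help="prepare SLURM jobs and submit them with sbatch")
    parser.add_argument("--stdout-file", dest="stdout_file", default=None,
                        help="file for stdout, relative to log_path")
    parser.add_argument("--stderr-file", dest="stderr_file", default=None,
                        help="file for stderr, relative to log_path")
    return parser


def launch_plan(args):
    """Describe what would happen for a parsed set of flags."""
    if args.no_launch:
        return "prepare scripts only"
    if args.slurm:
        return "prepare SLURM jobs and submit with sbatch"
    return "launch directly"  # assumption: no scheduler is involved without --slurm


args = build_launcher_parser().parse_args(["--slurm", "--no-launch"])
print(launch_plan(args))  # prints "prepare scripts only"
```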

Our experiments can be found here:

Baselines

We have added two baselines in this repo:

Class-agnostic detector + image retrieval system: see README for details.
Co-Attention and Co-Excitation, CoAE (original code, paper): see README for details.

Acknowledgements

We would like to personally thank Ignacio Rocco, Relja Arandjelović, Andrei Bursuc, Irina Saparina and Ekaterina Glazkova for amazing discussions and insightful comments without which this project would not be possible.

This research was partly supported by Samsung Research, Samsung Electronics, by the Russian Science Foundation grant 19-71-00082 and through computational resources of HPC facilities at NRU HSE.

This software was largely inspired by a number of great repos: weakalign, cnnimageretrieval-pytorch, torchcv, maskrcnn-benchmark. Special thanks go to the amazing PyTorch.
