This repo is the implementation of the following paper:

OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features
Anton Osokin, Denis Sumin, Vasily Lomakin
In proceedings of the European Conference on Computer Vision (ECCV), 2020

If you use our ideas, code or data, please cite our paper (available on arXiv).

This software is released under the MIT license, which means that you can use the code in any way you want.


  1. python >= 3.7
  2. pytorch >= 1.4, torchvision >= 0.5
  3. NVIDIA GPU, tested with V100 and GTX 1080 Ti
  4. Installed CUDA, tested with v10.0

See INSTALL.md for the package installation.


See our demo-notebook for an illustration of our method.

Dataset installation

  1. Grozi-3.2k dataset with our annotation (0.5GB): download from Google Drive or with the magic command and unpack to $OS2D_ROOT/data
./os2d/utils/wget_gdrive.sh data/grozi.zip 1Fx9lvmjthe3aOqjvKc6MJpMuLF22I1Hp
unzip data/grozi.zip -d data
  1. Extra test sets of retail products (0.1GB): download from Google Drive or with the magic command and unpack to $OS2D_ROOT/data
./os2d/utils/wget_gdrive.sh data/retail_test_sets.zip 1Vp8sm9zBOdshYvND9EPuYIu0O9Yo346J
unzip data/retail_test_sets.zip -d data
  1. INSTRE datasets (2.3GB) are re-hosted at the Center for Machine Perception in Prague (thanks to Ahmet Iscen!):
wget ftp://ftp.irisa.fr/local/texmex/corpus/instre/gnd_instre.mat -P data/instre  # 200KB
wget ftp://ftp.irisa.fr/local/texmex/corpus/instre/instre.tar.gz -P data/instre  # 2.3GB
tar -xzf data/instre/instre.tar.gz -C data/instre
  1. If you want to add your own dataset, create an instance of the DatasetOneShotDetection class and pass it to one of the functions that create dataloaders, build_train_dataloader_from_config or build_eval_dataloaders_from_cfg, from os2d/data/dataloader.py. See os2d/data/dataset.py for docs and examples.

Trained models

We release three pretrained models:

Name            mAP on "grozi-val-new-cl"   Link
OS2D V2-train   90.65                       Google Drive
OS2D V1-train   88.71                       Google Drive
OS2D V2-init    86.07                       Google Drive

The results (mAP on "grozi-val-new-cl") can be computed with the commands given below.

You can download the released models with the magic commands:

./os2d/utils/wget_gdrive.sh models/os2d_v2-train.pth 1l_aanrxHj14d_QkCpein8wFmainNAzo8
./os2d/utils/wget_gdrive.sh models/os2d_v1-train.pth 1ByDRHMt1x5Ghvy7YTYmQjmus9bQkvJ8g
./os2d/utils/wget_gdrive.sh models/os2d_v2-init.pth 1sr9UX45kiEcmBeKHdlX7rZTSA4Mgt0A7


  1. OS2D V2-train (best model)

For a fast evaluation on the validation set, one can use a single image scale with this script (gives 85.58 mAP on the validation set "grozi-val-new-cl"):

python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False model.backbone_arch ResNet50 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v2-train.pth eval.scales_of_image_pyramid "[1.0]"

Multiscale evaluation gives better results. The scripts below use the default setting with 7 scales: 0.5, 0.625, 0.8, 1, 1.2, 1.4, 1.6. Note that this evaluation can be slower because of the multiple scales and the large number of classes in the dataset.
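To get a feel for what the 7-scale pyramid means in practice, here is a minimal sketch that lists the image sizes it would produce. It assumes that the value 1280 passed to eval.dataset_scales acts as the base image size in pixels and that each pyramid level simply multiplies it by one scale factor; the function names are hypothetical and are not part of the repo.

```python
from typing import List, Sequence

# Default pyramid scales quoted in the text above.
DEFAULT_PYRAMID_SCALES = (0.5, 0.625, 0.8, 1.0, 1.2, 1.4, 1.6)


def pyramid_sizes(base_size: float,
                  scales: Sequence[float] = DEFAULT_PYRAMID_SCALES) -> List[int]:
    """Return the image size (in pixels) at each pyramid level.

    Assumption: every level rescales the base size by a single factor.
    """
    return [int(round(base_size * s)) for s in scales]


def relative_pixel_count(scales: Sequence[float] = DEFAULT_PYRAMID_SCALES) -> float:
    """Total pixel count of the pyramid relative to one image at scale 1.0."""
    return sum(s * s for s in scales)


if __name__ == "__main__":
    print(pyramid_sizes(1280))
    print(relative_pixel_count())
```

Under these assumptions the pyramid processes about 8.2 times as many pixels as a single image at scale 1.0, which is one plausible reason why the multiscale run is noticeably slower than the single-scale one.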

To evaluate on the validation set with multiple scales, run:

python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False model.backbone_arch ResNet50 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v2-train.pth
  1. OS2D V1-train

To evaluate on the validation set, run:

python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model False model.use_simplified_affine_model True model.backbone_arch ResNet101 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v1-train.pth
  1. OS2D V2-init

To evaluate on the validation set, run:

python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False model.backbone_arch ResNet50 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v2-init.pth
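The evaluation commands above differ only in a few key/value overrides, so it can be convenient to assemble them programmatically. The sketch below is a hypothetical helper (build_eval_command and V2_TRAIN_EVAL are illustrative names, not part of the repo) that turns a dictionary of overrides into the argument list shown above. It assumes list values are passed as JSON-style strings, as in the commands of this section.

```python
import json
from typing import Dict, List

# Overrides copied from the multiscale V2-train evaluation command above.
V2_TRAIN_EVAL = {
    "model.use_inverse_geom_model": True,
    "model.use_simplified_affine_model": False,
    "model.backbone_arch": "ResNet50",
    "train.do_training": False,
    "eval.dataset_names": ["grozi-val-new-cl"],
    "eval.dataset_scales": [1280.0],
    "init.model": "models/os2d_v2-train.pth",
}


def build_eval_command(overrides: Dict[str, object],
                       config_file: str = "experiments/config_training.yml") -> List[str]:
    """Build the argv list for main.py from a dict of config overrides."""
    cmd = ["python", "main.py", "--config-file", config_file]
    for key, value in overrides.items():
        cmd.append(key)
        if isinstance(value, (list, tuple)):
            # Lists become strings such as ["grozi-val-new-cl"] or [1280.0].
            cmd.append(json.dumps(list(value)))
        else:
            cmd.append(str(value))
    return cmd


if __name__ == "__main__":
    # Passing the list to subprocess.run(cmd, check=True) from $OS2D_ROOT
    # would launch the evaluation without any shell quoting issues.
    print(build_eval_command(V2_TRAIN_EVAL))
```

Because the command is kept as a list of arguments, the escaped quotes needed in the shell version are not required here.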


Pretrained models

In this project, we do not train models from scratch but start from pretrained models. For instructions on how to get them, see models/README.md.

Best models

Our V2-train model on the Grozi-3.2k dataset was trained using this command:

python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False train.objective.loc_weight 0.0 train.model.freeze_bn_transform True model.backbone_arch ResNet50 init.model models/imagenet-caffe-resnet50-features-ac468af-renamed.pth init.transform models/weakalign_resnet101_affine_tps.pth.tar train.mining.do_mining True output.path output/os2d_v2-train

Due to hard patch mining, this process is quite slow. Without it, training is faster but produces slightly worse results:

python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False train.objective.loc_weight 0.0 train.model.freeze_bn_transform True model.backbone_arch ResNet50 init.model models/imagenet-caffe-resnet50-features-ac468af-renamed.pth init.transform models/weakalign_resnet101_affine_tps.pth.tar train.mining.do_mining False output.path output/os2d_v2-train-nomining

For the V1-train model, we used this command:

python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model False model.use_simplified_affine_model True train.objective.loc_weight 0.2 train.model.freeze_bn_transform False model.backbone_arch ResNet101 init.model models/gl18-tl-resnet101-gem-w-a4d43db-converted.pth train.mining.do_mining False output.path output/os2d_v1-train

Note that these runs need a lot of RAM because the whole training set is cached. If this does not work for you, you can set the parameter train.cache_images False, which loads images on the fly but can be slow. Also note that the first several iterations of training can be slow because of "warming up", i.e., computing the grids of anchors in Os2dBoxCoder. Those computations are cached, so everything will eventually run faster.
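The "warming up" effect can be illustrated with a small memoization sketch. This is not the actual Os2dBoxCoder code: the function anchor_grid and its parameters (feature-map size, stride, box_size) are hypothetical and only show why the first call for each feature-map size is slow while later calls are nearly free.

```python
from functools import lru_cache
from typing import Tuple

Box = Tuple[float, float, float, float]


@lru_cache(maxsize=None)
def anchor_grid(feature_h: int, feature_w: int,
                stride: int, box_size: int) -> Tuple[Box, ...]:
    """Return one box (x_min, y_min, x_max, y_max) per feature-map cell.

    The result is cached per argument tuple, so the nested loops run only
    the first time a given feature-map size is requested.
    """
    half = box_size / 2.0
    boxes = []
    for row in range(feature_h):
        for col in range(feature_w):
            # Centre of the cell in input-image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            boxes.append((cx - half, cy - half, cx + half, cy + half))
    return tuple(boxes)


if __name__ == "__main__":
    anchor_grid(80, 80, 16, 240)   # computed on the first call
    anchor_grid(80, 80, 16, 240)   # served from the cache
    print(anchor_grid.cache_info())
```

Early training iterations tend to encounter many image sizes for the first time, which would explain the initial slowdown under this kind of caching scheme.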

For the rest of the training scripts see below.

Rerunning experiments

All the experiments of this project were run with our job helper. For each experiment, one programs an experiment structure (in Python) and calls several technical functions provided by the launcher. See, e.g., this file for an example.

The launch happens as follows:

# add OS2D_ROOT to the python path - can be done, e.g., as follows
export PYTHONPATH=$OS2D_ROOT:$PYTHONPATH
# call the experiment script
python ./experiments/launcher_exp1.py LIST_OF_LAUNCHER_FLAGS

Extra parameters in LIST_OF_LAUNCHER_FLAGS are parsed by the launcher and contain some useful options about the launch:

  1. --no-launch prepares all the scripts of the experiment without actually launching them.
  2. --slurm prepares SLURM jobs and launches them with sbatch (unless --no-launch is given).
  3. --stdout-file and --stderr-file - files in which to save stdout and stderr, respectively (relative to the log_path defined in the experiment description).
  4. For many SLURM related parameters, see the launcher.
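The flags listed above could be parsed along the following lines. This is only an illustrative sketch with argparse, not the repo's launcher: the function names are hypothetical, and the behavior without --slurm (running the scripts directly) is an assumption.

```python
import argparse
from typing import List


def build_launcher_parser() -> argparse.ArgumentParser:
    """Parser covering only the four flags described in the list above."""
    parser = argparse.ArgumentParser(description="Illustrative launcher flags")
    parser.add_argument("--no-launch", action="store_true",
                        help="prepare the experiment scripts but do not run them")
    parser.add_argument("--slurm", action="store_true",
                        help="prepare SLURM jobs and submit them with sbatch")
    parser.add_argument("--stdout-file", default=None,
                        help="file for stdout, relative to the experiment log_path")
    parser.add_argument("--stderr-file", default=None,
                        help="file for stderr, relative to the experiment log_path")
    return parser


def describe_action(argv: List[str]) -> str:
    """Summarize what a launcher would do for the given flags."""
    args = build_launcher_parser().parse_args(argv)
    if args.no_launch:
        return "prepare scripts only"
    if args.slurm:
        return "submit with sbatch"
    # Assumption: without --slurm the prepared scripts are run directly.
    return "run directly"


if __name__ == "__main__":
    print(describe_action(["--slurm", "--no-launch"]))
```

Checking --no-launch first mirrors the description above, where --slurm launches with sbatch only if --no-launch is absent.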

Our experiments can be found here:

  1. Experiments with OS2D
  2. Experiments with the detector-retrieval baseline
  3. Experiments with the CoAE baseline
  4. Experiments on the ImageNet dataset


We have added two baselines in this repo:

  1. Class-agnostic detector + image retrieval system: see README for details.
  2. Co-Attention and Co-Excitation, CoAE (original code, paper): see README for details.


We would like to personally thank Ignacio Rocco, Relja Arandjelović, Andrei Bursuc, Irina Saparina and Ekaterina Glazkova for amazing discussions and insightful comments, without which this project would not have been possible.

This research was partly supported by Samsung Research, Samsung Electronics, by the Russian Science Foundation grant 19-71-00082 and through computational resources of HPC facilities at NRU HSE.

This software was largely inspired by a number of great repos: weakalign, cnnimageretrieval-pytorch, torchcv, maskrcnn-benchmark. Special thanks go to the amazing PyTorch.