Companion code for L. Pineda, A. Salvador, et al., *Elucidating image-to-set prediction: An analysis of models, losses and datasets*.
This repository contains a unified codebase to train and test strong image-to-set prediction (multi-label classification) baselines. The code comes with pre-defined train/valid/test splits for 5 datasets of increasing complexity (Pascal VOC 2007, MS COCO 2014, ADE20k, NUS-WIDE and Recipe1M), as well as a common evaluation protocol to compare all models. The top-ranked baselines across datasets are released together with the code.
If you find this code useful in your research, please consider citing with the following BibTeX entry:
```
@article{PinedaSalvador2019im2set,
  author = {Pineda, Luis and Salvador, Amaia and Drozdzal, Michal and Romero, Adriana},
  title = {Elucidating image-to-set prediction: An analysis of models, losses and datasets},
  journal = {CoRR},
  volume = {abs/1904.05709},
  year = {2019},
  url = {https://arxiv.org/abs/1904.05709},
  archivePrefix = {arXiv},
  eprint = {1904.05709},
}
```
This code uses Python 3.7.3 (Anaconda), PyTorch 1.1.0 and CUDA 10.0.130.
- Install PyTorch (the CUDA toolkit version matches the CUDA 10.0.130 requirement above):

  ```
  $ conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
  ```

- Install the dependencies:

  ```
  $ pip install -r requirements.txt
  ```
- Download the VOC 2007 dataset and extract it under `/path/to/voc/`. Remember to also download the test set for evaluation.
- Fill in `configs/datapaths.json` with the path to the VOC dataset: `"voc": "/path/to/voc/"`.
- Download the MS COCO 2014 dataset and extract it under `/path/to/coco/`.
- Fill in `configs/datapaths.json` with the path to the COCO dataset: `"coco": "/path/to/coco/"`.
- Download the NUS-WIDE dataset and extract it under `/path/to/nuswide/`.
- Fill in `configs/datapaths.json` with the path to the NUS-WIDE dataset: `"nuswide": "/path/to/nuswide/"`.
- Download the ADE20K Challenge data and place it under `/path/to/ade20k/`.
- Fill in `configs/datapaths.json` with the path to the ADE20K dataset: `"ade20k": "/path/to/ade20k/"`.
- Download Recipe1M (registration required) and extract it under `/path/to/recipe1m/`. The contents of `/path/to/recipe1m/` should be the following:

  ```
  det_ingrs.json
  layer1.json
  layer2.json
  images/
  images/train
  images/val
  images/test
  ```

- Pre-process the dataset and build the vocabularies with:

  ```
  $ python src/utils/recipe1m_utils.py --recipe1m_path /path/to/recipe1m/
  ```

  The resulting files will be stored under `/path/to/recipe1m/preprocessed`.
- Fill in `configs/datapaths.json` with the path to the Recipe1M dataset: `"recipe1m": "/path/to/recipe1m/"`.
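Once all five datasets are in place, `configs/datapaths.json` should look roughly like the sketch below (the placeholder paths are illustrative; substitute your actual extraction directories):

```json
{
    "voc": "/path/to/voc/",
    "coco": "/path/to/coco/",
    "nuswide": "/path/to/nuswide/",
    "ade20k": "/path/to/ade20k/",
    "recipe1m": "/path/to/recipe1m/"
}
```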
Note: all python calls below must be run from `./src`.
Checkpoints will be saved under the directory `<save_dir>/<dataset>/<model_name>/<image_model>/<experiment_name>/`, as specified by the `--save_dir`, `--dataset`, `--model_name`, `--image_model` and `--experiment_name` arguments.
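For instance, with `--save_dir ../checkpoints`, `--dataset voc`, `--model_name ff_bce`, `--image_model resnet50` and a hypothetical `--experiment_name baseline`, checkpoints would land in:

```
../checkpoints/voc/ff_bce/resnet50/baseline/
```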
The recommended way to train the models reported in the paper is to use the JSON configuration files provided in the `configs` folder. We provide one configuration file for each combination of dataset, set predictor (`model_name`) and image backbone (`image_model`). The naming convention is `configs/dataset/image_model_model_name.json`; for example, `configs/voc/resnet50_ff_bce.json` corresponds to the `ff_bce` set predictor with a `resnet50` backbone on VOC.
The following `model_name` values are available:

- `ff_bce`: Feed-forward model trained with binary cross-entropy loss.
- `ff_iou`: Feed-forward model trained with soft intersection-over-union loss.
- `ff_td`: Feed-forward model trained with target distribution loss.
- `ff_bce_cat`: Feed-forward model trained with binary cross-entropy loss and categorical distribution loss for cardinality prediction.
- `ff_iou_cat`: Feed-forward model trained with soft intersection-over-union loss and categorical distribution loss for cardinality prediction.
- `ff_td_cat`: Feed-forward model trained with target distribution loss and categorical distribution loss for cardinality prediction.
- `ff_bce_dc`: Feed-forward model trained with binary cross-entropy loss and Dirichlet-categorical loss for cardinality prediction.
- `lstm`: LSTM model trained with `eos` token for cardinality prediction.
- `lstm_shuffle`: Same as `lstm`, but labels are shuffled every time an image is loaded.
- `lstmset`: LSTM model trained with `eos` token for cardinality prediction and pooled across time steps.
- `tf`: Transformer model trained with `eos` token for cardinality prediction.
- `tf_shuffle`: Same as `tf`, but labels are shuffled every time an image is loaded.
- `tf_set`: Transformer model trained with `eos` token for cardinality prediction and pooled across time steps.
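As a rough illustration of how the feed-forward losses differ, below is a minimal PyTorch sketch of a soft intersection-over-union loss over multi-label targets. This is a sketch for intuition only, not the repository's implementation; see the paper for the exact formulations used here.

```python
import torch

def soft_iou_loss(logits, targets, eps=1e-8):
    # Sketch of a soft IoU loss for multi-label prediction (illustrative only).
    # `logits` and `targets` have shape (batch, num_labels); targets are 0/1 floats.
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum(dim=1)              # soft |P ∩ T|
    union = (probs + targets - probs * targets).sum(dim=1)   # soft |P ∪ T|
    return (1.0 - (intersection + eps) / (union + eps)).mean()
```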
The following `image_model` values are available:

- `resnet50`: Use ResNet-50 as the image feature extractor.
- `resnet101`: Use ResNet-101 as the image feature extractor.
- `resnext101_32x8d`: Use ResNeXt-101 32x8d as the image feature extractor.
Note: the `resnet101` and `resnext101_32x8d` image feature extractors are only available for `ff_bce` and `lstm`.
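For intuition about what `image_model` selects, the sketch below shows a common way to turn a torchvision ResNet-50 into an image feature extractor by dropping its classification head. This is a hypothetical illustration, not the repository's actual wiring:

```python
import torch
import torchvision

# Illustrative sketch: torchvision's ResNet-50 with the final fully
# connected layer removed, leaving global-average-pooled features.
backbone = torchvision.models.resnet50(pretrained=True)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1])

images = torch.randn(2, 3, 224, 224)                 # dummy image batch
with torch.no_grad():
    features = feature_extractor(images).flatten(1)  # shape: (2, 2048)
```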
Training can be run as in the following example command:

```
$ python train.py --save_dir ../checkpoints --resume --seed SEED --dataset DATASET \
  --image_model IMAGE_MODEL --model_name MODEL_NAME --use_json_config
```

where DATASET is a dataset name (e.g. `voc`), IMAGE_MODEL and MODEL_NAME are among the models listed above (e.g. `resnet50` and `ff_bce`), and SEED is the value of a random seed (e.g. `1235`).
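Substituting the example values, a concrete invocation (run from `./src`) looks like:

```
$ python train.py --save_dir ../checkpoints --resume --seed 1235 --dataset voc \
  --image_model resnet50 --model_name ff_bce --use_json_config
```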
Check training progress with Tensorboard from `../checkpoints`:

```
$ tensorboard --logdir='.' --port=6006
```
Note: all python calls below must be run from `./src`.
Calculate evaluation metrics as in the following example command:

```
$ python eval.py --eval_split test --models_path PATH --dataset DATASET --batch_size 100
```

where DATASET is a dataset name (e.g. `voc`) and PATH is the path to the saved models folder.
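Assuming the checkpoint layout described above, a concrete call could look like the following (the `--models_path` value is an assumed example based on that layout):

```
$ python eval.py --eval_split test --models_path ../checkpoints/voc/ff_bce/resnet50 \
  --dataset voc --batch_size 100
```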
We are releasing `ff_bce` and `lstm` pre-trained models (single seed) for all image backbones. Please follow the links below:
|  | VOC | COCO | NUSWIDE | ADE20k | RECIPE1M |
|---|---|---|---|---|---|
| resnet50 | | | | | |
| resnet101 | | | | | |
| resnext101_32x8d | | | | | |
image-to-set is released under the MIT license; see LICENSE for details.