A Metric Learning Reality Check
Benchmark results (in progress):
Benefits of this library
- Highly configurable
- Use the default configs files, merge in your own, or override options via the command line.
- Extensive logging
- View experiment data in tensorboard, csv, and sqlite format.
- Easy hyperparameter optimization
- Simply append ~BAYESIAN~ to the hyperparameters you want to optimize.
- Customizable
- Register your own losses, miners, datasets etc. with a simple function call.
Installation
pip install powerful-benchmarker
Usage
Set default flags
The easiest way to get started is to download the example script. Then change the default values for the following flags:
- pytorch_home is where you want to save downloaded pretrained models.
- dataset_root is where your datasets are located.
- root_experiment_folder is where you want all experiment data to be saved.
Try a basic command
The following command will run an experiment using the default config files, as well as download the CUB200 dataset into your dataset_root
python run.py --experiment_name test1 --dataset {CUB200: {download: True}}
(For the rest of this readme, we'll assume the datasets have already been downloaded.)
Experiment data is saved in the following format:
<root_experiment_folder>
|-<experiment_name>
|-configs
|-config_eval.yaml
|-config_general.yaml
|-config_loss_and_miners.yaml
|-config_models.yaml
|-config_optimizers.yaml
|-config_transforms.yaml
|-<split scheme name>
|-saved_models
|-saved_csvs
|-tensorboard_logs
|-meta_logs
|-saved_csvs
|-tensorboard_logs
Override config options at the command line
The default config files use a batch size of 32. What if you want to use a batch size of 256? Just write the flag at the command line:
python run.py --experiment_name test1 --batch_size 256
Complex options (i.e. nested dictionaries) can be specified at the command line:
python run.py \
--experiment_name test1 \
--mining_funcs {tuple_miner: {PairMarginMiner: {pos_margin: 0.5, neg_margin: 0.5}}}
The ~OVERRIDE~
suffix is required to completely override complex config options. For example, the following overrides the default loss function:
python run.py \
--experiment_name test1 \
--loss_funcs {metric_loss~OVERRIDE~: {ArcFaceLoss: {margin: 30, scale: 64, embedding_size: 128}}}
Leave out the ~OVERRIDE~
suffix if you want to merge options. For example, we can add an optimizer for our loss function's parameters:
python run.py \
--experiment_name test1 \
--optimizers {metric_loss_optimizer: {SGD: {lr: 0.01}}}
This will be included along with the default optimizers.
We can change the learning rate of the trunk_optimizer, but keep all other parameters the same:
python run.py \
--experiment_name test1 \
--optimizers {trunk_optimizer: {RMSprop: {lr: 0.01}}}
Or we can make trunk_optimizer use Adam, but leave embedder_optimizer to the default setting:
python run.py \
--experiment_name test1 \
--optimizers {trunk_optimizer~OVERRIDE~: {Adam: {lr: 0.01}}}
Combine yaml files at the command line
The following merges the with_cars196
config file into the default
config file, in the config_general
category.
python run.py --experiment_name test1 --config_general [default, with_cars196]
This is convenient when you want to change a few settings (specified in with_cars196
), and keep all the other options unchanged (specified in default
). You can specify any number of config files to merge, and they get loaded and merged in the order that you specify.
Resume training
The following resumes training for the test1
experiment, using the latest saved models.
python run.py --experiment_name test1 --resume_training latest
You can also resume using the model with the best validation accuracy:
python run.py --experiment_name test1 --resume_training best
Let's say you finished training for 100 epochs, and decide you want to train for another 50 epochs, for a total of 150. You would run:
python run.py --experiment_name test1 --resume_training latest \
--num_epochs_train 150 --merge_argparse_when_resuming
(The merge_argparse_when_resuming
tells the code that you want to make changes to the original experiment configuration. If you don't use this flag, then the code will ignore your command line arguments, and use the original configuration. The purpose of this is to avoid accidentally changing configs in the middle of an experiment.)
Now in your experiments folder you'll see the original config files, and a new folder starting with resume_training
.
<root_experiment_folder>
|-<experiment_name>
|-configs
|-config_eval.yaml
...
|-resume_training_config_diffs_<underscore delimited numbers>
...
This folder contains all differences between the originally saved config files and the parameters that you've specified at the command line. In this particular case, there should just be a single file config_general.yaml
with a single line: num_epochs_train: 150
.
The underscore delimited numbers in the folder name indicate which models were loaded for each split scheme. For example, let's say you are doing cross validation with 3 folds. The training process has finished 50, 30, and 0 epochs of folds 0, 1, and 2, respectively. You decide to stop training, and resume training with a different batch size. Now the config diff folder will be named resume_training_config_diffs_50_30_0
.
Reproducing benchmark results
To reproduce an experiment from the benchmark spreadsheets, use the --reproduce_results
flag:
- In the benchmark spreadsheet, click on the google drive link under the "config files" column.
- Download the folders you want (for example
cub200_old_approach_triplet_batch_all
), into some folder on your computer. For example, I downloaded into/home/experiments_to_reproduce
- Then run:
python run.py --reproduce_results /home/experiments_to_reproduce/cub200_old_approach_triplet_batch_all \
--experiment_name cub200_old_approach_triplet_batch_all_reproduced
If you'd like to change some parameters when reproducing results, you can either make those changes in the config files, or at the command line. For example, maybe you'd like to change the number of dataloaders:
python run.py --reproduce_results /home/experiments_to_reproduce/cub200_old_approach_triplet_batch_all \
--experiment_name cub200_old_approach_triplet_batch_all_reproduced \
--dataloader_num_workers 16 \
--eval_dataloader_num_workers 16 \
--merge_argparse_when_resuming
The merge_argparse_when_resuming
flag is required in order to use a different configuration from the one in the reproduce_results
folder.
Evaluation options
By default, your model will be saved and evaluated on the validation set every save_interval
epochs.
To get accuracy for specific splits, use the --splits_to_eval
flag and pass in a python-style list of split names. For example --splits_to_eval [train, test]
To run evaluation only, use the --evaluate
flag.
Split schemes and cross validation
One weakness of many metric-learning papers is that they have been training and testing on the same handful of datasets for years. They have also been splitting data into a 50/50 train/test split scheme, instead of train/val/test. This has likely lead to overfitting on the "test" set, as people have tuned hyperparameters and created algorithms with direct feedback from the "test" set.
To remedy this situation, this benchmarker allows the user to specify the split scheme. Here's an example config:
test_size: 0.5
test_start_idx: 0.5
num_training_partitions: 10
num_training_sets: 5
Translation:
- The test set consists of classes with labels in
[num_labels * test_start_idx, num_labels * (test_start_idx + test_size)]
. Note that if we settest_start_idx
to 0.9, the range would wrap around to the beginning (0.9 to 1, 0 to 0.4). - The remaining classes will be split into 10 equal sized partitions.
- 5 of those partitions will be used for training. In other words, 5-fold cross validation will be performed, but the size of the partitions will be the same as if 10-fold cross validation was being performed.
When evaluating the cross-validated models, the best model from each fold will be loaded, and the results be averaged. Alternatively, you can set the config option meta_testing_method
to ConcatenateEmbeddings
. This will load the best model from each fold, but treat them as one model during evaluation on the test set, by concatenating their outputs.
If instead you still want to use the old 50/50 train/test split, then set special_split_scheme_name
to old_approach
. Otherwise, leave it as null
.
Meta logs
When doing cross validation, a new set of meta records will be created. The meta records show the average of the best accuracies of your training runs. You can find these records on tensorboard and in the meta_logs folder.
Bayesian optimization to tune hyperparameters
You can use bayesian optimization using the same example script. In your config files or at the command line, append ~BAYESIAN~
to any parameter that you want to tune, followed by a lower and upper bound in square brackets. If your parameter operates on a log scale (for example, learning rates), then append ~LOG_BAYESIAN~
. You must also specify the number of iterations with the --bayes_opt_iters
command line flag.
Here is an example script which uses bayesian optimization to tune 3 hyperparameters for the multi similarity loss.
python run.py --bayes_opt_iters 50 \
--loss_funcs~OVERRIDE~ {metric_loss: {MultiSimilarityLoss: {alpha~LOG_BAYESIAN~: [0.01, 100], beta~LOG_BAYESIAN~: [0.01, 100], base~BAYESIAN~: [0, 1]}}} \
--experiment_name cub_bayes_opt \
If you stop and want to resume bayesian optimization, simply use run.py
with the same experiment_name
you were using before.
You can change the optimization bounds when resuming, by either changing the bounds in your config files or at the command line. If you're using the command line, make sure to also use the --merge_argparse_when_resuming
flag.
You can also run a number of reproductions for the best parameters, so that you can obtain a confidence interval for your results. Use the reproductions
flag, and pass in the number of reproductions you want to perform at the end of bayesian optimization.
python run.py --bayes_opt_iters 50 --reproductions 10 \
--experiment_name cub_bayes_opt \
Register your own classes and modules
By default, the API gives you access to losses/miners/datasets/optimizers/schedulers/trainers etc that are available in powerful-benchmarker, PyTorch, and pytorch-metric-learning.
Let's say you make your own loss and mining functions, and you'd like to have access to them via the API. You can accomplish this by replacing the last two lines of the example script with this:
from pytorch_metric_learning import losses, miners
# your custom loss function
class YourLossFunction(losses.BaseMetricLossFunction):
...
# your custom mining function
class YourMiningFunction(miners.BaseTupleMiner):
...
r = runner(**(args.__dict__))
# make the runner aware of them
r.register("loss", YourLossFunction)
r.register("miner", YourMiningFunction)
r.run()
Now you can access your custom classes just like any other class:
loss_funcs:
metric_loss:
YourLossFunction:
mining_funcs:
tuple_miner:
YourMiningFunction:
If you have a module containing multiple classes and you want to register all those classes, you can simply register the module:
import YourModuleOfLosses
r.register("loss", YourModuleOfLosses)
Registering your own trainer is a bit more involved, because you need to also create an associated API parser. The name of the api parser should be APIParser<name of your training method>
.
Here's an example where I make a trainer that extends trainers.MetricLossOnly
, and takes in an additional argument foo
. In order to pass this in, the API parser needs to add foo
to the trainer kwargs, and this is done in the get_trainer_kwargs
method.
from pytorch_metric_learning import trainers
from powerful_benchmarker import api_parsers
class YourTrainer(trainers.MetricLossOnly):
def __init__(self, foo, **kwargs):
super().__init__(**kwargs)
self.foo = foo
print("foo = ", self.foo)
class APIYourTrainer(api_parsers.BaseAPIParser):
def get_foo(self):
return "hello"
def get_trainer_kwargs(self):
trainer_kwargs = super().get_trainer_kwargs()
trainer_kwargs["foo"] = self.get_foo()
return trainer_kwargs
r = runner(**(args.__dict__))
r.register("trainer", YourTrainer)
r.register("api_parser", APIYourTrainer)
r.run()
Config options guide
Below is the format for the various config files. Click on the links to see the default yaml file for each category.
config_general
training_method: <type> #options: MetricLossOnly, TrainWithClassifier, CascadedEmbeddings, DeepAdversarialMetricLearning
testing_method: <type> #options: GlobalEmbeddingSpaceTester, WithSameParentLabelTester
meta_testing_method: <list> #options: meta_SeparateEmbeddings or meta_ConcatenateEmbeddings
dataset:
<type>: #options: CUB200, Cars196, StanfordOnlineProducts
<kwarg>: <value>
...
splits_to_eval: <list> #strings corresponding to dataset split names, i.e. train, val, test.
num_epochs_train: <how long to train for>
iterations_per_epoch: <Optional. If set, an epoch will simply be a fixed number of iterations. Or you can set this to null or 0, and it will be ignored.>
save_interval: <how often (in number of epochs) models will be saved and evaluated>
special_split_scheme_name: <string> #options: old_approach or predefined. Leave as null if you want to do cross validation.
test_size: <number> #number in (0, 1), which is the percent of classes that will be used in the test set.
test_start_idx: <number> #number in (0, 1), which is the percent that specifies the starting class index for the test set
num_training_partitions: <int> #number of partitions (excluding the test set) that are created for cross validation.
num_training_sets: <int> #number of partitions that are actually used as training sets cross validation.
label_hierarchy_level: <number>
dataloader_num_workers: <number>
check_untrained_accuracy: <boolean>
skip_eval_if_already_done: <boolean>
skip_meta_eval_if_already_done: <boolean>
patience: <int> #Training will stop if validation accuracy has not improved after this number of epochs. If null, then it is ignored.
config_models
models:
trunk:
<type>:
<kwarg>: <value>
...
embedder:
<type>:
<kwarg>: <value>
...
batch_size: <number>
freeze_batchnorm: <boolean>
config_loss_and_miners
loss_funcs:
<name>:
<type>:
<kwarg>: <value>
...
...
sampler:
<type>:
<kwarg>: <value>
...
mining_funcs:
<name>:
<type>:
<kwarg>: <value>
...
...
config_optimizers
optimizers:
trunk_optimizer:
<type>:
<kwarg>: <value>
...
embedder_optimizer:
<type>:
<kwarg>: <value>
...
...
config_transforms
transforms:
train:
<type>
<kwarg>: <value>
...
...
eval:
<type>
<kwarg>: <value>
...
...
config_eval
eval_reference_set: <name> #options: compared_to_self, compared_to_sets_combined, compared_to_training_set
eval_normalize_embeddings: <boolean>
eval_use_trunk_output: <boolean>
eval_batch_size: <number>
eval_metric_for_best_epoch: <name> #options: NMI, precision_at_1, r_precision, mean_average_r_precision
eval_dataloader_num_workers: <number>
eval_pca: <number> or null #options: number of dimensions to reduce embeddings to via PCA, or null if you don't want to use PCA.
eval_accuracy_calculator:
<type>
<kwarg>: <value>
Acknowledgements
Thank you to Ser-Nam Lim at Facebook AI, and my research advisor, Professor Serge Belongie. This project began during my internship at Facebook AI where I received valuable feedback from Ser-Nam, and his team of computer vision and machine learning engineers and research scientists.
Citing the benchmark results
If you'd like to cite the benchmark results, please cite this paper:
@misc{musgrave2020metric,
title={A Metric Learning Reality Check},
author={Kevin Musgrave and Serge Belongie and Ser-Nam Lim},
year={2020},
eprint={2003.08505},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Citing the code
If you'd like to cite the powerful-benchmarker code, you can use this bibtex:
@misc{Musgrave2019,
author = {Musgrave, Kevin and Lim, Ser-Nam and Belongie, Serge},
title = {Powerful Benchmarker},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/KevinMusgrave/powerful-benchmarker}},
}