Author: Ken Chatfield, University of Oxford (ken@robots.ox.ac.uk)
Copyright 2011-2013, all rights reserved.
Release: v1.2
Licence: BSD (see COPYING file)
The encoding methods evaluation toolkit provides a unified framework for evaluating bag-of-words based encoding methods over several standard image classification datasets.
Currently the following encoding methods are supported:
- Standard hard encoding
- Kernel codebook encoding
- Locality-constrained linear encoding
- Fisher kernel
These can be evaluated over the following datasets:
- PASCAL VOC 2007
- Caltech 101
Support for further methods and datasets will be provided in future releases.
Details relating to the supporting paper can be found on the research page at:
If you use this code in your experiments, please cite:
K. Chatfield, V. Lempitsky, A. Vedaldi, A. Zisserman
The devil is in the details: an evaluation of recent feature encoding methods
British Machine Vision Conference, 2011
A copy of the paper is available at:
This software is distributed under the BSD licence (see COPYING file). We are always interested in how this software is being used, so if you found this software useful or plan to make a release of code based on or using this package, it would be great to hear from you. Similarly, acknowledgements or citations to the above paper are appreciated.
The toolkit requires the following libraries, which must be installed and accessible on the MATLAB path:
- VLFeat library (available from http://www.vlfeat.org)
- LIBSVM library (available from http://www.csie.ntu.edu.tw/~cjlin/libsvm)
In addition, the test datasets should be downloaded and stored in a directory of your choice:
- VOC 2007 training/validation data and test data (available from http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007)
Once these prerequisites have been met, the toolkit can be installed as follows:
- Download and unpack `enceval-toolkit-1.2.tar.gz' (if you are reading this, you may already have completed this step).
- From within the enceval-toolkit directory, run the featpipem_setup.m script in MATLAB. This will compile all supporting MEX files for use on your system (a sketch of a typical setup session is shown below).
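As a rough sketch, a typical setup session might look like the following (the VLFeat and LIBSVM paths are placeholders for wherever you installed those libraries):

% Add the prerequisite libraries to the MATLAB path
% (placeholder paths -- substitute your own install locations)
run('/path/to/vlfeat/toolbox/vl_setup');
addpath('/path/to/libsvm/matlab');

% then, from within the enceval-toolkit directory, compile the MEX files
featpipem_setup;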
An example script, VQdemo.m, showing how the library is called can be found in the enceval-toolkit directory. To run the demo, first open the script and modify the prms.paths.dataset variable to point to the directory in which you have saved the PASCAL VOC 2007 dataset.
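For example (the path shown is only a placeholder for wherever you unpacked the VOC 2007 data):

prms.paths.dataset = '/path/to/VOC2007/'; % placeholder -- point this at your local copy of the dataset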
This demo performs the following steps:
- Trains a vocabulary of size 4,000 visual words from SIFT features densely extracted from the PASCAL VOC 2007 train/val set
- Computes the standard hard assignment bag-of-words encoding for all images in the PASCAL VOC 2007 dataset using this vocabulary
- Applies an additive kernel map (chi-2) to all codes
- Trains a linear SVM for each of the 20 classes in the PASCAL VOC 2007 dataset using codes computed for the train/val set
- Applies each classifier to the codes computed for the test set and calculates the performance over each class using average precision (results compatible with PASCAL VOC 2007 development kit)
The results thus obtained correspond to row (y) in Table 1 of the paper.
There are also sample launch scripts for the other supported encoding methods:
- LLCdemo.m - Locality-constrained linear encoding
- KCBdemo.m - Kernel codebook encoding
- FKdemo.m - Fisher kernel encoding
NOTE 1: A single run of the pipeline is likely to take at least 3-4 hours. To speed things up, open a MATLAB pool by calling `matlabpool x', where x is the number of workers you wish to use to execute the code in parallel.
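For example, to open a pool with four workers:

matlabpool open 4  % on newer MATLAB releases where matlabpool has been removed, use parpool(4) instead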
NOTE 2: the computational data generated by each step of the demo run are also available on the website at: http://www.robots.ox.ac.uk/~vgg/software/enceval_toolkit for validation purposes.
The toolkit offers two main wrapper functions:
>> featpipem.wrapper.loadcodebook(codebkgen, prms)
Which trains a dictionary of visual words
>> featpipem.wrapper.dstest(prms, codebook, featextr, encoder, pooler, classifier)
Which evaluates performance over a dataset
`prms' is a structure containing experimental parameters of a general nature (the name of the experiment, the dataset to test over, paths for storing computational data and final results, and the training splits to use). The other parameters are a series of classes relating to each step in the classification pipeline.
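Schematically, a full experiment chains these two calls together. The sketch below is illustrative only: codebkgen, featextr, pooler and classifier stand in for whichever codebook generator, feature extractor, pooler and classifier classes from the +featpipem package you instantiate (see the demo scripts for concrete choices), and loadcodebook is assumed to return the trained codebook:

% illustrative sketch -- see the demo scripts for concrete class choices
codebook = featpipem.wrapper.loadcodebook(codebkgen, prms); % train/load the visual vocabulary
encoder = featpipem.encoding.VQEncoder(codebook);           % hard-assignment encoder (or LLC/KCB/FK, see below)
featpipem.wrapper.dstest(prms, codebook, featextr, encoder, pooler, classifier); % encode, train SVMs, evaluate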
The toolkit is designed so that changing the method used for any step of the pipeline is as simple as instantiating a different class from a collection of classes sharing a common interface, one for each method supported by the toolkit.
For example, to change the encoding method used in the demo code from standard hard assignment to locality-constrained linear coding, simply replace the following code:
encoder = featpipem.encoding.VQEncoder(codebook);
encoder.max_comps = 25;
encoder.norm_type = 'l1';
with this code:
encoder = featpipem.encoding.LLCEncoder(codebook);
encoder.max_comps = 500;
encoder.norm_type = 'l2';
and then pass the encoder thus instantiated to the dstest function as before. We have replaced an instance of the VQEncoder class with the LLCEncoder class - this is all that is required to change the encoding method. Some classes also have additional properties which can be set. For example, for the LLCEncoder class it is also possible to specify the beta parameter as follows:
encoder.beta = 1e-4;
A set of default values is provided for all parameters in each class. For further details, refer to the class definitions in the +featpipem subfolder. Examples of how to call the code with the locality-constrained linear coding and kernel codebook methods can be found in the enceval-toolkit directory (LLCdemo.m and KCBdemo.m respectively).

LLCdemo.m and KCBdemo.m also demonstrate how to scan for an optimal SVM C parameter (when setting bCrossValSVM to true). Once the result files for multiple values of the C parameter have been generated, use the findbestperf function in the utility subdirectory to find the result file (and thus the C parameter) which offers the best performance.
For the example code provided with the toolkit, normalization of spatial pyramid subbins is disabled, with only the full spatial pyramid vector being normalized. To reproduce the results of the paper up to some statistical variation, re-enable normalization of spatial pyramid subbins by setting the following parameters:
encoder.norm_type = 'l1' or 'l2'
pooler.subbin_norm_type = 'l1' or 'l2'
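For example, to use L1 normalization for both the subbins and the full pyramid vector (check the demo scripts and the paper for the setting corresponding to a particular table row):

encoder.norm_type = 'l1';
pooler.subbin_norm_type = 'l1';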
- Version 1.2 - (14 June 2013) Bugfix release
- Version 1.1 - (16 July 2012) Added support for Fisher kernel, bugfixes
- Version 1.0 - (7 November 2011) Initial release
The code used to train GMM vocabularies and compute the Fisher kernel encoding is provided by the 'GMM-Fisher' sublibrary, found in the lib/gmm-fisher/ subdirectory. This is a standalone C++ library kindly contributed by Jorge Sánchez. Thanks to both him and Florent Perronnin for their invaluable help and advice on getting this part of the toolkit up and running.
Thanks also to Karen Simonyan for doing a very thorough job of testing the code, contributing numerous bug-fixes and helping to generally speed it up and run more efficiently. Thanks are also due to Yuning Chai for the initial version of the locality-constrained linear coding implementation and co-author Victor Lempitsky, whose codebase heavily inspired the structure of the library. IMDB ground truth files are provided by co-author Andrea Vedaldi.