Encoding Methods Evaluation Toolkit

Author: Ken Chatfield, University of Oxford (ken@robots.ox.ac.uk)

Copyright 2011-2013, all rights reserved.

Release: v1.2

Licence: BSD (see COPYING file)

Overview

The encoding methods evaluation toolkit provides a unified framework for evaluating bag-of-words based encoding methods over several standard image classification datasets.

Currently the following encoding methods are supported:

  • Standard hard assignment encoding (vector quantization)
  • Kernel codebook encoding
  • Locality-constrained linear encoding
  • Fisher kernel

These can be evaluated over the following datasets:

  • PASCAL VOC 2007
  • Caltech 101

Support for further methods and datasets will be provided in future releases.

Paper and Licensing

Details relating to the supporting paper can be found on the research page at:

If you use this code in your experiments, please cite:

K. Chatfield, V. Lempitsky, A. Vedaldi, A. Zisserman
The devil is in the details: an evaluation of recent feature encoding methods
British Machine Vision Conference, 2011

A copy of the paper is available at:

This software is distributed under the BSD licence (see COPYING file). We are always interested in how this software is being used, so if you found this software useful or plan to make a release of code based on or using this package, it would be great to hear from you. Similarly, acknowledgements or citations to the above paper are appreciated.

Installation

The toolkit requires the following libraries, which must be installed and accessible on the MATLAB path:

In addition, the test datasets should be downloaded and stored in a directory of your choice:

Once these prerequisites have been met, the toolkit can be installed as follows:

  1. Download and unpack `enceval-toolkit-1.2.tar.gz' (if you are reading this, you may already have completed this step).
  2. From within the enceval-toolkit directory, run the featpipem_setup.m script in MATLAB. This will compile all supporting MEX files for use on your system.
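
For example, from the MATLAB prompt (the directory name follows the unpacked archive above):

>> cd enceval-toolkit
>> featpipem_setup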

Running the Demo

An example script, VQdemo.m, showing how the library is called can be found in the enceval-toolkit directory. To run the demo, first open the script and modify the prms.paths.dataset variable to point to the directory in which you have saved the PASCAL VOC 2007 dataset.
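
The relevant line in the script would then read something like the following (the path shown is only a placeholder for your local copy):

prms.paths.dataset = '/data/datasets/VOCdevkit/VOC2007/'; % edit to your local path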

This demo performs the following steps:

  1. Trains a vocabulary of 4,000 visual words from SIFT features densely extracted from the PASCAL VOC 2007 train/val set
  2. Computes the standard hard assignment bag-of-words encoding for all images in the PASCAL VOC 2007 dataset using this vocabulary
  3. Applies an additive kernel map (chi-squared; illustrated below) to all codes
  4. Trains a linear SVM for each of the 20 classes in the PASCAL VOC 2007 dataset using codes computed for the train/val set
  5. Applies each classifier to the codes computed for the test set and calculates the performance over each class using average precision (results compatible with PASCAL VOC 2007 development kit)

The results thus obtained correspond to row (y) in Table 1 of the paper.
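
As an aside, step 3 can be reproduced in isolation with VLFeat's vl_homkermap (the toolkit applies the kernel map internally; this standalone snippet merely assumes VLFeat is installed and on the MATLAB path):

% Toy illustration of the approximate additive chi-squared kernel map
X = rand(4000, 10, 'single');  % 10 toy 4000-dimensional bag-of-words histograms
V = vl_homkermap(X, 1);        % default kernel is chi-squared; V is 12000 x 10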

There are also sample launch scripts for the other supported encoding methods:

  • LLCdemo.m - Locality-constrained linear encoding
  • KCBdemo.m - Kernel codebook encoding
  • FKdemo.m - Fisher kernel encoding

NOTE 1: a single run of the pipeline is likely to take at least 3-4 hours. To speed things up, open a MATLAB pool by calling `matlabpool x', where x is the number of workers you wish to use to execute the code in parallel.
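
For example, with four workers (the worker count is machine-dependent; this requires the Parallel Computing Toolbox, and recent MATLAB releases replace matlabpool with parpool):

>> matlabpool 4       % open a pool of 4 workers
>> VQdemo             % run the demo; its parallel loops now use the pool
>> matlabpool close   % release the workers when done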

NOTE 2: the computational data generated by each step of the demo run are also available on the website at: http://www.robots.ox.ac.uk/~vgg/software/enceval_toolkit for validation purposes.

Customizing Parameters

The toolkit offers two main wrapper functions:

>> featpipem.wrapper.loadcodebook(codebkgen, prms)

which trains a dictionary of visual words, and

>> featpipem.wrapper.dstest(prms, codebook, featextr, encoder, pooler,
                            classifier)

which evaluates performance over a dataset.

`prms' is a structure containing experimental parameters of a general nature (name of the experiment, dataset to test over, paths to store computational data and final results, training splits to use). The other parameters are a series of classes relating to each step in the classification pipeline.

The toolkit is designed in such a way that changing the method used for each of the steps is as simple as choosing the appropriate class to instantiate from a collection of classes sharing a common interface, one for each of the methods supported by the toolkit.
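
For instance, the overall wiring looks like the sketch below. Only VQEncoder and the two wrapper functions appear in this document; the other stage class names are hypothetical placeholders, so substitute the actual classes from the +featpipem subfolder:

% Sketch of assembling a pipeline. Names marked (hypothetical) are
% illustrative placeholders -- use the real classes found in +featpipem.
featextr   = featpipem.features.DenseSiftExtractor();        % feature extraction (hypothetical)
codebkgen  = featpipem.codebkgen.KmeansCodebkGen(featextr);  % codebook generation (hypothetical)
codebook   = featpipem.wrapper.loadcodebook(codebkgen, prms);
encoder    = featpipem.encoding.VQEncoder(codebook);
pooler     = featpipem.pooling.SpmPooler(encoder);           % spatial pooling (hypothetical)
classifier = featpipem.classification.LinearSvm();           % SVM train/test (hypothetical)
featpipem.wrapper.dstest(prms, codebook, featextr, encoder, pooler, classifier);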

For example, to change the encoding method used in the demo code from standard hard assignment to locality-constrained linear coding, simply replace the following code:

encoder = featpipem.encoding.VQEncoder(codebook);  % hard assignment (VQ) encoder
encoder.max_comps = 25;   % max comparisons for the approximate NN search
encoder.norm_type = 'l1'; % norm applied to each encoding

With this code:

encoder = featpipem.encoding.LLCEncoder(codebook);  % locality-constrained linear encoder
encoder.max_comps = 500;  % max comparisons for the approximate NN search
encoder.norm_type = 'l2'; % norm applied to each encoding

and then pass the encoder thus instantiated to the dstest function as before. We have replaced an instance of the VQEncoder class with an instance of the LLCEncoder class - this is all that is required to change the encoding method. Some classes also have additional properties which can be set. For example, for the LLCEncoder class it is also possible to specify the beta parameter as follows:

encoder.beta = 1e-4;  % regularization parameter of the LLC objective

A set of default values is provided for all parameters in each class. For further details, refer to the class definitions in the +featpipem subfolder. Examples of how to call the code with the locality-constrained linear coding and kernel codebook methods can be found in the enceval-toolkit directory (LLCdemo.m and KCBdemo.m respectively).

LLCdemo.m and KCBdemo.m also demonstrate how to scan for an optimal SVM c parameter (when setting bCrossValSVM to true). Once the result files for multiple values of the c parameter have been generated, use the findbestperf function in the utility subdirectory to find the result file (and thus the c parameter) which offers the best performance.
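
A sketch of that workflow (the results path below, and indeed the exact argument list of findbestperf, are assumptions - consult utility/findbestperf.m for its actual interface):

>> findbestperf('/path/to/results')   % assumed argument; see utility/findbestperf.m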

Notes on Performance

For the example code provided with the toolkit, normalization of spatial pyramid subbins is disabled, with only the full spatial pyramid vector being normalized. To reproduce the results of the paper up to some statistical variation, re-enable normalization of spatial pyramid subbins by setting the following parameters:

encoder.norm_type = 'l1';        % or 'l2'
pooler.subbin_norm_type = 'l1';  % or 'l2'

Version History

  • Version 1.2 - (14 June 2013) Bugfix release
  • Version 1.1 - (16 July 2012) Added support for Fisher kernel, bugfixes
  • Version 1.0 - (7 November 2011) Initial release

GMM-Fisher Library

The code used to train GMM vocabularies and compute the Fisher kernel encoding is provided by the 'GMM-Fisher' sublibrary, found in the lib/gmm-fisher/ subdirectory. This is a standalone C++ library kindly contributed by Jorge Sánchez. Thanks to both him and Florent Perronnin for their invaluable help and advice on getting this part of the toolkit up and running.

Acknowledgements

Thanks also to Karen Simonyan for doing a very thorough job of testing the code, contributing numerous bug fixes and helping to make it run faster and more efficiently. Thanks are also due to Yuning Chai for the initial version of the locality-constrained linear coding implementation, and to co-author Victor Lempitsky, whose codebase heavily inspired the structure of the library. IMDB ground truth files are provided by co-author Andrea Vedaldi.