This package provides the Matlab source code for training Latent Bag-of-Words (LatBoW) and Latent Mixture-of-Gaussians (LatMoG) models and for extracting their Fisher vectors, as explained in:
- Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid, “Approximate Fisher Kernels of non-iid Image Models for Image Categorization”, accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2015,
- Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid, “Image categorization using Fisher kernels of non-iid image models”, in IEEE Conference on Computer Vision & Pattern Recognition (CVPR), Providence, USA, June 2012.

Call `[genm,eprm] = fv_fisher_latentgmm_variationalestimate(p,fvbase,N,D,K,[])` to train a generative model `genm` using a variational expectation-maximization (EM) procedure. This function also returns the posteriors `eprm` for the training examples.

Call `[eprm] = fv_fisher_latentgmm_estep(fvbase,N,D,K,[],genm,p)` to estimate the posteriors for test images.

Call `[desc] = fv_fisher_latentgmm_grads(N,D,K,gradopt,genm,eprm)` to extract the Fisher vectors. Set `gradopt` to `alpha` for the LatBoW model, and to `all` for the LatMoG model to obtain all Fisher vector components. In our experiments, we then apply per-dimension whitening and L2 normalization to the resulting descriptors, as explained in the paper.
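
The post-processing step mentioned above could look roughly like the following minimal sketch. It reads "per-dimension whitening" as standardizing each Fisher vector dimension with the mean and standard deviation estimated on the training descriptors, which is an assumption (see the paper for the exact procedure), and then scales every descriptor to unit L2 norm. It also assumes one Fisher vector per row (transpose `desc` if it is stored otherwise); the names `desc_train`, `desc_test`, `whiten`, and `l2norm` are hypothetical, and random data stands in for real Fisher vectors.

```matlab
% Hypothetical post-processing sketch: per-dimension whitening + L2 normalization.
% Assumptions: one Fisher vector per row; whitening = standardizing each
% dimension with training-set statistics. Random data stands in for real
% Fisher vectors.
desc_train = randn(10, 6);              % toy training Fisher vectors (10 x 6)
desc_test  = randn(4, 6);               % toy test Fisher vectors (4 x 6)

mu_fv = mean(desc_train, 1);            % per-dimension mean (1 x 6)
sd_fv = std(desc_train, 0, 1);          % per-dimension standard deviation (1 x 6)
sd_fv(sd_fv < eps) = 1;                 % (near-)constant dimensions: center only, no rescaling

% Whitening uses training statistics only, for both training and test data.
whiten = @(F) bsxfun(@rdivide, bsxfun(@minus, F, mu_fv), sd_fv);
% Divide each row by its L2 norm (guarded against all-zero rows).
l2norm = @(F) bsxfun(@rdivide, F, max(sqrt(sum(F .^ 2, 2)), eps));

feat_train = l2norm(whiten(desc_train));
feat_test  = l2norm(whiten(desc_test));
```

Estimating the whitening statistics on the training set only, and reusing them for test images, keeps test data out of the normalization.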

To use these functions, the following per-image statistics over the local descriptors must be provided in the `fvbase` struct:

Field | Size | Contents |
---|---|---|
`E_x` | N × D × K | `E_x(j,:,k) = sum_i( p(k\|x_i) * x_i ) / sum_i( p(k\|x_i) )`, over `x_i` in image `j` |
`E_x2` | N × D × K | `E_x2(j,:,k) = sum_i( p(k\|x_i) * x_i.^2 ) / sum_i( p(k\|x_i) )`, over `x_i` in image `j` (element-wise square) |
`counts` | N × K | `counts(j,k) = sum_i( p(k\|x_i) )`, over `x_i` in image `j` |

where

- `x_i`: i-th local descriptor in an image,
- `N`: number of images in `fvbase` (the training images, when training a model),
- `D`: local descriptor dimensionality,
- `K`: vocabulary size.

The `E_x` and `E_x2` fields are not needed when training LatBoW models.
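
The statistics in the table above could be computed along the following lines. This is a minimal sketch, not part of the package: it assumes the soft assignments p(k|x_i) come from a diagonal-covariance Gaussian mixture vocabulary with weights `w`, means `mu`, and variances `s2` (hypothetical names, filled here with toy values), and that the descriptors of image `j` are stored as the rows of the cell-array entry `X{j}`. With a k-means vocabulary, hard 0/1 assignments could take the place of `post`. The results use the N × D × K and N × K layouts listed in the table.

```matlab
% Hypothetical sketch: build the fvbase statistics from raw local descriptors.
% Assumption: p(k|x_i) are posteriors under a diagonal-covariance GMM (w, mu, s2);
% the data and GMM parameters below are random toy values.
N = 3; D = 2; K = 4;                              % images, descriptor dim, vocabulary size
X  = {randn(50, D), randn(80, D), randn(20, D)};  % X{j}: descriptors of image j, one per row
w  = ones(1, K) / K;                              % mixture weights (1 x K)
mu = randn(K, D);                                 % component means (K x D)
s2 = ones(K, D);                                  % diagonal variances (K x D)

fvbase.E_x    = zeros(N, D, K);
fvbase.E_x2   = zeros(N, D, K);
fvbase.counts = zeros(N, K);

for j = 1:N
  x = X{j};
  n = size(x, 1);
  % Log of w_k * N(x_i; mu_k, diag(s2_k)) for every descriptor and component.
  logp = zeros(n, K);
  for k = 1:K
    d = bsxfun(@minus, x, mu(k, :));
    logp(:, k) = log(w(k)) - 0.5 * sum(log(2 * pi * s2(k, :))) ...
                 - 0.5 * sum(bsxfun(@rdivide, d .^ 2, s2(k, :)), 2);
  end
  % Posteriors p(k|x_i), normalized stably (subtract the row maximum first).
  post = exp(bsxfun(@minus, logp, max(logp, [], 2)));
  post = bsxfun(@rdivide, post, sum(post, 2));    % n x K, each row sums to 1

  c  = sum(post, 1);                              % soft counts sum_i p(k|x_i), 1 x K
  cs = max(c, eps);                               % guard against empty components
  fvbase.counts(j, :) = c;
  % Posterior-weighted averages of x_i and x_i.^2, each D x K, stored as 1 x D x K.
  fvbase.E_x(j, :, :)  = reshape(bsxfun(@rdivide, x' * post, cs), [1, D, K]);
  fvbase.E_x2(j, :, :) = reshape(bsxfun(@rdivide, (x .^ 2)' * post, cs), [1, D, K]);
end
```

Because the posteriors of each descriptor sum to one, the counts of an image sum to its number of descriptors, which is a cheap sanity check on real data as well.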

Additional options must be provided in the `p` struct; see `example.m` for their default values and explanations.

A final note: when using spatial grids, we train the model of each grid cell independently.
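
One way to set this up is to first assign each descriptor to a grid cell from its image location, and then build a separate `fvbase` and train a separate model per cell. The sketch below shows only the assignment step, for a hypothetical 2 × 2 grid with random toy keypoint locations; the grid size and the names `G`, `imsize`, `pos`, and `cell_id` are illustrative choices. The per-cell package call is left as a comment, and concatenating the per-cell Fisher vectors, as in a standard spatial pyramid, is an assumption.

```matlab
% Hypothetical sketch: assign local descriptors to spatial grid cells so that
% each cell can get its own fvbase and its own, independently trained model.
G = [2 2];                                     % assumed grid: 2 rows x 2 columns
imsize = [480 640];                            % toy image size [height width]
npts = 100;
pos = [rand(npts, 1) * imsize(1), rand(npts, 1) * imsize(2)];  % toy (y, x) locations

% Row/column of the cell containing each descriptor (min() handles the border).
row = min(floor(pos(:, 1) / imsize(1) * G(1)) + 1, G(1));
col = min(floor(pos(:, 2) / imsize(2) * G(2)) + 1, G(2));
cell_id = sub2ind(G, row, col);                % linear cell index in 1..prod(G)
n_per_cell = accumarray(cell_id, 1, [prod(G), 1]);   % number of descriptors per cell

for c = 1:prod(G)
  in_cell = (cell_id == c);                    % descriptors that fall into cell c
  % Use only the descriptors in in_cell to build fvbase_c, then train cell c on
  % its own, e.g.
  %   [genm_c, eprm_c] = fv_fisher_latentgmm_variationalestimate(p, fvbase_c, N, D, K, []);
  % Concatenating the per-cell Fisher vectors afterwards is an assumption.
end
```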