
Labeled Faces in the Wild dataset, converted to fuel


LFW dataset, converted to fuel

Labeled Faces in the Wild is a database of face photographs designed for studying the problem of unconstrained face recognition.

This project currently packages the pairsDevTrain / pairsDevTest image sets into a fuel-compatible dataset, along with targets that indicate whether each pair shows the same person or two different people. In addition to the original LFW images, conversion is supported for both the funneled and deepfunneled versions.

This project uses kerosene to produce a fuel-compatible hdf5 file that is usable by blocks or keras.

Show me

From the included example:

from keras.models import Sequential
from lfw_fuel import lfw

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = lfw.load_data(format="deepfunneled")

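# (build the perfect model here, and derive the Y_train / Y_test targets
#  it expects from y_train / y_test, e.g. one-hot encoded)
# note: show_accuracy is a Keras 0.x argument; newer Keras versions use
# model.compile(..., metrics=["accuracy"]) instead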

model.fit(X_train, Y_train, show_accuracy=True, validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, show_accuracy=True, verbose=0)

The features are currently stored in six channels: three (RGB) for each of the two images being compared.
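A minimal numpy sketch of separating and re-joining the two images in a sample. It assumes the channel axis is axis 1, i.e. arrays shaped (N, 6, H, W) with channels 0-2 holding the first image and channels 3-5 the second; the README does not state the axis order, so check X_train.shape before relying on this. The helpers split_pair and stack_pair are hypothetical and not part of lfw_fuel.

```python
import numpy as np

# Assumption: channels-first layout (N, 6, H, W); channels 0-2 are the
# first image's RGB planes and channels 3-5 are the second image's.

def split_pair(x):
    """Split a (N, 6, H, W) batch into two (N, 3, H, W) RGB batches."""
    if x.ndim != 4 or x.shape[1] != 6:
        raise ValueError("expected shape (N, 6, H, W), got %r" % (x.shape,))
    return x[:, :3], x[:, 3:]

def stack_pair(a, b):
    """Inverse of split_pair: concatenate two RGB batches along the channel axis."""
    return np.concatenate([a, b], axis=1)

# Dummy data standing in for X_train.
x = np.random.rand(2, 6, 250, 250).astype("float32")
first, second = split_pair(x)
print(first.shape, second.shape)  # (2, 3, 250, 250) (2, 3, 250, 250)
assert np.array_equal(stack_pair(first, second), x)
```

Splitting like this is useful for siamese-style models that process each face separately before comparing them.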

Note that the images are 250x250 pixels, which is large by most CNN standards. They can be cropped and scaled before being passed to the network, as the example shows.
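A minimal numpy sketch of one way to do this: a centre crop followed by downscaling through block averaging. The crop size of 160 and the factor of 2 are hypothetical values chosen for illustration, not the settings used in the included example, and the (N, C, H, W) layout is again an assumption. A centre crop is a reasonable first choice because the faces in LFW sit roughly in the middle of each frame.

```python
import numpy as np

def center_crop(x, size):
    """Crop a (N, C, H, W) batch to (N, C, size, size) around the image centre."""
    h, w = x.shape[2], x.shape[3]
    top = (h - size) // 2
    left = (w - size) // 2
    return x[:, :, top:top + size, left:left + size]

def downscale(x, factor):
    """Shrink H and W by an integer factor by averaging factor x factor blocks."""
    n, c, h, w = x.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    # Split each spatial axis into (blocks, factor) and average within blocks.
    blocks = x.reshape(n, c, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(3, 5))

# Hypothetical settings: crop 250x250 to 160x160, then halve to 80x80.
x = np.random.rand(2, 6, 250, 250).astype("float32")
small = downscale(center_crop(x, 160), 2)
print(small.shape)  # (2, 6, 80, 80)
```

Because both steps operate on every channel equally, they can be applied to the six-channel pair array directly without splitting it first.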

What's this dataset all about again?

The primary task of Labeled Faces in the Wild is to learn whether the faces in two pictures are of the same person, or two different people. There are 2200 training pairs and 1000 test pairs in the predefined split.

Here are three matching training pairs:

Image 1 | Image 2 | Status
Aaron_Peirsol_0003 | Aaron_Peirsol_0004 | MATCH
Aaron_Sorkin_0001 | Aaron_Sorkin_0002 | MATCH
Abdel_Nasser_Assidi_0001 | Abdel_Nasser_Assidi_0002 | MATCH

And here are three non-matching training pairs:

Image 1 | Image 2 | Status
Lee_Nam-shin_0001 | Nick_Nolte_0001 | DIFFERENT
Lee_Soo-hyuck_0001 | Scott_Sullivan_0001 | DIFFERENT
Lee_Yeo-jin_0001 | Mariangel_Ruiz_Torrealba_0001 | DIFFERENT
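The pairs in these tables follow a simple naming scheme: a person's name plus a zero-padded four-digit image index. The sketch below turns one line of a pairs list into two image names and a match flag. It assumes the whitespace-separated layout used by the public pairsDevTrain.txt / pairsDevTest.txt files (three fields for a matching pair, four for a non-matching one); this README does not describe that format, and the True/False encoding here is illustrative, not necessarily how lfw_fuel encodes its targets. The function names are hypothetical.

```python
# Assumed pairs-line formats (whitespace-separated):
#   name  idx1  idx2            -> two images of the same person (match)
#   name1 idx1  name2 idx2      -> images of two different people
# The count header at the top of a pairs file is not handled here.

def image_name(person, index):
    """Build an LFW image name such as Aaron_Peirsol_0003."""
    return "%s_%04d" % (person, int(index))

def parse_pair(line):
    """Return (image1, image2, is_match) for one pairs line."""
    fields = line.split()
    if len(fields) == 3:
        name, i, j = fields
        return image_name(name, i), image_name(name, j), True
    if len(fields) == 4:
        name1, i, name2, j = fields
        return image_name(name1, i), image_name(name2, j), False
    raise ValueError("unexpected pairs line: %r" % line)

print(parse_pair("Aaron_Peirsol\t3\t4"))
# ('Aaron_Peirsol_0003', 'Aaron_Peirsol_0004', True)
print(parse_pair("Lee_Nam-shin\t1\tNick_Nolte\t1"))
# ('Lee_Nam-shin_0001', 'Nick_Nolte_0001', False)
```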

In addition to this raw format, the dataset is provided in at least two preprocessed versions, called funneled and deepfunneled. These are often very similar, but here is an example of how they can differ:

[Figure: image Amelia_Vega_0004 in its Original, Funneled, and Deep Funneled versions]

On the LFW page you can browse the complete training set or the complete test set and see all three versions of all images.

Example

The repo includes an example of training a keras network for this task. To run it from the repo root:

$ python example/run-lfw.py

This should run the example, downloading the dataset if necessary.

Note that the example currently runs, but its performance is poor. Suggestions or merge requests that improve it are certainly welcome.

Installation

Installation is optional: if kerosene is installed, you can simply clone the repo and run the example script. Installing puts the lfw_fuel package on your Python path, which is useful if you want to use this dataset in your own blocks or keras project.

python setup.py install

You can also rebuild the hdf5 files from scratch by running fuel-download and fuel-convert with the FUEL_EXTRA_DOWNLOADERS and FUEL_EXTRA_CONVERTERS environment variables set to lfw_fuel:

FUEL_EXTRA_DOWNLOADERS="lfw_fuel" fuel-download lfw
FUEL_EXTRA_CONVERTERS="lfw_fuel" fuel-convert lfw

This converts the original version of LFW; the funneled and deepfunneled formats are also supported via --format:

FUEL_EXTRA_DOWNLOADERS="lfw_fuel" fuel-download lfw --format deepfunneled
FUEL_EXTRA_CONVERTERS="lfw_fuel" fuel-convert lfw --format deepfunneled

These settings can also be set in the ~/.fuelrc file:

extra_downloaders: ['lfw_fuel']
extra_converters: ['lfw_fuel']

License

MIT