Understanding and Comparing Deep Neural Networks for Age and Gender Classification - Data and Models
This repository contains all evaluated models for which results are reported in the paper "Understanding and Comparing Deep Neural Networks for Age and Gender Classification", published in the proceedings of the IEEE Workshop on Analysis and Modeling of Faces and Gestures (AMFG) at the International Conference on Computer Vision (ICCV) 2017.
Should you find any code or models from this GitHub repository useful, please add a reference to the corresponding publication to your work:
@inproceedings{lapuschkin2017understanding,
author = {Lapuschkin, Sebastian and Binder, Alexander and M\"uller, Klaus-Robert and Samek, Wojciech},
title = {Understanding and Comparing Deep Neural Networks for Age and Gender Classification},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW)},
pages = {1629-1638},
year = {2017},
doi = {10.1109/ICCVW.2017.191},
url = {https://doi.org/10.1109/ICCVW.2017.191}
}
This repo contains the deploy.prototxt and train_val.prototxt files for all model architectures, pretraining and preprocessing choices for which performance measures are reported in the paper linked above. The mean.binaryproto files for the employed datasets and Caffe are supplied as well.
This repository shares scripts and workflows with Gil Levi's age and gender deep learning project page.
Model performances, depending on architecture, initialization and data preprocessing, averaged over all folds of the data set. For additional results, see section Result Overview.
Due to GitHub's hard file size limit of 100 MB per file, all model weights (i.e. the *.caffemodel files) and lmdb data files are hosted externally via a Nextcloud service of the Fraunhofer Heinrich Hertz Institute (see section Repository Content below).
All heatmap visualizations shown in the paper, such as the image at the top of the page, have been generated using the LRP implementation for Caffe, as provided in the LRP Toolbox.
Scripts assisting in the computation of heatmap visualizations can be found in folder heatmap_drawing.
Exemplary LRP heatmap visualizations for the predicted classes on a gender prediction task, identifying regions in the input image used by the model to decide for (hot colors) or argue against (cold colors) the predicted class (which equals the true class in these cases).
- Folder folds contains the dataset split description for the Adience benchmark data used for training and evaluation. This folder is an extension to the one found in Gil Levi's repo and contains additional preprocessing settings.
- training_scripts contains shell scripts used for starting the training of the neural network models.
- DataPrepartionCode contains scripts for generating mean.binaryproto and lmdb binary blobs from raw Adience image data. This folder is an extension to the one found in Gil Levi's repo and contains additional preprocessing settings.
- The folder mean_images contains the mean.binaryproto files for all folds and preprocessing choices, as used for training, validation and testing.
- The folder model_definitions contains the *.prototxt files for Caffe, i.e. a description of each model architecture. Here, the naming pattern [target]_[init]_[arch][_preproc] applies, where
  - target is from {age, gender} and describes the prediction problem.
  - init is from {fromscratch, finetuning, imdbwiki} and describes random initialization, a weight initialization from ImageNet pretraining, and a weight initialization from ImageNet pretraining followed by IMDB-WIKI pretraining, respectively.
  - arch is from {caffereference, googlenet, vgg16, net_definitions} and describes the architecture of the model. Here, net_definitions refers to the model architecture used in Gil Levi's repo. The net_definitions models do not have an init block within the folder name.
  - The _preproc suffix is optional and refers to _unaligned images (i.e. training images only under rotation alignment), aligned training images (landmark-based alignment, no suffix) or _mixed alignment (i.e. images under both landmark-based and rotation-based alignment are used for training).
- The pretrained models used as starting points (init) for training can be downloaded here. The model weights behind this link have been downloaded from the Caffe repo (caffereference, googlenet), the IMDB-WIKI project page (vgg16 on imdbwiki) and the Caffe Model Zoo (vgg16 on imagenet).
- The lmdb files used for model training, validation and testing can be downloaded here.
- The model weights (i.e. the *.caffemodel files) for the neural network descriptions contained in this repository can be downloaded here. These files match the model definitions in folder model_definitions.
- heatmap_drawing contains scripts generating configuration files for computing LRP heatmaps using the LRP Toolbox for Caffe.
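The naming pattern [target]_[init]_[arch][_preproc] described above for model_definitions can be decoded programmatically. The following is a minimal sketch, not part of the repository: the function name parse_model_name and the example folder names are hypothetical and were built from the pattern as documented here, so the actual folder names may deviate in detail.

```python
from typing import NamedTuple, Optional

# Vocabulary taken from the naming pattern described in this README.
TARGETS = {"age", "gender"}
INITS = {"fromscratch", "finetuning", "imdbwiki"}
ARCHS = {"caffereference", "googlenet", "vgg16", "net_definitions"}
PREPROCS = {"unaligned", "mixed"}  # no suffix means landmark-aligned images


class ModelName(NamedTuple):
    target: str
    init: Optional[str]  # None when the name has no init block (net_definitions)
    arch: str
    preproc: str         # "aligned", "unaligned" or "mixed"


def parse_model_name(name: str) -> ModelName:
    """Split a folder name of the form [target]_[init]_[arch][_preproc]."""
    parts = name.split("_")
    target, rest = parts[0], parts[1:]
    if target not in TARGETS:
        raise ValueError(f"unknown target in {name!r}")

    # The optional preprocessing suffix is the last token, if present.
    preproc = "aligned"
    if rest and rest[-1] in PREPROCS:
        preproc = rest.pop()

    # The init block is optional because net_definitions models omit it.
    init = None
    if rest and rest[0] in INITS:
        init = rest.pop(0)

    # Whatever remains is the architecture (net_definitions contains "_").
    arch = "_".join(rest)
    if arch not in ARCHS:
        raise ValueError(f"unknown architecture in {name!r}")
    return ModelName(target, init, arch, preproc)


if __name__ == "__main__":
    # Hypothetical folder names, for illustration only.
    print(parse_model_name("age_finetuning_googlenet_mixed"))
    print(parse_model_name("gender_net_definitions"))
```

The preprocessing suffix is stripped before the architecture is read because net_definitions itself contains an underscore and would otherwise be split incorrectly.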
Note that you will have to adapt the (absolute) paths denoted in scripts and model description files in order to use the code.
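Adapting the paths by hand across many files is tedious, so a small helper can do a plain text substitution. This is a minimal sketch under assumptions: the function name rewrite_path_prefix, the file patterns (*.prototxt and *.sh) and both path prefixes are hypothetical and not taken from the repository. Inspect the files first to find the prefix actually used, and work on a copy.

```python
from pathlib import Path


def rewrite_path_prefix(root, old_prefix, new_prefix,
                        patterns=("*.prototxt", "*.sh")):
    """Replace old_prefix by new_prefix in all matching files below root.

    Returns the list of files that were modified.
    """
    changed = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            text = path.read_text()
            if old_prefix in text:
                # Plain string replacement; no parsing of the file format.
                path.write_text(text.replace(old_prefix, new_prefix))
                changed.append(path)
    return changed


if __name__ == "__main__":
    # Both prefixes are placeholders for illustration.
    for path in rewrite_path_prefix(".", "/old/absolute/root", "/my/local/root"):
        print("updated", path)
```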
The table below briefly presents the results obtained in the paper this repository belongs to.
| age | AdienceNet | CaffeNet | GoogleNet | VGG16 | gender | AdienceNet | CaffeNet | GoogleNet | VGG16 |
|---|---|---|---|---|---|---|---|---|---|
| [i,⋅] | 51.4 / 87.0 | 52.5 / 87.9 | 54.4 / 89.3 | | [i,⋅] | 88.1 | 87.7 | 88.2 | |
| [r,⋅] | 51.9 / 87.4 | 52.6 / 89.0 | 54.4 / 90.0 | | [r,⋅] | 88.3 | 88.0 | 89.3 | |
| [m,⋅] | 53.6 / 88.4 | 54.4 / 89.7 | 56.5 / 90.8 | | [m,⋅] | 89.0 | 88.9 | 89.7 | |
| [i,n] | | 51.7 / 87.6 | 56.6 / 91.0 | 53.8 / 88.2 | [i,n] | | 90.0 | 91.2 | 92.0 |
| [r,n] | | 52.2 / 87.1 | 57.5 / 92.0 | | [r,n] | | 90.7 | 91.7 | |
| [m,n] | | 53.0 / 88.4 | 58.8 / 92.7 | 56.5 / 90.0 | [m,n] | | 90.6 | 92.0 | 92.7 |
| [i,w] | | | | 60.2 / 94.2 | [i,w] | | | | 90.6 |
| [r,w] | | | | | [r,w] | | | | |
| [m,w] | | | | 63.0 / 96.0 | [m,w] | | | | 92.3 |

Face categorization accuracy in percent, using oversampling for prediction. Left: results for age classification, given as accuracy / 1-off accuracy, where 1-off accuracy is the accuracy of predicting the correct age group or an adjacent one. Right: results for gender prediction. Entries in the age and gender columns indicate choices for data preprocessing and model initialization:
- i: in-plane, landmark-based face alignment; r: rotation-based alignment; m: combining i and r for training and using r for testing
- n: ImageNet pretraining; ⋅: random weight initialization; w: IMDB-WIKI pretraining following ImageNet pretraining
Bold values match or exceed the state-of-the-art results reported on the Adience benchmark dataset at publication time.
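To make the two age metrics concrete, the sketch below computes accuracy and 1-off accuracy from lists of true and predicted age group indices. It is an illustration of the definition given in the table caption, not the evaluation code used for the paper. It assumes that age groups are encoded as integers ordered by age, so that adjacent groups differ by exactly 1. The function name accuracy_and_one_off is hypothetical.

```python
def accuracy_and_one_off(true_groups, predicted_groups):
    """Return (accuracy, 1-off accuracy) in percent.

    Both arguments are equally long sequences of integer age group
    indices, ordered by age.
    """
    if len(true_groups) != len(predicted_groups) or not true_groups:
        raise ValueError("need two non-empty sequences of equal length")

    n = len(true_groups)
    pairs = list(zip(true_groups, predicted_groups))
    # Exact hit: the predicted group is the true group.
    exact = sum(t == p for t, p in pairs)
    # 1-off hit: the predicted group is the true group or a neighbour.
    one_off = sum(abs(t - p) <= 1 for t, p in pairs)
    return 100.0 * exact / n, 100.0 * one_off / n


if __name__ == "__main__":
    acc, off = accuracy_and_one_off([0, 1, 2, 3], [0, 2, 4, 3])
    print(f"accuracy {acc:.1f}%, 1-off accuracy {off:.1f}%")
```

By construction the 1-off accuracy is never lower than the plain accuracy, which matches the pairs of values listed in the age columns.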