Gender Classifier on Adience Data Using Transfer Learning From VGG Face Descriptor

Ethics Warning:

  • This model has not been tested for potential bias based on race, skin color, etc.; as such, it is not intended for real-world use unless it has been comprehensively tested and corrected.
  • Be aware of the ethical issues behind sex/gender classification before using this code.

Overview:

I have used the VGG Face Descriptor model for transfer learning to train a new model for classifying sex in the Adience dataset.

1. Results:

After training for 3 epochs on my laptop (~7 hours):

Set              Accuracy   Binary Crossentropy
Training set     0.9670     0.0897
Validation set   0.9275     0.2631

We can see it is slightly overfitting to the training data. Allowing for more data augmentation, or using a bigger training set, could possibly help with this. Another option would be to start from a lower layer of the VGG-face model when building the new classifier on top of it. Currently I'm building on top of layer 30 out of the 53 total layers in VGG-face (53 layers when counting padding and activation layers separately). See the explanations below for more on this.

2. How to run:

The code provides three options for running:

2.1. Classify a single image:

Use the argument classify when running the command:

$python gender_classifier.py classify -m "stored_model" -i "input_image_to_classify"

Example:

$python3.7 train_adience_gender.py classify -m trained_gender_classifier.h5 -i ../data/adience/combined/valid/11_F/landmark_aligned_face.957.12059888826_929090d81b_o.jpg 
****
output class is female. (sigmoid value=0.025847017765045166)
****
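For reference, the same flow can be reproduced in Python. Below is a minimal sketch (not the repo's exact code): it loads the stored model, runs a prediction on a 224x224 input, and thresholds the sigmoid output. The filename face.jpg is a placeholder, and the repo's own preprocessing (BGR channel order, mean subtraction) is omitted here.

import numpy as np
import tensorflow as tf

# Load the stored classifier (a Keras .h5 file).
model = tf.keras.models.load_model("trained_gender_classifier.h5")

# Load and batch a single image; "face.jpg" is a placeholder path.
# Note: the repo's own preprocessing (BGR order, mean subtraction) is omitted here.
img = tf.keras.preprocessing.image.load_img("face.jpg", target_size=(224, 224))
x = tf.keras.preprocessing.image.img_to_array(img)[np.newaxis, ...]  # shape (1, 224, 224, 3)

# A single sigmoid output; in the example output above, a value near 0 means "female".
sigmoid_value = float(model.predict(x)[0, 0])
label = "female" if sigmoid_value < 0.5 else "male"
print(f"output class is {label}. (sigmoid value={sigmoid_value})")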

2.2. Train and store a new model on top of VGG-Face

Use the argument train when running the command:

$python gender_classifier.py train -w "path_to_t7_file_of_vggface_weights" -i "adience_data_directory" -o "filepath_for_the_output_model"

Example:

$python3.7 gender_classifier.py train -w ../../vgg_face_torch/VGG_FACE.t7 -i ../../data/adience/combined -o ~/output_model.h5 -e 1
  • the "-w" argument specifies the file containing the weights of the pretrained model, downloadable from here
  • the "-i" argument specifies the directory containing the images of the Adience dataset, which includes two subdirectories: aligned and valid. It can be downloaded from here.
  • "-o" specifies the filename and path where the trained model should be stored
  • "-e" is the number of epochs to train; 1-3 should be enough. By default it is set to 1.

Other optional arguments include:

  • "-b1" for training set batch size
  • "-b2" for validation set batch size
  • "-m" selects the low-memory or high-memory setting. The high-memory setting loads all the images into memory, whereas the low-memory setting loads them batch by batch through ImageDataGenerator
  • more options, such as whether to perform data augmentation, are provided in the code interface but not through the command line

2.3. Evaluate trained model on Adience data

Use the argument evaluate when running the command:

$python gender_classifier.py evaluate -m "stored_model" -i "adience_data_directory"

Example:

$python3.7 train_adience_gender.py evaluate -m trained_gender_classifier.h5 -i ../data/adience/combined
Found 29437 images belonging to 2 classes.
Found 3681 images belonging to 2 classes.
evaluating on validation data...
116/116 [==============================] - 985s 8s/step - loss: 0.2631 - accuracy: 0.9275
evaluating on training data...
920/920 [==============================] - 985s 8s/step - loss: 0.0897 - accuracy: 0.9670
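These numbers can also be reproduced programmatically. A minimal sketch, assuming the stored .h5 model and a directory-based validation generator (the path and batch size of 32 follow the example output above, where 3681 validation images are evaluated in 116 steps):

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Load the stored classifier and stream the validation images batch by batch.
model = tf.keras.models.load_model("trained_gender_classifier.h5")
valid_gen = ImageDataGenerator().flow_from_directory(
    "../data/adience/combined/valid",
    target_size=(224, 224),
    batch_size=32,
    class_mode="binary",
)

# Returns the loss (binary crossentropy) and accuracy shown above.
loss, accuracy = model.evaluate(valid_gen)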

3. Explanation

3.1. VGG-face

I have implemented the VGG-face model using TF2. The model implementation can be found in the src/vgg_face.py file, and is based on the architecture described in this paper.
The weights are downloaded from the same webpage. After unzipping this file, there is a t7 file containing the trained weights for the Torch implementation, which I take as input and convert to TF2 weights.

The main challenges in this step are:

  1. Although the paper calls the last few layers "fully connected", they are not dense layers. As the paper describes: "They are the same as a convolutional layer, but the size of the filters matches the size of the input data, such that each filter “senses” data from the entire image."
  2. The data format in Torch is channel-first by default, as opposed to TF/TF2 where it is channel-last. Additionally, the images were trained in "BGR" channel order. The axis transpositions are important (see the sketch after this list).
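To make these two points concrete, here is a small sketch of both conversion steps (assuming the standard VGG-16-style stack on a 224x224 input; the actual implementation is in src/vgg_face.py):

import numpy as np
import tensorflow as tf

# 1. The "fully connected" layers are convolutions whose filter covers the whole
#    feature map. After the conv/pool stack on a 224x224 input the feature map is
#    7x7x512, so the first "FC" layer becomes a 7x7 convolution with 4096 filters:
fc6_as_conv = tf.keras.layers.Conv2D(filters=4096, kernel_size=(7, 7), activation="relu")

# 2. Torch stores convolution kernels channel-first as (out, in, h, w), while
#    Keras expects channel-last kernels of shape (h, w, in, out), so each weight
#    tensor needs a transpose before being assigned to the corresponding layer:
def torch_kernel_to_keras(torch_kernel: np.ndarray) -> np.ndarray:
    return np.transpose(torch_kernel, (2, 3, 1, 0))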

3.2. Adience Data

The data are downloaded from this link (provided in the email). After unzipping, there are two subdirectories: "aligned" and "valid". There is no overlap between these two subdirectories. I have used the "aligned" subdirectory for training, and "valid" for validation.

I have created a separate class for this data. To instantiate it, one provides the path to the data. The resulting object holds the data and a data generator that can later be passed to the model's fit() function.

I have also provided two options for an object of this class: "low-memory" and "high-memory". By default it uses low-memory, and that is how I ran the code on my laptop. I provided the "high-memory" option because the image data are not too big and, given reasonable computing resources, could possibly be loaded into memory all at once, facilitating training.

I chose not to implement my own data generator and instead to use the ImageDataGenerator from TF, because I realized I would not be adding anything beyond what TF's ImageDataGenerator already provides.
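As an illustration of the two modes, here is a minimal sketch built directly on ImageDataGenerator (the repo's data class wraps this kind of logic; the batch size and augmentation setting are illustrative, and the directory layout follows the examples above):

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Low-memory mode: stream the training images batch by batch from disk,
# optionally with augmentation (here just a horizontal flip).
train_gen = ImageDataGenerator(horizontal_flip=True).flow_from_directory(
    "../../data/adience/combined/aligned",
    target_size=(224, 224),
    batch_size=32,
    class_mode="binary",
)

# High-memory mode: one way to load everything at once is to drain the same
# generator a single time and keep the arrays in memory.
images, labels = [], []
for _ in range(len(train_gen)):
    x, y = next(train_gen)
    images.append(x)
    labels.append(y)
x_all, y_all = np.concatenate(images), np.concatenate(labels)

# Either form can then be passed to the model's fit() function:
# model.fit(train_gen, epochs=1)        # low-memory
# model.fit(x_all, y_all, epochs=1)     # high-memory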

3.3. Adience Model

This model is built on top of the pretrained VGG-face model. I use the bottom 30 layers of the VGG-face model and add 2 dense layers on top of them.

There are two arguments in src/adience_model.py. The first, frozen, specifies how many layers should have their weights frozen (i.e. not trainable). The other, add_on_top, specifies how many layers of the VGG-face model should be kept, with the new layers added on top of them. Both default to 30.
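A minimal sketch of this construction, assuming a Keras implementation of VGG-face as in src/vgg_face.py (the sizes of the new head and the optimizer choice are illustrative):

import tensorflow as tf

def build_adience_model(vgg_face: tf.keras.Model, frozen: int = 30, add_on_top: int = 30) -> tf.keras.Model:
    # Freeze the weights of the bottom `frozen` layers so they are not trainable.
    for layer in vgg_face.layers[:frozen]:
        layer.trainable = False

    # Keep the bottom `add_on_top` layers of VGG-face ...
    base_output = vgg_face.layers[add_on_top - 1].output

    # ... and add the new head: two dense layers, ending in a single sigmoid unit.
    x = tf.keras.layers.Flatten()(base_output)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    output = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs=vgg_face.input, outputs=output)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model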