Self-assigned project for the Visual Analytics class at Aarhus University.
2021-05-19
This self-assigned project focuses on an emotion recognition task using transfer learning from a pre-trained VGG-Face deep CNN. The task is to classify images of facial expressions into seven basic emotions: anger, disgust, fear, happiness, sadness, surprise, and neutral. The script outputs a classification report and a performance graph. After training and evaluating the CNN, the script uses a grid search to return the best parameters based on the highest achieved accuracy.
The task at hand is emotion classification from facial images. To address it, I have used transfer learning from a pre-trained VGG-Face deep CNN. VGG-Face is based on the VGG-Very-Deep-16 CNN architecture and was evaluated on the Labeled Faces in the Wild (Huang et al., 2007) and YouTube Faces (Wolf et al., 2011) datasets. The convolutional backbone of VGG-Face consists of 13 convolutional layers organized into five blocks; each block contains two or three convolutional layers followed by a max-pooling layer. The primary goal of this project is to learn the basics of transfer learning and CNNs rather than to build the best-performing model. Therefore, to speed up training, the classifier was simplified by adding a new fully connected head on top of the backbone, tailored to this project's task: the three fully connected layers of the original architecture (fc-4096, fc-4096 and fc-2622) were replaced by two smaller ones, fc-256 and fc-7, followed by a softmax activation function. The weights were updated during training.
A depiction of the modified model's architecture can be found in the folder called 'output'.
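The sketch below illustrates the classifier swap described above. It assumes the pre-trained VGG-Face convolutional base has already been loaded into Keras (e.g. following Serengil, 2018); the function and variable names are illustrative and are not taken from the actual script.

```python
# A minimal sketch of the modified classifier head, assuming `vgg_face_base`
# is the pre-trained VGG-Face convolutional base already loaded into Keras
# (e.g. via the approach in Serengil, 2018). Names are illustrative only.
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Flatten, Dense

def add_emotion_head(vgg_face_base, n_classes=7):
    # reuse the pre-trained convolutional blocks as a feature extractor
    x = Flatten()(vgg_face_base.output)
    # two smaller fully connected layers replace fc-4096, fc-4096 and fc-2622
    x = Dense(256, activation="relu", name="fc_256")(x)
    predictions = Dense(n_classes, activation="softmax", name="fc_7")(x)
    return Model(inputs=vgg_face_base.input, outputs=predictions)
```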
File | Description |
---|---|
output | Folder containing files produced by the script |
output/Emotions_classifier_report.csv | Classification metrics of the model |
output/Emotions_classifier_performance.png | Model's performance graph |
output/VGG-Face_CNN´s_architecture.png | Depiction of the CNN model's architecture used |
src | Folder containing the script |
src/emotion_class.py | The script |
README.md | Description of the assignment and the instructions |
emotion_venv.sh | Bash file for creating a virtual environment |
kill_emotion.sh | Bash file for removing the virtual environment |
requirements.txt | List of Python packages required to run the script |
For this project, the Facial Expression Recognition 2013 (FER-2013) dataset was used. The dataset consists of 48x48 pixel grayscale images of faces. The faces have been automatically registered so that the face is more or less centred and occupies about the same amount of space in each image. Images fall into seven categories: 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral. The training set consists of 28,709 examples and the test set consists of 7,178 examples.
Data structure
Before executing the code, make sure that the images are located in the following path: 'data/Face_emotions'
The 'Face_emotions' folder should contain two folders, train and test, each of which contains seven folders labelled by an emotion (see the illustrative layout below). The code should work on any other similar image data structured this way; however, the model parameters and preprocessing might require readjustment for different data.
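For illustration, the expected layout might look like this (the emotion subfolder names here follow the FER-2013 categories; the exact folder names may differ depending on how the dataset was downloaded):

```
data/Face_emotions/
├── train/
│   ├── angry/
│   ├── disgust/
│   ├── fear/
│   ├── happy/
│   ├── sad/
│   ├── surprise/
│   └── neutral/
└── test/
    └── (the same seven emotion folders)
```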
Data preprocessing
The only preprocessing of the data was the subtraction of the mean RGB value, computed on the training set, from each pixel; the same preprocessing step was applied to the ImageNet images for the VGG16 model on which the VGG-Face CNN is based. This was performed using the Keras function preprocess_input().
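Below is a minimal sketch of how this preprocessing can be wired into Keras data generators that read the directory layout described above; the batch size and target size are assumptions, not values taken from the actual script.

```python
# A minimal sketch, assuming the data layout described above; batch size and
# target size are illustrative, not values from emotion_class.py.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.vgg16 import preprocess_input

# preprocess_input applies the same mean-RGB subtraction used for the
# original VGG models to every image
train_gen = ImageDataGenerator(preprocessing_function=preprocess_input)
test_gen = ImageDataGenerator(preprocessing_function=preprocess_input)

train_batches = train_gen.flow_from_directory(
    "../data/Face_emotions/train",
    target_size=(224, 224),       # VGG-Face expects 224x224 RGB input
    class_mode="categorical",
    batch_size=32)

test_batches = test_gen.flow_from_directory(
    "../data/Face_emotions/test",
    target_size=(224, 224),
    class_mode="categorical",
    batch_size=32,
    shuffle=False)                # keep order for the classification report
```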
The code was tested on an HP computer running the Windows 10 operating system and was executed on Jupyter worker02.
Code parameters
Parameter | Description |
---|---|
train_data (trd) | Directory of training data |
val_data (vald) | Directory of validation data |
optimizer (optim) | Method used to update the weights to minimize the loss function. Choose between SGD and Adam |
learning_rate (lr) | Step size by which the weights are updated during training. Default = 0.001 |
epochs (ep) | Defines how many times the learning algorithm will work through the entire training dataset. Default = 50 |
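As a rough illustration of how these flags might map onto a command-line interface, here is a hypothetical argparse setup; the actual emotion_class.py may handle its arguments differently.

```python
# A hypothetical sketch of the CLI implied by the parameter table above;
# flag names and defaults follow the table, everything else is assumed.
import argparse

parser = argparse.ArgumentParser(
    description="Emotion classification with a pre-trained VGG-Face CNN")
parser.add_argument("-trd", "--train_data", required=True,
                    help="Directory of training data")
parser.add_argument("-vald", "--val_data", required=True,
                    help="Directory of validation data")
parser.add_argument("-optim", "--optimizer", required=True,
                    choices=["SGD", "Adam"], help="Optimizer to use")
parser.add_argument("-lr", "--learning_rate", type=float, default=0.001,
                    help="Learning rate (default: 0.001)")
parser.add_argument("-ep", "--epochs", type=int, default=50,
                    help="Number of training epochs (default: 50)")
args = parser.parse_args()
```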
Steps
Set-up:
#1 Open a terminal on worker02 or locally
#2 Navigate to the directory where you want to clone this repository
#3 Clone the repository
$ git clone https://github.com/Rutatu/cds-visual_Self_assigned_project.git
#4 Navigate to the newly cloned repo
$ cd cds-visual_Self_assigned_project
#5 Create virtual environment with its dependencies and activate it
$ bash emotion_venv.sh
$ source ./emotion_venv/bin/activate
Run the code:
#6 Navigate to the directory of the script
$ cd src
#7 Run the code with default parameters
$ python emotion_class.py -trd ../data/Face_emotions/train -vald ../data/Face_emotions/test -optim Adam
#8 Run the code with self-chosen parameters
$ python emotion_class.py -trd ../data/Face_emotions/train -vald ../data/Face_emotions/test -optim SGD -lr 0.003 -ep 100
#9 To remove the newly created virtual environment
$ bash kill_emotion.sh
#10 To find out possible optional arguments for the script
$ python emotion_class.py --help
I hope it worked!
Facial expressions are difficult for a computer to classify because of the subtle and complex muscle movements involved; therefore, deep learning techniques need to be employed to achieve even mediocre results. This project showed how transfer learning can be used for an emotion classification problem based on extracted facial-expression features. The pre-trained VGG-Face deep CNN (optimizer = Adam, learning rate = 0.001) achieved a weighted average accuracy of 41% in classifying faces according to their emotional expression. Such results are not satisfactory; having more data and/or fine-tuning the hyperparameters of the model might increase the accuracy (one possible approach is sketched below).
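As a hedged illustration of the grid-search idea mentioned above (in the spirit of Brownlee, 2016), the sketch below wraps a Keras model-building function in a scikit-learn grid search; `build_model` and the candidate parameter values are hypothetical, not taken from the actual script.

```python
# A hypothetical sketch of grid-searching optimizer and learning rate, in the
# spirit of Brownlee (2016). `build_model(optimizer, learning_rate)` is an
# assumed function returning a compiled version of the modified VGG-Face model;
# X_train and y_train stand for the training images and integer class labels.
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

model = KerasClassifier(build_fn=build_model, epochs=50, batch_size=32, verbose=0)
param_grid = {"optimizer": ["SGD", "Adam"],
              "learning_rate": [0.001, 0.003]}
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
grid_result = grid.fit(X_train, y_train)
print(f"Best accuracy: {grid_result.best_score_:.3f} using {grid_result.best_params_}")
```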
Brownlee, J. (2016, August 9). How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras. Machine Learning Mastery. https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/
Huang, G. B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. Technical Report 07-49, University of Massachusetts, Amherst.
Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015). Deep Face Recognition (poster). University of Oxford. https://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/poster.pdf
Serengil, S. (2018, August 6). Deep Face Recognition with Keras. Sefik Ilkin Serengil. https://sefiks.com/2018/08/06/deep-face-recognition-with-keras/
Wolf, L., Hassner, T., & Maoz, I. (2011). Face Recognition in Unconstrained Videos with Matched Background Similarity. Computer Vision and Pattern Recognition (CVPR).