In this paper, I propose a Convolutional Neural Network (CNN) architecture that jointly learns representations for three tasks: smile detection, gender classification, and age classification.
My model is based on the BKNet architecture. It uses hard parameter sharing, which reduces the overfitting that can occur when each task is trained separately. The proposed network takes input from multiple data sources. The data flow through a shared CNN block, which learns joint representations for all tasks from all of the sources. After the shared block, the network splits into three task-specific branches. Each branch then learns task-specific features and has its own loss calculation.
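To illustrate how per-branch losses might be combined when each training sample comes from only one source dataset, here is a minimal sketch in plain Python. It is not the code in this repository: the function names, the masking of samples by source, the unweighted sum of branch losses, and the five age classes are all assumptions made for the example.

```python
import math

# Hypothetical task names; each sample carries a label for exactly one of them.
TASKS = ("smile", "gender", "age")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one sample given its branch logits and class index."""
    return -math.log(softmax(logits)[label])


def multitask_loss(batch_logits, batch_labels, batch_sources):
    """Average the loss of each branch over the samples from its own source.

    batch_logits:  list of dicts mapping task name -> logits of that branch
    batch_labels:  list of class indices, valid only for the sample's source task
    batch_sources: list of task names saying which dataset each sample came from
    """
    per_task = {}
    for task in TASKS:
        losses = [
            cross_entropy(logits[task], label)
            for logits, label, source in zip(batch_logits, batch_labels, batch_sources)
            if source == task  # samples from other sources do not affect this branch
        ]
        per_task[task] = sum(losses) / len(losses) if losses else 0.0
    # Assumption: branch losses are summed with equal weight.
    return sum(per_task.values()), per_task


# Toy batch: one smile sample and one gender sample, all branches output uniform logits.
uniform = {"smile": [0.0, 0.0], "gender": [0.0, 0.0], "age": [0.0] * 5}
total, per_task = multitask_loss([uniform, uniform], [1, 0], ["smile", "gender"])
```

In this toy batch the age branch receives no samples, so it contributes nothing to the total, while the shared block would still receive gradients from the other two branches.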
python demo.py -i image_path # estimate image
python demo.py -v video_path # estimate video
python demo.py # video stream
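The three invocations above suggest a simple dispatch on the `-i` and `-v` flags. The following is only a sketch of how such a dispatch could look with `argparse`; the helper names `build_parser` and `select_mode`, the `dest` names, and the rule that `-i` takes priority over `-v` are assumptions, not the actual contents of demo.py.

```python
import argparse


def build_parser():
    """Build a parser that accepts the two flags shown in the usage lines."""
    parser = argparse.ArgumentParser(prog="demo.py")
    parser.add_argument("-i", dest="image_path", default=None,
                        help="path to an image to estimate")
    parser.add_argument("-v", dest="video_path", default=None,
                        help="path to a video to estimate")
    return parser


def select_mode(args):
    """Return (mode, source): image, video, or live stream when no flag is given."""
    if args.image_path is not None:
        return "image", args.image_path
    if args.video_path is not None:
        return "video", args.video_path
    return "stream", None


# Example: parse an explicit argument list instead of sys.argv.
mode, source = select_mode(build_parser().parse_args(["-i", "face.jpg"]))
```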
- python 3.7+
- tensorflow
- numpy
- opencv3.x
- MTCNN for face detection
First, I prepare the training data by merging three datasets. I try to keep the number of training samples for each task equal, so that each dataset has the same impact on the model.
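One simple way to equalize the three sources is to downsample each to the size of the smallest before merging. The sketch below shows that idea in plain Python. The function names, the file names, and the choice of downsampling (rather than, say, oversampling) are assumptions for illustration and may differ from what the notebooks actually do.

```python
import random


def balance_sources(sources, seed=0):
    """Downsample every source to the size of the smallest one.

    sources: dict mapping task name -> list of samples (e.g. image paths)
    """
    rng = random.Random(seed)
    target = min(len(samples) for samples in sources.values())
    return {task: rng.sample(samples, target) for task, samples in sources.items()}


def merge_sources(balanced, seed=0):
    """Tag each sample with its source task and shuffle into one training list."""
    merged = [(task, sample) for task, samples in balanced.items() for sample in samples]
    random.Random(seed).shuffle(merged)
    return merged


# Toy datasets of different sizes (hypothetical file names).
sources = {
    "smile": [f"smile_{i}.jpg" for i in range(10)],
    "gender": [f"gender_{i}.jpg" for i in range(25)],
    "age": [f"age_{i}.jpg" for i in range(40)],
}
balanced = balance_sources(sources)
merged = merge_sources(balanced)
```

Keeping the source tag next to each sample lets the training loop route the label to the matching branch.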
Run trainning.ipynb. Set your dataset folder paths and training parameters in config.py.
Run testing.ipynb to see the results on the test datasets.
| Branch | Train | Test |
|---|---|---|
| AGE | 66.06% | 61.36% |
| GENDER | 97.01% | 93.58% |
| SMILE | 99.53% | 92.80% |
I have also built an age and gender estimation model based on EfficientNets. If you are interested, please see age-gender-estimation.
- Effective Deep Multi-source Multi-task Learning Frameworks for Smile Detection, Emotion Recognition and Gender Classification
- Dinh Viet Sang, Le Tran Bao Cuong, Pham Thai Ha, Multi-task learning for smile detection, emotion recognition and gender classification, December 2017