[CV_FER] Facial Motion Prior Networks for Facial Expression Recognition

Question

jeonggg119 opened this issue a year ago · 0 comments

FMPN-FER : Facial Motion Prior Networks for Facial Expression Recognition

Facial-Motion Mask Generator (FMG)
- Generate a facial mask to focus on facial muscle moving regions
- Use average differences between neutral faces and expressive faces as training guidance (pseudo ground-truth masks)
Prior Fusion Net (PFN)
- Generated mask is applied to and fused with original input expressive face
Classification Net (CN)
- Extract features and predict facial expression label (6 classes)
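
A minimal sketch of the pseudo ground-truth mask idea above, using plain nested lists as tiny grayscale images. The function names, the max-normalization, and the blending rule in `fuse` are assumptions made for illustration; the notes only say that the mask is "applied to and fused with" the input face, not how.

```python
def mean_abs_difference(neutral_faces, expressive_faces):
    """Average per-pixel absolute difference over paired grayscale images."""
    if not neutral_faces or len(neutral_faces) != len(expressive_faces):
        raise ValueError("need a non-empty list of (neutral, expressive) pairs")
    height, width = len(neutral_faces[0]), len(neutral_faces[0][0])
    total = [[0.0] * width for _ in range(height)]
    for neutral, expressive in zip(neutral_faces, expressive_faces):
        for y in range(height):
            for x in range(width):
                total[y][x] += abs(expressive[y][x] - neutral[y][x])
    count = len(neutral_faces)
    return [[value / count for value in row] for row in total]


def normalize_to_unit_range(mask):
    """Scale the mask so its largest value is 1 (assumed normalization)."""
    peak = max(max(row) for row in mask)
    if peak == 0:
        return [[0.0 for _ in row] for row in mask]
    return [[value / peak for value in row] for row in mask]


def fuse(face, mask, weight=0.5):
    """Hypothetical fusion: blend the face with its mask-weighted copy."""
    return [
        [(1 - weight) * pixel + weight * pixel * m for pixel, m in zip(face_row, mask_row)]
        for face_row, mask_row in zip(face, mask)
    ]
```

In the actual framework the FMG learns to predict such a mask from the expressive face alone; the averaged difference only serves as the regression target.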

CN : Inception V3 (pretrained on ImageNet)
5 landmarks are extracted, followed by face normalization
Image Transforms : Random crop from four corners or center & Random horizontal flip
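
The augmentation described above could look roughly like this. It is a sketch on nested lists; the crop size, the 0.5 flip probability, and the helper names are assumptions, since the notes do not state them.

```python
import random


def five_crop_offsets(height, width, crop):
    """(top, left) offsets of the four corner crops followed by the center crop."""
    if crop > height or crop > width:
        raise ValueError("crop size must not exceed the image size")
    return [
        (0, 0),                                        # top-left
        (0, width - crop),                             # top-right
        (height - crop, 0),                            # bottom-left
        (height - crop, width - crop),                 # bottom-right
        ((height - crop) // 2, (width - crop) // 2),   # center
    ]


def random_crop_and_flip(image, crop, rng=random):
    """Pick one of the five crops at random, then flip horizontally half the time."""
    height, width = len(image), len(image[0])
    top, left = rng.choice(five_crop_offsets(height, width, crop))
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:  # assumed flip probability
        patch = [row[::-1] for row in patch]
    return patch
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible.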
Training (2 steps)
1. Start by tuning only FMG for 300 epochs, using the Adam optimizer
  - LR : 1e-4, linearly decayed to 0 from epoch 150
2. Jointly train the entire framework with λ1 = 10 and λ2 = 1
  - 200 epochs
  - LR linearly decays from epoch 100 (FMG : 1e-5, CN : 1e-4)
  - l_total = λ1 * l_G(MSE) + λ2 * l_C(CE) = 10 * l_G + l_C
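
The schedule and the joint loss can be sketched as follows. This assumes that the epoch numbers in the notes (150 in step 1, 100 in step 2) mark where the linear decay begins, and that the step 2 rates also decay to 0, which the notes do not state. The function and parameter names are illustrative.

```python
def linear_decay_lr(epoch, base_lr, total_epochs, decay_start, final_lr=0.0):
    """Hold base_lr until decay_start, then interpolate linearly to final_lr."""
    if epoch < decay_start:
        return base_lr
    progress = (epoch - decay_start) / (total_epochs - decay_start)
    return base_lr + (final_lr - base_lr) * min(progress, 1.0)


def total_loss(mask_mse, class_ce, lambda1=10.0, lambda2=1.0):
    """l_total = lambda1 * l_G (mask MSE) + lambda2 * l_C (cross-entropy)."""
    return lambda1 * mask_mse + lambda2 * class_ce


# Step 1: FMG only, 300 epochs, 1e-4 decaying from epoch 150 (assumed reading).
step1_fmg_lr = [linear_decay_lr(e, 1e-4, 300, 150) for e in range(300)]

# Step 2: joint training, 200 epochs, decay from epoch 100.
step2_fmg_lr = [linear_decay_lr(e, 1e-5, 200, 100) for e in range(200)]
step2_cn_lr = [linear_decay_lr(e, 1e-4, 200, 100) for e in range(200)]
```

With λ1 = 10, the mask regression term is weighted ten times more heavily than the classification term.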

MMI Facial Expression Database
- Labelled with 6 basic expressions (Disgust > Sadness > Happy > Fear > Surprise > Anger)
- 3 peak frames around center of each labelled sequence are selected → Total : 624 expressive faces
- 10-fold person-independent cross-validation experiments
- Details for MMI
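
Person-independent cross-validation means that no subject appears in both the training and test split of a fold. A minimal sketch, assuming each sample carries a subject ID; the round-robin assignment of subjects to folds and the function name are illustrative choices, not the paper's protocol.

```python
import random
from collections import defaultdict


def person_independent_folds(subject_ids, n_folds=10, seed=0):
    """Split sample indices into n_folds so each subject lands in exactly one fold."""
    by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)

    subjects = sorted(by_subject)
    if len(subjects) < n_folds:
        raise ValueError("need at least as many subjects as folds")
    random.Random(seed).shuffle(subjects)

    folds = [[] for _ in range(n_folds)]
    for position, subject in enumerate(subjects):
        # Round-robin over subjects, so fold sizes depend on samples per subject.
        folds[position % n_folds].extend(by_subject[subject])
    return folds
```

For fold `k`, the samples in `folds[k]` form the test set and all remaining folds form the training set.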