JAANet

This repository implements the training and testing of JAA-Net from the ECCV 2018 paper "Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment", and offers the original Caffe implementation of the paper.

Getting Started

Dependencies

  • Dependencies for Caffe are required

  • The new implementations in the folders "src" and "include" should be merged into the official Caffe (a command sketch follows this list):

    • Add the .cpp, .cu files into "src/caffe/layers"
    • Add the .hpp files into "include/caffe/layers"
    • Add the content of "caffe.proto" into "src/caffe/proto"
  • New implementations used in our paper:

    • au_mask_based_land_layer: generates attention maps from the locations of facial landmarks
    • division_layer: divides a feature map into multiple equal-sized sub feature maps
    • combination_layer: combines multiple sub feature maps into one feature map
    • data_layer and data_transform_layer: add the processing of landmarks when faces are mirrored
    • align_data_transform_layer: resets the order and updates the coordinates of landmarks when faces are mirrored or cropped
    • dice_coef_loss_layer: Dice coefficient loss
    • softmax_loss_layer: adds a per-element weighting of the loss
    • euclidean_loss_layer: adds a per-element weighting of the loss and normalization by the inter-ocular distance
  • Build Caffe
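
A minimal sketch of the merge-and-build step, assuming the official Caffe tree is checked out next to this repository and that both follow Caffe's directory layout (all paths here are illustrative):

    # copy the new layer sources into the official Caffe tree
    cp JAANet/src/caffe/layers/*.cpp JAANet/src/caffe/layers/*.cu caffe/src/caffe/layers/
    cp JAANet/include/caffe/layers/*.hpp caffe/include/caffe/layers/
    # the new messages in "caffe.proto" must be merged by hand into
    # caffe/src/caffe/proto/caffe.proto (do not overwrite the official file)
    cd caffe
    make all -j8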

Datasets

BP4D and DISFA

The 3-fold partitions of both BP4D and DISFA are provided in the folder "data"

Preprocessing

  • Prepare the training data (a command sketch is given at the end of this section)
    • Compile and run "prep/face_transform.cpp" to apply a similarity transformation to the face images
    • Run "tools/convert_imageset" of Caffe to convert the images to leveldb or lmdb
    • Merge "tools/convert_data.cpp" into Caffe and use it to convert the landmark labels and weights to leveldb or lmdb
  • Modify the "model/BP4D_train_val.prototxt":
    • Modify the paths of data
    • A recommended training strategy is to hold out a small subset of the training data for validation, use it to choose a proper maximum number of iterations, and then retrain the model on all the training data
    • The loss_weight of each AU's DiceCoefLoss is the normalized weight computed from the training data
    • The lr_mult of "au*_mask_conv3*" corresponds to the enhancement coefficient "\lambda_3", and the loss_weight of "au*_mask_loss" is related to the reconstruction constraint "E_r" and "\lambda_3", as in the two examples below
   When \lambda_3 = 1:
   
    param {
      lr_mult: 1
      decay_mult: 1
    }
    param {
      lr_mult: 2
      decay_mult: 0
    }
    
    loss_weight: 1e-7
   When \lambda_3 = 2:
   
    param {
      lr_mult: 2
      decay_mult: 1
    }
    param {
      lr_mult: 4
      decay_mult: 0
    }
    
    loss_weight: 5e-8
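
   For reference, the Dice coefficient loss computed by dice_coef_loss_layer typically takes the following smoothed form (the exact smoothing term used here is defined in the layer source):

    E_{dice} = 1 - \frac{2\sum_i p_i g_i + \epsilon}{\sum_i p_i^2 + \sum_i g_i^2 + \epsilon}

   where p_i is the prediction, g_i the ground-truth label, and \epsilon the smoothing term.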
  • There are two minor differences from the original paper:
    • Edge cropping of features and attention maps is removed for better generalization
    • The first convolution of the third block uses a stride of 2 instead of 1 for better performance
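
A rough command sketch for the data preparation above, assuming Caffe is built under "build/", the transformed images sit in "aligned_faces/", and the image list is "train_list.txt" (these names are illustrative; only convert_imageset and its flags are standard Caffe):

    # 1. compile and run prep/face_transform.cpp to similarity-transform the face images
    # 2. convert the transformed images to lmdb; keep the default (unshuffled) order so
    #    the images stay aligned with the label database built in step 3
    ./build/tools/convert_imageset --backend=lmdb aligned_faces/ train_list.txt BP4D_img_lmdb
    # 3. after merging tools/convert_data.cpp into Caffe and rebuilding, run
    #    ./build/tools/convert_data to convert the landmark labels and weights
    #    (see its source for the expected arguments)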

Training

    cd model
    sh train_net.sh
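
The script presumably wraps Caffe's standard command-line interface; a rough equivalent, assuming a solver file named "BP4D_solver.prototxt" (check train_net.sh for the actual arguments):

    # the solver file name is an assumption; train_net.sh defines the real one
    ../caffe/build/tools/caffe train --solver=BP4D_solver.prototxt --gpu=0 2>&1 | tee train.log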

Citation

If you use this code for your research, please cite our paper:

@inproceedings{shao2018deep,
  title={Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment},
  author={Shao, Zhiwen and Liu, Zhilei and Cai, Jianfei and Ma, Lizhuang},
  booktitle={European Conference on Computer Vision},
  year={2018},
  pages={725--740},
  organization={Springer}
}

Updating

More details will be added, and a PyTorch version will be made available soon

Acknowledgments

Code is partially inspired by DRML and A-Variation-of-Dice-coefficient-Loss-Caffe-Layer