ARL

This repository is the implementation of our paper "Facial Action Unit Detection Using Attention and Relation Learning" (IEEE Transactions on Affective Computing). The code is largely adapted from JAA-Net.

Getting Started

Dependencies

  • The standard dependencies of Caffe are required

  • The new implementations in the folders "src" and "include" should be merged into the official Caffe (a shell sketch follows this list):

    • Add the .cpp and .cu files to "src/caffe/layers", except "modified_permutohedral.cpp", which goes into "src/caffe/util/"
    • Add the .hpp files to "include/caffe/layers", except "modified_permutohedral.hpp" and "tvg_util.hpp", which go into "include/caffe/util/"
    • Merge the content of "caffe.proto" into "src/caffe/proto/caffe.proto"
    • Add "tools/convert_data.cpp" to "tools"
  • New implementations used in our paper:

    • division_layer: divides a feature map into multiple equally sized subparts
    • combination_layer: combines multiple sub feature maps into one feature map
    • multi_stage_meanfield_au3 and meanfield_iteration: implement the fully-connected conditional random field
    • lp_norm_layer and cosine_similarity_loss_layer: implement the cosine similarity loss for AU intensity estimation
    • sigmoid_cross_entropy_loss_layer: extended with per-element loss weighting (see the formula sketch after this list)
    • euclidean_loss_layer: used for AU intensity estimation; weights the loss of each element and zeroes the gradient when the corresponding AU label is missing (see the formula sketch after this list)
    • convert_data: converts the AU labels and weights to leveldb or lmdb
  • Build Caffe
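
A minimal sketch of the merge as shell commands, assuming this repository is checked out at "ARL/" and the official Caffe at "caffe/" (both directory names are placeholders), and that the new sources sit directly under "src" and "include":

    # copy layer sources, routing the permutohedral files into util/
    cp ARL/src/*.cpp ARL/src/*.cu caffe/src/caffe/layers/
    mv caffe/src/caffe/layers/modified_permutohedral.cpp caffe/src/caffe/util/
    cp ARL/include/*.hpp caffe/include/caffe/layers/
    mv caffe/include/caffe/layers/modified_permutohedral.hpp \
       caffe/include/caffe/layers/tvg_util.hpp caffe/include/caffe/util/
    cp ARL/tools/convert_data.cpp caffe/tools/
    # merge the new message definitions into caffe/src/caffe/proto/caffe.proto by hand,
    # then build Caffe and its Python bindings (the test scripts use pycaffe)
    cd caffe && make -j8 && make pycaffe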

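For reference, the two weighted losses presumably take the following form, where $w_i$ is the precomputed weight of the $i$-th AU, $y_i$ the ground-truth label, $\hat{y}_i$ the predicted probability, and $m_i \in \{0,1\}$ masks missing intensity labels; this is a reading of the layer descriptions above, not the exact implementation:

    $L_{CE} = -\sum_i w_i \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$
    $L_{MSE} = \frac{1}{2} \sum_i w_i \, m_i \, (\hat{y}_i - y_i)^2$
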
Datasets

BP4D and DISFA

The 3-fold partitions of both BP4D and DISFA can be found here

Preprocessing

  • Prepare the training data
    • Run "prep/face_transform.cpp" to conduct similarity transformation for face images
    • Run "prep/combine2parts.m" to combine two partitions as a training set, respectively
    • Run "prep/write_AU_weight.m" to compute the weight of each AU for the training set
    • Run "tools/convert_imageset" of Caffe to convert the images to leveldb or lmdb
    • Run "tools/convert_data" to convert the AU labels and weights to leveldb or lmdb: the weights are shared by all the training samples (only one line needed)
    • Our method is evaluated by 3-fold cross validation. For example, “BP4D_combine_1_2” denotes the combination of partition 1 and partition 2
  • Modify the train_val prototxt files:
    • Update the data paths
    • A recommended training strategy is to hold out a small subset of the training data for validation, use it to choose a proper maximum number of iterations, and then retrain the model on all the training data
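
A hedged sketch of the two conversion steps for one fold; all paths, the list file, and the convert_data arguments below are placeholders (check "tools/convert_data.cpp" for the actual usage):

    # images -> lmdb with Caffe's stock tool
    ./build/tools/convert_imageset --backend=lmdb \
        /path/to/aligned_faces/ BP4D_combine_1_2_list.txt BP4D_combine_1_2_img_lmdb
    # AU labels and weights -> lmdb with the tool added by this repository
    ./build/tools/convert_data <args>   # see tools/convert_data.cpp for its arguments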

Training

  • AU detection

        cd model
        sh train_net.sh

  • AU intensity estimation

        sh train_net_intensity.sh

  • Trained models on BP4D with 3-fold cross-validation for AU detection and on FERA 2015 for AU intensity estimation can be downloaded here
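
The two training scripts presumably wrap the standard Caffe solver invocation, along these lines (the solver filename, GPU id, and log name are placeholders):

    ./build/tools/caffe train --solver=solver.prototxt --gpu=0 2>&1 | tee train.log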

Testing

  • AU detection

        python test.py

  • AU intensity estimation

        python test_intensity.py
        matlab -nodisplay
        >> evaluate_intensity

  • Visualize attention maps

        python visualize_attention_map.py
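
To run the MATLAB evaluation non-interactively in a single command (assuming "evaluate_intensity.m" is on the MATLAB path):

    matlab -nodisplay -r "evaluate_intensity; exit"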

Citation

If you use this code for your research, please cite our paper:

@article{shao2019facial,
  title={Facial action unit detection using attention and relation learning},
  author={Shao, Zhiwen and Liu, Zhilei and Cai, Jianfei and Wu, Yunsheng and Ma, Lizhuang},
  journal={IEEE Transactions on Affective Computing},
  year={2019},
  publisher={IEEE}
}