faceAnalyzer

Description

The faceAnalyzer executable uses a slightly modified version of the OpenFace library for facial action unit recognition, facial landmark detection, eye-gaze estimation, and head pose estimation. The executable can process a live webcam feed as well as video or image files from disk. faceAnalyzer is part of the ToMCAT project.

NOTE: We vendorize OpenFace under external/OpenFace since (i) it's not available using a package manager, and (ii) we have made some modifications (mainly ergonomic) to it to suit our purposes.

If you use Homebrew, MacPorts, or apt as your package manager, we provide convenience scripts for installing faceAnalyzer. Simply run the following commands to download and build it:

git clone https://github.com/ml4ai/tomcat-faceAnalyzer
cd tomcat-faceAnalyzer && ./tools/install

Usage

Navigate to the build/ directory in the tomcat-faceAnalyzer root directory and execute:

./bin/faceAnalyzer --mloc ../data/OpenFace_models

This will start processing the live webcam feed and output the facial features to standard output in JSON format.

NOTE: Alternatively, you can configure your ~/.bashrc to set the OpenFace models directory so that you do not have to pass --mloc on every invocation.
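One possibility (a sketch only, assuming the repository is checked out at ~/tomcat-faceAnalyzer) is to define a shell alias in ~/.bashrc that always passes --mloc:

# Hypothetical alias; adjust the paths to match your checkout location.
alias faceAnalyzer='~/tomcat-faceAnalyzer/build/bin/faceAnalyzer --mloc ~/tomcat-faceAnalyzer/data/OpenFace_models'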

Command Line Arguments

You can interact with the faceAnalyzer executable through the following command line arguments:

  -h [ --help ]             Show this help message
  --exp_id arg              Set experiment ID
  --trial_id arg            Set trial ID
  --playername arg          Set player name
  --mloc arg                Set OpenFace models directory
  --input_source arg (=0)   0 for webcam, 1 for nfs
  --output_source arg (=0)  0 for stdout, 1 for file (need to specify 
                            out_path), 2 for mqtt (need to specify message bus)
  --indent                  Indent output JSON by four spaces
  --visualize               Enable visualization
  -p [ --path ] arg (=null) Specify an input video/image file
  --emotion                 Display discrete emotion
  --out_path arg (=null)    Path for the output file if output_source is file
  --bus arg (=null)         Message bus to publish to if output_source is mqtt

NOTE: When the --visualize flag is passed, the executable also displays a visualization of the facial landmarks, head pose, and eye gaze tracking. To exit the visualization and stop processing the webcam/video, press q or Q.
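For example, to write the JSON messages to a file instead of stdout, you could run the following (the output path is hypothetical; adjust it as needed):

./bin/faceAnalyzer --mloc ../data/OpenFace_models --output_source 1 --out_path ~/faceAnalyzer_output.json

Publishing to an MQTT message bus works analogously with --output_source 2 and --bus.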

Example Usage

To extract facial features from a live webcam feed, with the experiment ID set to 563e4567-e89b-12d3-a456-426655440000, the trial ID set to 123e4567-e89b-12d3-a456-426655440000, and the discrete emotions displayed for each timestamp, run:

./bin/faceAnalyzer --exp_id 563e4567-e89b-12d3-a456-426655440000 \
                   --trial_id 123e4567-e89b-12d3-a456-426655440000 \
                   --emotion

To extract facial features from a video file at ~/Downloads/video.mp4, with the player name set to Aptiminer1 and visualization enabled, run:

./bin/faceAnalyzer -p ~/Downloads/video.mp4 --playername Aptiminer1 --visualize

To extract facial features from a single image file at ~/Downloads/image.jpg, with the OpenFace models directory set to ~/git_repos/tomcat/data/OpenFace_models and the JSON output indented by four spaces, run:

./bin/faceAnalyzer -p ~/Downloads/image.jpg --mloc ~/git_repos/tomcat/data/OpenFace_models --indent

You can combine these options in other ways as well.
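For instance, the following command (reusing the hypothetical video path from above) processes a video file, displays the discrete emotions, and indents the JSON output by four spaces:

./bin/faceAnalyzer -p ~/Downloads/video.mp4 --emotion --indent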

Output Format

The faceAnalyzer executable uses the nlohmann-json library to output the action units (and the facial expressions, if requested with the --emotion option), eye landmarks, gaze estimates, and pose estimates. The following is an example JSON message with indentation and emotion display enabled:

{
    "data": {
        "action_units": {
            "AU01": {
                "intensity": 1.5039452395072457,
                "occurrence": 1.0
            },
            "AU02": {
                "intensity": 0.7107745056044891,
                "occurrence": 1.0
            },
            ...
            "AU45": {
                "intensity": 0.7400846556287861,
                "occurrence": 0.0
            }
        },
        "emotions": [
            "sadness",
            "contempt"
        ],
        "frame": 1,
        "gaze": {
            "eye_0": {
                "x": -0.02601720206439495,
                "y": 0.2048162817955017,
                "z": -0.97845458984375
            },
            "eye_1": {
                "x": -0.1461271494626999,
                "y": 0.2099267840385437,
                "z": -0.9667355418205261
            },
            "eye_landmarks": {
                "2D": {
                    "x": [
                        297.0760498046875,
                        300.1932067871094,
                        ...
                    ],
                    "y": [
                        210.02487182617188,
                        202.84886169433594,
                        ...
                    ]
                },
                "3D": {
                    "x": [
                        -13.506591796875,
                        -11.667745590209961,
                        ...
                    ],
                    "y": [
                        -17.661083221435547,
                        -21.884918212890625,
                        ...
                    ],
                    "z": [
                        294.59564208984375,
                        294.53900146484375,
                        ...
                    ]
                }
            },
            "gaze_angle": {
                "x": -0.088267482817173,
                "y": 0.21006907522678375
            }
        },
        "landmark_detection_confidence": "0.97500",
        "landmark_detection_success": true,
        "playername": "Aptiminer1",
        "pose": {
            "location": {
                "x": 21.459043502807617,
                "y": 16.071529388427734,
                "z": 367.04388427734375
            },
            "rotation": {
                "x": 0.11796540021896362,
                "y": 0.036553021520376205,
                "z": 0.0021826198790222406
            }
        }
    },
    "header": {
        "message_type": "observation",
        "timestamp": "2020-08-01T12:25:47.626987Z",
        "version": "0.1"
    },
    "msg": {
        "experiment_id": "563e4567-e89b-12d3-a456-426655440000",
        "source": "faceAnalyzer",
        "sub_type": "state",
        "timestamp": "2020-08-01T12:25:47.626987Z",
        "trial_id": "123e4567-e89b-12d3-a456-426655440000",
        "version": "0.1"
    }
}

NOTE: This output is consistent with the output of the OpenFace executables (see https://github.com/TadasBaltrusaitis/OpenFace/wiki/Output-Format).

The explanation of each element in the data block is given below:

action_units

The sensor can detect the intensity (value ranges from 0 to 5) of 17 action units:

AU01_r, AU02_r, AU04_r, AU05_r, AU06_r, AU07_r, AU09_r, AU10_r, AU12_r, AU14_r, AU15_r, AU17_r, AU20_r, AU23_r, AU25_r, AU26_r, AU45_r

And the occurrence (0 represents absent, 1 represents present) of 18 action units:

AU01_c, AU02_c, AU04_c, AU05_c, AU06_c, AU07_c, AU09_c, AU10_c, AU12_c, AU14_c, AU15_c, AU17_c, AU20_c, AU23_c, AU25_c, AU26_c, AU28_c, AU45_c
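In the JSON output, the intensity (_r) and occurrence (_c) predictions for a given AU are grouped under a single key; for example, AU01 in the message above carries both an intensity and an occurrence field.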

emotions specifies a list of facial expressions, each inferred from a combination of action units

frame specifies the frame number (in the case of sequences, i.e., webcam and video input)

gaze

eye_0 specifies the eye gaze direction vector (xyz coordinates) for the leftmost eye in the frame

eye_1 specifies the eye gaze direction vector (xyz coordinates) for the rightmost eye in the frame

2D specifies the location of 2D eye region landmarks in pixels (x_0, ... x_55, y_0, ... y_55 coordinates)

3D specifies the location of 3D eye region landmarks in millimeters (x_0, ... x_55, y_0, ... y_55, z_0, ... z_55 coordinates)

gaze_angle specifies the eye gaze direction in radians (xy coordinates), averaged over both eyes

landmark_detection_confidence specifies how confident the tracker is in the current landmark detection estimate

landmark_detection_success specifies if tracking was successful

playername specifies the name of the player

pose

location specifies the location of the head in millimeters (xyz coordinates) with respect to the camera

rotation specifies the rotation of the head in radians (xyz coordinates), with the camera as the origin
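As a worked example, the gaze_angle values in the message above (x ≈ -0.088 rad, y ≈ 0.210 rad) correspond to roughly -5.1 and 12.0 degrees respectively (multiplying by 180/π ≈ 57.3), and the head rotation of about 0.118 rad around the x axis corresponds to roughly 6.8 degrees.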

The explanation of each element in the header block is given below:

message_type specifies the type of output message

timestamp specifies the time of execution in ISO 8601 format

version specifies the version of faceAnalyzer

The explanation of each element in the msg block is given below:

experiment_id specifies the experiment ID

source specifies the source of output message

sub_type specifies the sub-type of output message

timestamp specifies the time of execution in ISO 8601 format

trial_id specifies the trial ID

version specifies the version of faceAnalyzer
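To illustrate how these messages can be consumed downstream, the following is a minimal sketch of a standalone consumer written with the same nlohmann-json library. It is not part of faceAnalyzer, and it assumes the default non-indented output, i.e. one JSON message per line on standard input:

// Minimal sketch of a downstream consumer (not part of faceAnalyzer). Assumes
// the default non-indented output: one JSON message per line on stdin.
#include <iostream>
#include <string>

#include <nlohmann/json.hpp>

int main() {
    std::string line;
    while (std::getline(std::cin, line)) {
        auto message = nlohmann::json::parse(line);
        const auto& data = message["data"];
        // Intensity ranges from 0 to 5; occurrence is 0.0 (absent) or 1.0 (present).
        double au01_intensity = data["action_units"]["AU01"]["intensity"];
        bool tracked = data["landmark_detection_success"];
        std::cout << "frame " << data["frame"]
                  << ", trial " << message["msg"]["trial_id"]
                  << ", AU01 intensity " << au01_intensity
                  << ", tracking ok: " << std::boolalpha << tracked << "\n";
    }
    return 0;
}

You could then pipe the executable's output into the compiled consumer, e.g. ./bin/faceAnalyzer --mloc ../data/OpenFace_models | ./consume_messages (where consume_messages is the hypothetical compiled binary above).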

FACS Emotion Classification

The FACS configuration employed to classify each emotion category (Friesen & Ekman, 1983) is described below:

Emotion     Action Units       Description
Happiness   6+12               Cheek raiser, Lip corner puller
Sadness     1+4+15             Inner brow raiser, Brow lowerer, Lip corner depressor
Surprise    1+2+5+26           Inner brow raiser, Outer brow raiser, Upper lid raiser, Jaw drop
Fear        1+2+4+5+7+20+26    Inner brow raiser, Outer brow raiser, Brow lowerer, Upper lid raiser, Lid tightener, Lip stretcher, Jaw drop
Anger       4+5+7+23           Brow lowerer, Upper lid raiser, Lid tightener, Lip tightener
Disgust     9+15+17            Nose wrinkler, Lip corner depressor, Chin raiser
Contempt    12+14              Lip corner puller, Dimpler

For more information, visit: https://en.wikipedia.org/wiki/Facial_Action_Coding_System
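As an illustration only (this is a sketch of the rules in the table, not the actual faceAnalyzer implementation), the classification can be expressed as a check that every characteristic AU of an emotion has occurrence 1.0 in the action_units block of a message:

// Illustrative sketch (not the faceAnalyzer implementation) of the FACS rules
// in the table above, applied to the "action_units" block of one output message.
#include <iostream>
#include <map>
#include <string>
#include <vector>

#include <nlohmann/json.hpp>

// Returns every emotion whose characteristic AUs all have occurrence == 1.0.
std::vector<std::string> classify_emotions(const nlohmann::json& action_units) {
    static const std::map<std::string, std::vector<std::string>> rules = {
        {"happiness", {"AU06", "AU12"}},
        {"sadness",   {"AU01", "AU04", "AU15"}},
        {"surprise",  {"AU01", "AU02", "AU05", "AU26"}},
        {"fear",      {"AU01", "AU02", "AU04", "AU05", "AU07", "AU20", "AU26"}},
        {"anger",     {"AU04", "AU05", "AU07", "AU23"}},
        {"disgust",   {"AU09", "AU15", "AU17"}},
        {"contempt",  {"AU12", "AU14"}},
    };
    std::vector<std::string> detected;
    for (const auto& [emotion, aus] : rules) {
        bool all_present = true;
        for (const auto& au : aus) {
            if (!action_units.contains(au) ||
                action_units[au]["occurrence"].get<double>() != 1.0) {
                all_present = false;
                break;
            }
        }
        if (all_present) {
            detected.push_back(emotion);
        }
    }
    return detected;
}

int main() {
    // Hypothetical AU occurrences; elided AUs are treated as absent.
    nlohmann::json action_units = {
        {"AU01", {{"occurrence", 1.0}}},
        {"AU04", {{"occurrence", 1.0}}},
        {"AU12", {{"occurrence", 1.0}}},
        {"AU14", {{"occurrence", 1.0}}},
        {"AU15", {{"occurrence", 1.0}}},
    };
    for (const auto& emotion : classify_emotions(action_units)) {
        std::cout << emotion << "\n";  // prints "contempt" then "sadness"
    }
    return 0;
}

Because some emotions share characteristic AUs (for example, AU12 appears in both the happiness and contempt rules), a single message can match more than one emotion; this overlap is discussed under Limitations below.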

Limitations

  1. When the AU prediction module of the OpenFace 2.0 toolkit was evaluated, it reportedly outperformed more complex and more recent baseline methods, including IRKR, LT, CNN, D-CNN, and CCNF, on the DISFA dataset, with a mean concordance correlation coefficient (CCC) of 0.73 across 12 AUs (Baltrusaitis et al., 2018). However, since OpenFace's accuracy is not perfect, the faceAnalyzer executable inherits some limitations from it as well.

  2. The emotion classification approach employed by the sensor assumes that instances of an emotion category are expressed with facial movements that vary, to some degree, around a prototypical set of movements. However, expressions of the same emotion category vary substantially across different situations, people, gender, and cultures (Barrett et al., 2019).

  3. The faceAnalyzer executable outputs a list of all the emotions detected by the FACS configuration described above. Because some emotions share characteristic AUs, the detected emotions may overlap. A possible explanation for this limitation is that each timestamp captures the expression at a fixed point in time; the possibility of overlap in the sequence of facial changes (onset, apex, offset) associated with different emotion categories is not taken into account (Kohler et al., 2004).

References

Baltrusaitis, T., Zadeh, A., Lim, Y. C., & Morency, L. P. (2018, May). OpenFace 2.0: Facial behavior analysis toolkit. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) (pp. 59-66). IEEE.

Barrett, L. F., Adolphs, R., Marsella, S., Martinez, A. M., & Pollak, S. D. (2019). Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements. Psychological Science in the Public Interest, 20, 1–68. doi:10.1177/1529100619832930

Friesen, W. V., & Ekman, P. (1983). EMFACS-7: Emotional facial action coding system. Unpublished manuscript, University of California at San Francisco, 2(36), 1.

Kohler, C. G., Turner, T., Stolar, N. M., Bilker, W. B., Brensinger, C. M., Gur, R. E., & Gur, R. C. (2004). Differences in facial expressions of four universal emotions. Psychiatry Research, 128(3), 235-244.