text-audio-articulatory-movement

We aim to predict articulatory movements from both text and audio inputs.

To evaluate the results objectively, two articulatory animations are synthesized and compared. The first is the real articulatory animation, generated by driving the 3D facial mesh model with the real articulatory movements. The second is the predicted articulatory animation, generated by driving the same 3D facial mesh model with the corresponding predicted articulatory movements.
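An objective comparison of the two animations can be reduced to comparing the underlying movement trajectories, for example with a root-mean-square error (RMSE) over the 3D positions of the articulatory feature points. The sketch below is illustrative only: the array shapes and the toy data are assumptions, not taken from this project's actual data format.

```python
import numpy as np

def trajectory_rmse(real, pred):
    """RMSE between real and predicted articulatory trajectories.

    real, pred: arrays of shape (frames, points, 3), the 3D positions
    of articulatory feature points over time (hypothetical layout).
    """
    real = np.asarray(real, dtype=float)
    pred = np.asarray(pred, dtype=float)
    if real.shape != pred.shape:
        raise ValueError("trajectories must have the same shape")
    return float(np.sqrt(np.mean((real - pred) ** 2)))

# Toy example with synthetic data: 10 frames, 5 feature points, 3D coords.
rng = np.random.default_rng(0)
real = rng.normal(size=(10, 5, 3))
pred = real + 0.01 * rng.normal(size=(10, 5, 3))
print(trajectory_rmse(real, pred))
```

A lower RMSE means the predicted animation tracks the real articulatory movements more closely; identical trajectories give an RMSE of zero.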