awesome-body-language

This repo records and tracks Multi-modal Body Language research. In this work, we present the first detailed survey on Multi-modal Body Language research, covering 2 directions (Recognition and Generation) and 4 parts (Cued Speech, Co-speech, Sign Language, Talking Head).


A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

arXiv, 2023
Li Liu · Lufei Gao · Wentao Lei · Fengji Ma · Xiaotian Lin
Jinting Wang



This repository records and tracks Multi-modal Body Language research as a supplement to our survey.
If you find any work missing or have any suggestions (papers, implementations, and other resources), please don't hesitate to open an issue or pull request, or just contact us by e-mail. We will check the problems and add the missing papers to this repo ASAP.

🔥News

[2023.8.17] The first draft is available on arXiv.

🔥Highlights!!

[1] We revisit and group the existing Body Language research from a Multi-modal perspective.

[2] We survey the research in 4 parts: Cued Speech, Co-speech, Sign Language, Talking Head.

[3] We survey the research in 2 directions: Recognition and Generation.

[4] Some new insights into these directions are discussed.

Introduction

We present the first detailed survey on Multi-modal Body Language research.


Summary of Contents

Paper-List

Cued Speech Recognition

Year Venue Acronym Paper Title Code/Project
2010 Speech Communication Heracleous et al. Cued speech automatic recognition in normal-hearing and deaf subjects N/A
2012 EUSIPCO Heracleous et al. Continuous phoneme recognition in Cued Speech for French N/A
2018 Interspeech Liu et al. Visual Recognition of Continuous Cued Speech Using a Tandem CNN-HMM Approach N/A
2020 IEEE Transactions on Multimedia Liu et al. Re-synchronization using the hand preceding model for multi-modal fusion in automatic continuous cued speech recognition N/A
2021 EUSIPCO Papadimitriou et al. A Fully Convolutional Sequence Learning Approach for Cued Speech Recognition from Videos N/A
2021 HCII Papadimitriou et al. Multimodal Fusion and Sequence Learning for Cued Speech Recognition from Videos N/A
2021 arXiv preprint Wang et al. Cross-Modal Knowledge Distillation Method for Automatic Cued Speech Recognition N/A
2021 arXiv preprint Wang et al. An Attention Self-supervised Contrastive Learning based Three-stage Model for Hand Shape Feature Representation in Cued Speech N/A
2022 ICASSP Sankar et al. Multistream Neural Architectures for Cued Speech Recognition Using a Pre-Trained Visual Feature Extractor and Constrained CTC Decoding N/A
2022 ISCSLP Liu et al. Objective Hand Complexity Comparison between Two Mandarin Chinese Cued Speech Systems N/A
2023 ICASSP Liu et al. Cross-Modal Mutual Learning for Cued Speech Recognition N/A

Co-speech Recognition

Year Venue Acronym Paper Title Code/Project
2014 MA3HMI Bhattacharya et al. Disposition Recognition from Spontaneous Speech Towards a Combination with Co-speech Gestures N/A
2021 ACM MM Bhattacharya et al. Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning N/A

Sign Language Recognition

Year Venue Acronym Paper Title Code/Project
2019 ICIP Zhang et al. Continuous Sign Language Recognition via Reinforcement Learning N/A
2020 ECAI Zhou et al. Self-Attention-based Fully-Inception Networks for Continuous Sign Language Recognition N/A
2020 ICASSP Li et al. Key Action and Joint CTC-Attention based Sign Language Recognition N/A
2020 ECCV Cheng et al. Fully Convolutional Networks for Continuous Sign Language Recognition N/A
2020 ECCV Niu et al. Stochastic Fine-Grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition N/A
2021 ICPR Koishybay et al. Continuous Sign Language Recognition with Iterative Spatiotemporal Fine-tuning N/A
2022 CVPR Zuo et al. C2SLR: Consistency-Enhanced Continuous Sign Language Recognition N/A
2022 IEEE Transactions on Multimedia Zhou et al. Spatial-Temporal Multi-Cue Network for Sign Language Recognition and Translation N/A
2022 NeurIPS Chen et al. Two-Stream Network for Sign Language Recognition and Translation N/A
2023 CVPR Zheng et al. CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition With Variational Alignment Code
2023 TPAMI Bilge et al. Towards Zero-Shot Sign Language Recognition N/A
2023 AAAI Hu et al. Self-Emphasizing Network for Continuous Sign Language Recognition Code
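Several of the recognition works listed above (e.g., the joint CTC-Attention and constrained CTC decoding papers) build on CTC-style sequence decoding. As a minimal, generic illustration only (not the method of any specific paper above), a greedy CTC decoder takes the most likely label per frame, collapses consecutive repeats, and removes the blank token:

```python
import numpy as np

def ctc_greedy_decode(logits, blank=0):
    """Greedy (best-path) CTC decoding.

    logits: (T, C) array of per-frame class scores.
    Collapses consecutive repeated labels, then drops blanks.
    """
    best_path = np.argmax(logits, axis=-1)  # most likely label per frame
    decoded, prev = [], None
    for token in best_path:
        if token != prev and token != blank:  # collapse repeats, skip blanks
            decoded.append(int(token))
        prev = token
    return decoded

# Frame-wise predictions 1,1,blank,2,2,2,blank,1 decode to the gloss sequence [1, 2, 1]
frames = np.eye(3)[[1, 1, 0, 2, 2, 2, 0, 1]]
print(ctc_greedy_decode(frames))  # → [1, 2, 1]
```

In the continuous sign language recognition setting, each class index would correspond to a sign gloss; the surveyed papers refine this basic scheme with attention, iterative fine-tuning, or constrained decoding.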

Cued Speech Generation

Year Venue Acronym Paper Title Code/Project
1998 ISCA Paul et al. Automatic Generation of Cued Speech for The Deaf: Status and Outlook N/A
2008 AVSP Gérard et al. Retargeting cued speech hand gestures for different talking heads and speakers N/A

Co-speech Generation

Year Venue Acronym Paper Title Code/Project
2015 IVA DCNF Predicting co-verbal gestures: A deep and temporal modeling approach N/A
2019 CVPR S2G Learning individual styles of conversational gesture Code
2020 EUROGRAPHICS StyleGestures Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows Code
2021 ICCV A2G Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders Code
2021 IEEE VR Text2Gestures Text2Gestures: A Transformer-Based Network for Generating Emotive Body Gestures for Virtual Agents Code
2022 Computer Graphics Forum ZeroEGGS ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech Code
2022 CVPR DiffGAN Low-Resource Adaptation for Personalized Co-Speech Gesture Generation N/A
2022 SIGGRAPH Asia RG Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings Code

Sign Language Generation

Year Venue Acronym Paper Title Code/Project
2016 Universal Access in the Information Society Sign3D Interactive editing in French Sign Language dedicated to virtual signers: requirements and challenges N/A
2018 AAAI HLSTM Hierarchical LSTM for Sign Language Translation N/A
2020 IJCV Text2Sign Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks N/A
2020 CVPR ESN Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video N/A
2020 BMVC Saunders et al. Adversarial Training for Multi-Channel Sign Language Production N/A
2022 ACL DSM Modeling Intensification for Sign Language Generation: A Computational Approach Code
2022 CVPR SignGAN Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production N/A
2023 CVPR PoseVQ-Diffusion Vector Quantized Diffusion Model with CodeUnet for Text-to-Sign Pose Sequences Generation Code

Talking Head Generation

Year Venue Acronym Paper Title Code/Project
2018 ECCV X2Face X2Face: A network for controlling face generation using images, audio, and pose codes N/A
2018 ECCV Chen et al. Lip Movements Generation at a Glance Code
2019 NeurIPS Wen et al. Face Reconstruction from Voice using Generative Adversarial Networks Code
2019 CVPR Speech2Face Speech2Face: Learning the Face Behind a Voice N/A
2019 ICASSP Wav2Pix WAV2PIX: Speech-conditioned Face Generation using Generative Adversarial Networks Code
2019 IJCV Jamaludin et al. You Said That?: Synthesising Talking Faces from Audio N/A
2019 IJCAI Song et al. Talking Face Generation by Conditional Recurrent Adversarial Network N/A
2019 AAAI Zhou et al. Talking face generation by adversarially disentangled audio-visual representation N/A
2019 CVPR Kefalas et al. End-to-End Speech-Driven Realistic Facial Animation with Temporal GANs N/A
2020 ICASSP Kefalas et al. Speech-Driven Facial Animation Using Polynomial Fusion of Features N/A
2020 ICASSP Eskimez et al. End-To-End Generation of Talking Faces from Noisy Speech N/A
2020 IJCNN Sinha et al. Identity-Preserving Realistic Talking Face Generation N/A
2020 INTERSPEECH Wang et al. Speech Driven Talking Head Generation via Attentional Landmarks Based Representation N/A
2020 ACM MM Wav2Lip A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild N/A
2020 arXiv preprint Yi et al. Audio-driven talking face video generation with learning-based personalized head pose N/A
2020 ECCV Chen et al. Talking-Head Generation with Rhythmic Head Motion Code
2020 WACV Mittal et al. Animating Face using Disentangled Audio Representations N/A
2020 ECCV MEAD MEAD: A Large-Scale Audio-Visual Dataset for Emotional Talking-Face Generation Code
2020 TVCG Wen et al. Photorealistic Audio-driven Video Portraits Code
2021 CVPR LipSync3D LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces From Video Using Pose and Lighting Normalization N/A
2021 The Visual Computer Fang et al. Facial expression GAN for voice-driven face generation N/A
2021 IJCAI Zhu et al. Arbitrary talking face generation via attentional audio-visual coherence learning N/A
2021 IJCAI Audio2Head Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion N/A
2021 ACM TOG Lu et al. Live speech portraits: real-time photorealistic talking-head animation N/A
2021 ICCV FACIAL FACIAL: Synthesizing Dynamic Talking Face With Implicit Attribute Learning N/A
2021 ICCV AD-NeRF AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis Code
2021 CVPR HDTF Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset Code
2021 arXiv preprint Si et al. Speech2Video: Cross-Modal Distillation for Speech to Video Generation N/A
2021 arXiv preprint Chen et al. Talking Head Generation with Audio and Speech Related Facial Action Units N/A
2021 CVPR PC-AVS Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation Code
2022 CVPR GC-VAT Expressive Talking Head Generation With Granular Audio-Visual Control N/A
2022 AAAI Wang et al. One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning N/A
2022 ACM SIGGRAPH EAMM EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model N/A
2022 arXiv preprint SPACE SPACE: Speech-driven Portrait Animation with Controllable Expression N/A
2022 arXiv preprint DFA-NeRF DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering N/A
2022 arXiv preprint Yu et al. Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors N/A
2022 ACCESS Bigioi et al. Pose-Aware Speech Driven Facial Landmark Animation Pipeline for Automated Dubbing N/A
2022 ECCV DFRF Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis Code
2022 ECCV SSP-NeRF Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation Code
2023 arXiv preprint DIRFA Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations N/A
2023 ICASSP DisCoHead DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions N/A
2023 ICASSP OPT OPT: One-shot Pose-Controllable Talking Head Generation N/A
2023 ICASSP Zhu et al. Audio-Driven Talking Head Video Generation with Diffusion Model N/A
2023 CVPR Wang et al. Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis N/A
2023 ICPADS Zhang et al. Talking Head Generation for Media Interaction System with Feature Disentanglement N/A
2023 CVPR SadTalker SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation Code
2023 CVPR DiffTalk DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation Code
2023 CoRR Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator N/A

Challenges

Year Task Language Name Link
2021 Sign Language Recognition English ChaLearn Looking at People link
2022 Sign Language Recognition, Translation & Production English SLRTP link
2023 Sign Language Recognition English Google - Isolated Sign Language Recognition link
2023 Sign Language Recognition Multiple WMT-SLT 23 link
2018 Lip Reading Recognition Japanese SSSD link
2022 Talking Head Generation English ViCo2022 link
2023 Talking Head Generation English ViCo2023 link
2020 Co-speech Generation English GENEA Challenge 2020 link
2022 Co-speech Generation English GENEA Challenge 2022 link
2023 Co-speech Generation English GENEA Challenge 2023 link

Acknowledgement

If you find our survey and repository useful for your research project, please consider citing our paper:

@article{liu2023blsurvey,
  title={A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation},
  author={Liu, Li and Gao, Lufei and Lei, Wentao and Ma, Fengji and Lin, Xiaotian and Wang, Jinting},
  journal={arXiv:2308.08849},
  year={2023}
}

Contact

avrillliu@hkust-gz.edu.cn
wlei117@connect.hkust-gz.edu.cn