Awesome-Talking-Face

📖 A curated list of resources dedicated to talking face generation.

This is a repository for organizing papers, code, and other resources related to talking face/head generation. Most papers are linked to the PDF hosted on arXiv or OpenAccess. However, some papers require an academic license to access, for example those published in IEEE, Springer, or Elsevier journals.

🔆 This project is still ongoing; pull requests are welcome!

If you have any suggestions (missing papers, new papers, key researchers, or typos), please feel free to edit this list and submit a pull request. Even just letting me know the titles of relevant papers is a big contribution. You can do this by opening an issue or contacting me directly via email.

⭐ If you find this repo useful, please star it!!!

2022.09 Update!

Thanks for the PRs from everybody! From now on, I'll occasionally include papers on video-driven talking face generation, because the community has started to bring video-driven methods into the scope of talking face generation, even though this task was originally termed Face Reenactment.

So, if you are looking for video-driven talking face generation, I suggest starring this repo and also searching for Face Reenactment; you'll find more there :)

One more thing: please correct me if you find that any paper listed here as an arXiv paper has been accepted to a conference or journal.

2021.11 Update!

I added a batch of papers that appeared over the past few months. This repo was originally intended to cover audio-driven talking face generation. However, I found several text-based works that are also very interesting, so I have included them here as well. Enjoy!

TO DO LIST

  • Main paper list
  • Add paper link
  • Add code links where available
  • Add project page links where available
  • Datasets and survey

Papers

Recent (not yet categorized)

  • GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis Paper Project Page

2D Video - Person independent

  • Audio-Visual Face Reenactment [WACV 2023] Paper Project Page Code
  • Compressing Video Calls using Synthetic Talking Heads [BMVC 2022] Paper Project Page
  • Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement [arXiv 2022] Paper
  • StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation [arXiv 2022] Paper
  • Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [arXiv 2022] Paper
  • EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model [SIGGRAPH 2022] Paper
  • Talking Head from Speech Audio using a Pre-trained Image Generator [ACM MM 2022] Paper
  • Latent Image Animator: Learning to Animate Images via Latent Space Navigation [ICLR 2022] Paper ProjectPage (note: this page has auto-play music) Code
  • Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis [ECCV 2022] Paper ProjectPage Code
  • Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation [ECCV 2022] Paper ProjectPage Code
  • Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [ICASSP 2022] Paper ProjectPage Code
  • StableFace: Analyzing and Improving Motion Stability for Talking Face Generation [arXiv 2022] Paper ProjectPage
  • Emotion-Controllable Generalized Talking Face Generation [IJCAI 2022] Paper
  • StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN [arXiv 2022] Paper Code ProjectPage
  • DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering [arXiv 2022] Paper
  • Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions [arXiv 2022] Paper
  • Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels [TMM 2022] Paper
  • Depth-Aware Generative Adversarial Network for Talking Head Video Generation [CVPR 2022] Paper ProjectPage Code
  • Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning [CVPR 2022] Paper Code ProjectPage
  • Expressive Talking Head Generation with Granular Audio-Visual Control [CVPR 2022] Paper
  • Talking Face Generation with Multilingual TTS [CVPR 2022 Demo] Paper DemoPage
  • SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory [AAAI 2022] Paper
  • Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation [SIGGRAPH Asia 2021] Paper Code
  • Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis [ACMMM 2021] Paper Code
  • AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] Paper Code
  • FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning [ICCV 2021] Paper Code
  • Learned Spatial Representations for Few-shot Talking-Head Synthesis [ICCV 2021] Paper
  • ⭐ Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [CVPR 2021] Paper Code ProjectPage
  • One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing [CVPR 2021] Paper
  • Audio-Driven Emotional Video Portraits [CVPR 2021] Paper Code
  • AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person [arXiv 2021] Paper
  • Talking Head Generation with Audio and Speech Related Facial Action Units [BMVC 2021] Paper
  • Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion [IJCAI 2021] Paper
  • Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation [AAAI 2021] Paper
  • Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [arXiv 2020] Paper Code
  • A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild [ACMMM 2020] Paper Code
  • Talking Face Generation with Expression-Tailored Generative Adversarial Network [ACMMM 2020] Paper
  • Speech Driven Talking Face Generation from a Single Image and an Emotion Condition [arXiv 2020] Paper Code
  • A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors [ICPR 2020] Paper
  • Everybody's Talkin': Let Me Talk as You Want [arXiv 2020] Paper
  • HeadGAN: Video-and-Audio-Driven Talking Head Synthesis [arXiv 2020] Paper
  • Talking-head Generation with Rhythmic Head Motion [ECCV 2020] Paper
  • Neural Voice Puppetry: Audio-driven Facial Reenactment [ECCV 2020] Paper Project Code
  • Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis [CVPR 2020] Paper
  • Robust One Shot Audio to Video Generation [CVPRW 2020] Paper
  • MakeItTalk: Speaker-Aware Talking Head Animation [SIGGRAPH Asia 2020] Paper
  • FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis [AAAI 2020] Paper
  • Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose [AAAI 2020] Paper
  • Photorealistic Lip Sync with Adversarial Temporal Convolutional [arXiv 2020] Paper
  • Speech-Driven Facial Animation Using Polynomial Fusion of Features [arXiv 2020] Paper
  • Animating Face using Disentangled Audio Representations [WACV 2020] Paper
  • Realistic Speech-Driven Facial Animation with GANs [IJCV 2019] Paper ProjectPage
  • Few-Shot Adversarial Learning of Realistic Neural Talking Head Models [ICCV 2019] Paper Code
  • Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss [CVPR 2019] Paper Code
  • Talking Face Generation by Adversarially Disentangled Audio-Visual Representation [AAAI 2019] Paper Code ProjectPage
  • Lip Movements Generation at a Glance [ECCV 2018] Paper
  • X2Face: A network for controlling face generation using images, audio, and pose codes [ECCV 2018] Paper Code ProjectPage
  • Talking Face Generation by Conditional Recurrent Adversarial Network [IJCAI 2019] Paper Code
  • Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks [arXiv 2018] Paper
  • High-Resolution Talking Face Generation via Mutual Information Approximation [arXiv 2018] Paper
  • Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network [arXiv 2018] Paper
  • You said that? [BMVC 2017] Paper

2D Video - Person dependent

  • Synthesizing Obama: Learning Lip Sync from Audio [SIGGRAPH 2017] Paper Project Page
  • Photorealistic Adaptation and Interpolation of Facial Expressions Using HMMs and AAMs for Audio-Visual Speech Synthesis [ICIP 2017] Paper
  • HMM-Based Photo-Realistic Talking Face Synthesis Using Facial Expression Parameter Mapping with Deep Neural Networks [Journal of Computer and Communications 2017] Paper
  • ObamaNet: Photo-realistic lip-sync from text [arXiv 2017] Paper
  • A deep bidirectional LSTM approach for video-realistic talking head [Multimedia Tools Appl 2015] Paper
  • Photo-Realistic Expressive Text to Talking Head Synthesis [Interspeech 2013] Paper
  • Photo-Real Talking Head with Deep Bidirectional LSTM [ICASSP 2015] Paper
  • Expressive Speech-Driven Facial Animation [TOG 2005] Paper

3D Animation

  • Neural Emotion Director: Speech-preserving semantic control of facial expressions in "in-the-wild" videos [CVPR 2022] Paper Code
  • FaceFormer: Speech-Driven 3D Facial Animation with Transformers [CVPR 2022] Paper Code ProjectPage
  • LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization [CVPR 2021] Paper
  • MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [ICCV 2021] Paper
  • AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] Paper Code
  • 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head [arXiv 2021] Paper
  • Modality Dropout for Improved Performance-driven Talking Faces [ICMI 2020] Paper
  • Audio- and Gaze-driven Facial Animation of Codec Avatars [arXiv 2020] Paper
  • Capture, Learning, and Synthesis of 3D Speaking Styles [CVPR 2019] Paper
  • VisemeNet: Audio-Driven Animator-Centric Speech Animation [TOG 2018] Paper
  • Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks [TAC 2018] Paper
  • End-to-end Learning for 3D Facial Animation from Speech [ICMI 2018] Paper
  • Visual Speech Emotion Conversion using Deep Learning for 3D Talking Head [MMAC 2018]
  • A Deep Learning Approach for Generalized Speech Animation [SIGGRAPH 2017] Paper
  • Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion [TOG 2017] Paper
  • Speech-driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach [CVPR 2017]
  • Expressive Speech Driven Talking Avatar Synthesis with DBLSTM using Limited Amount of Emotional Bimodal Data [Interspeech 2016] Paper
  • Real-Time Speech-Driven Face Animation With Expressions Using Neural Networks [TONN 2012] Paper
  • Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar [SIST 2010] Paper

Datasets

  • TalkingHead-1KH Link
  • MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV 2020] ProjectPage
  • VoxCeleb Link
  • LRW Link
  • LRS2 Link
  • GRID Link
  • CREMA-D Link

Survey

  • Deep Learning for Visual Speech Analysis: A Survey [arXiv 2022] Paper
  • What comprises a good talking-head video generation?: A Survey and Benchmark [arXiv 2020] Paper