cnnAndBn/awesome_talking_face_generation

Awesome talking face generation

papers & codes

2022

title	venue	paper	code	dataset	keywords
EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model	SIGGRAPH(22)	paper	-	-	emotion
Expressive Talking Head Generation with Granular Audio-Visual Control	CVPR(22)	paper	-	-	-
Deep Learning for Visual Speech Analysis: A Survey	-	paper	-	-	survey
StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN	-	paper	code	-	stylegan
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation	-	paper	code(coming soon)		NeRF
Cross-Modal Mutual Learning for Audio-Visual Speech Recognition and Manipulation	-	paper	-	-	-
One-shot talking face generation from single-speaker audio-visual correlation learning	AAAI(22)	paper	code	-	-
SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory	AAAI(22)	paper(temp)	-	LRW, LRS2, BBC News	-
DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering		paper			NeRF
Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos		paper
Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions		paper
DialogueNeRF: Towards Realistic Avatar Face-to-face Conversation Video Generation		paper
Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion		paper

2021

title	venue	paper	code	dataset
Parallel and High-Fidelity Text-to-Lip Generation		paper
[Survey] Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis	-	paper	-	-
FaceFormer: Speech-Driven 3D Facial Animation with Transformers	CVPR(22)	paper	code	-
Voice2Mesh: Cross-Modal 3D Face Model Generation from Voices	-	paper	code	-
FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning	ICCV	paper	code	-
Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis	-	paper	code	-
Audio-Driven Emotional Video Portraits	CVPR	paper	code	MEAD, LRW
LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization	CVPR	paper	-	-
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation	CVPR	paper	code	VoxCeleb2, LRW
Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset	CVPR	paper	code	HDTF
MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement	ICCV	paper	code(coming soon)	-
AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis	ICCV	paper	code	-
Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation	AAAI	paper	code(coming soon)	Mocap dataset
Visual Speech Enhancement Without A Real Visual Stream	-	paper	-	-
Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary	-	paper	code	-
Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion	IJCAI	paper	code	VoxCeleb, GRID, LRW
3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head	-	paper	-	-
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person	-	paper	-	VoxCeleb2, Obama

2020

title	venue	paper	code	dataset
[Survey] What comprises a good talking-head video generation?: A survey and benchmark	-	paper	code	-
One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing	CVPR(21)	paper	code	-
Speech Driven Talking Face Generation from a Single Image and an Emotion Condition	-	paper	code	CREMA-D
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild	ACMMM	paper	code	LRS2
Talking-head Generation with Rhythmic Head Motion	ECCV	paper	code	CREMA-D, GRID, VoxCeleb, LRS3
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation	ECCV	paper	code	VoxCeleb2, AffectNet
Neural Voice Puppetry: Audio-driven Facial Reenactment	ECCV	paper	-	-
Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars	ECCV	paper	code	-
HeadGAN: Video-and-Audio-Driven Talking Head Synthesis	-	paper	-	VoxCeleb2
MakeItTalk: Speaker-Aware Talking Head Animation	-	paper	code, code	VoxCeleb2, VCTK
Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose	-	paper	code	ImageNet, FaceWarehouse, LRW
Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks	-	paper	-	-
Speech-Driven Facial Animation Using Polynomial Fusion of Features	-	paper	-	LRW
Animating Face using Disentangled Audio Representations	WACV	paper	-	-
Everybody’s Talkin’: Let Me Talk as You Want	-	paper	-	-
Multimodal Inputs Driven Talking Face Generation With Spatial-Temporal Dependency	-	paper	-	-

2019

title	venue	paper	code	dataset
Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss	CVPR	paper	code	VGG Face, LRW

datasets

MEAD link
HDTF link
CREMA-D link
VoxCeleb link
LRS2 link
LRW link
GRID link
BIWI link
SAVEE link

metrics

PSNR (peak signal-to-noise ratio)
SSIM (structural similarity index measure)
LMD (landmark distance error)
LRA (lip-reading accuracy)
FID (Fréchet inception distance)
LSE-D (Lip Sync Error - Distance)
LSE-C (Lip Sync Error - Confidence)
LPIPS (Learned Perceptual Image Patch Similarity)
NIQE (Natural Image Quality Evaluator)
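
Two of the metrics listed above, PSNR and LMD, are simple enough to sketch in a few lines. The snippet below is a minimal, dependency-free illustration, not the evaluation code of any listed paper. The function names `psnr` and `lmd` are hypothetical, images are assumed to be flat sequences of 8-bit pixel values, and landmarks are assumed to be `(x, y)` pairs already extracted by some face-landmark detector. Individual papers may restrict LMD to mouth landmarks or normalize it differently.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences.

    Assumes pixel values lie in [0, max_value]; 255 corresponds to 8-bit images.
    """
    if not reference or len(reference) != len(generated):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def lmd(reference_landmarks, generated_landmarks):
    """Landmark distance: mean Euclidean distance between matching (x, y) points."""
    if not reference_landmarks or len(reference_landmarks) != len(generated_landmarks):
        raise ValueError("inputs must be non-empty and of equal length")
    distances = [
        math.hypot(rx - gx, ry - gy)
        for (rx, ry), (gx, gy) in zip(reference_landmarks, generated_landmarks)
    ]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Toy values for illustration only; these are not real measurements.
    print(psnr([10, 20, 30, 40], [12, 18, 33, 40]))
    print(lmd([(0, 0), (10, 10)], [(3, 4), (10, 10)]))
```

Higher PSNR and lower LMD indicate a closer match to the reference frames. For video, both are typically averaged over all frames.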