Awesome-Talking-Head-Synthesis

This repository organizes papers, code, and resources related to generative adversarial networks (GANs) 🤗 and neural radiance fields (NeRF) 🎨, with a main focus on image-driven and audio-driven talking head synthesis. 👤

A collection of talking head synthesis papers and their released code. ✍️

Most papers link to PDFs on arXiv or journal/conference websites 📚. However, some papers require an academic license to view 🔐.

🔆 This project Awesome-Talking-Head-Synthesis is ongoing - pull requests are welcome! If you have any suggestions (missing papers, new papers, key researchers or typos), please feel free to edit and submit a PR. You can also open an issue or contact me directly via email. 📩

⭐ If you find this repo useful, please give it a star! 🤩

2023.12 Update 📆

Thanks to https://github.com/Curated-Awesome-Lists/awesome-ai-talking-heads, I have added some of its content, such as the Tools & Software and Slides & Presentations sections. 🙏 I hope this is helpful. 😊

If you have any feedback or ideas on extending this aggregated resource, please open an issue or PR - community contributions are vital to advancing this shared knowledge. 🤝

Let's keep pushing forward to recreate ever more realistic digital human faces! 💪 We've come so far but still have a long way to go. With continued research 🔬 and collaboration, I'm sure we'll get there! 🤗

Please feel free to star ⭐ and share this repo if you find it a valuable resource. Your support helps motivate me to keep maintaining and improving it. 🥰 Let me know if you have any other questions!

Datasets

Dataset	Download Link	Description
FaceForensics++	Download link
CelebV	Download link
VoxCeleb	Download link	`VoxCeleb`, a comprehensive audio-visual dataset for speaker recognition, encompasses both VoxCeleb1 and VoxCeleb2 datasets.
VoxCeleb1	Download link	`VoxCeleb1` contains over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube.
VoxCeleb2	Download link	Extracted from YouTube videos, VoxCeleb2 provides video URLs and utterance timestamps. As one of the largest public audio-visual datasets, it is primarily used for speaker recognition, but it can also be used to train talking-head generation models. To obtain download permission and access the dataset, apply here. Requires 300 GB+ of storage space.
ObamaSet	Download link	`ObamaSet` is a specialized audio-visual dataset focused on analyzing the visual speech of former US President Barack Obama. All video samples are collected from his weekly address footage. Unlike previous datasets, it exclusively centers on Barack Obama and does not provide any human annotations.
TalkingHead-1KH	Download link	The dataset consists of 500k video clips, of which about 80k have a resolution greater than 512x512. Only videos under permissive licenses are included. Note that the number of videos differs from that in the original paper because a more robust preprocessing script was used to split the videos.
LRW (Lip Reading in the Wild)	Download link	LRW, a diverse English-language video dataset collected from BBC television programs, features over 1000 speakers with various speaking styles and head poses. Each video is 1.16 seconds long (29 frames) and contains the target word along with its surrounding context.
MEAD 2020	Download link	MEAD 2020 is a Talking Head dataset annotated with emotion labels and intensity labels. The dataset focuses on facial generation for natural emotional speech, covering eight different emotions on three intensity levels.
CelebV-HQ	Download link	CelebV-HQ is a high-quality video dataset comprising 35,666 clips with a resolution of at least 512x512. It includes 15,653 identities, and each clip is manually labeled with 83 facial attributes, spanning appearance, action, and emotion. The dataset's diversity and temporal coherence make it a valuable resource for tasks like unconditional video generation and video facial attribute editing.
HDTF	Download link	HDTF, the High-definition Talking-Face Dataset, is a large in-the-wild high-resolution audio-visual dataset consisting of approximately 362 different videos totaling 15.8 hours. Original video resolutions are 720p or 1080p, and each cropped video is resized to 512 × 512.
CREMA-D	Download link	CREMA-D is a diverse dataset with 7,442 original clips featuring 91 actors, including 48 male and 43 female actors aged 20 to 74, representing various races and ethnicities. The dataset includes recordings of actors speaking from a set of 12 sentences, expressing six different emotions (Anger, Disgust, Fear, Happy, Neutral, and Sad) at four emotion levels (Low, Medium, High, and Unspecified). Emotion and intensity ratings were gathered through crowd-sourcing, with 2,443 participants rating 90 unique clips each (30 audio, 30 visual, and 30 audio-visual). Over 95% of the clips have more than 7 ratings. For additional details on CREMA-D, refer to the paper link.
LRS2	Download link	LRS2 is a lip reading dataset that includes videos recorded in diverse settings, suitable for studying lip reading and visual speech recognition.
GRID	Download link	The GRID dataset was recorded in a laboratory setting with 34 volunteers, each speaking 1000 phrases, totaling 34,000 utterance instances. Phrases follow specific rules, with six words randomly selected from six categories: "command," "color," "preposition," "letter," "number," and "adverb." Access the dataset here.
SAVEE	Download link	The SAVEE (Surrey Audio-Visual Expressed Emotion) database was recorded as a prerequisite for developing an automatic emotion recognition system. It features recordings from 4 male actors expressing 7 different emotions, totaling 480 British English utterances. The sentences were selected from the standard TIMIT corpus and are phonetically balanced for each emotion. The data were recorded in a high-quality visual media lab, then processed and labeled. To evaluate performance, 10 subjects rated the recordings under audio, visual, and audio-visual conditions. Classification systems built for each modality achieved speaker-independent recognition rates of 61%, 65%, and 84% for audio, visual, and audio-visual, respectively.
BIWI(3D)	Download link	The Biwi 3D Audiovisual Corpus of Affective Communication serves as a compromise between data authenticity and quality, acquired at ETHZ in collaboration with SYNVO GmbH.
VOCA	Download link	VOCA is a 4D face dataset with approximately 29 minutes of 4D face scans and synchronized audio from 12 speakers. It greatly facilitates research in 3D visual speech generation (VSG).
Multiface(3D)	Download link	The Multiface Dataset consists of high-quality multi-view video recordings of 13 people displaying various facial expressions. It contains approximately 12,200 to 23,000 frames per subject, captured at 30 fps from around 40 to 160 camera views with uniform lighting. The dataset's size is 65TB and includes raw images (2048x1334 resolution), tracked and meshed heads, 1024x1024 unwrapped face textures, camera calibration metadata, and audio. The accompanying repository provides code for downloading the dataset and building a codec avatar using a deep appearance model.
MMFace4D	Download link	The MMFace4D dataset is a large-scale multi-modal dataset for audio-driven 3D facial animation research. It contains over 35,000 sequences captured from 431 subjects ranging in age from 15 to 68 years old. Various sentences from scenarios such as news broadcasting, conversations and storytelling were recorded, totaling around 11,000 utterances. High-fidelity data was captured using three synchronized RGB-D cameras to obtain high-resolution 3D meshes and textures. A reconstruction pipeline was developed to fuse the multi-view data and generate topology-consistent 3D mesh sequences. In addition to the 3D facial motions, synchronized speech audio is also provided. The final dataset covers a wide range of expressive talking styles and facial expressions through a diverse set of subjects and utterances. With its large scale, high quality of data and strong diversity, the MMFace4D dataset provides an ideal benchmark for developing and evaluating audio-driven 3D facial animation models.
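Some figures in the dataset table can be cross-checked with simple arithmetic. The LRW clip length and frame count together imply a frame rate of 25 fps, and the GRID total is the number of speakers times the phrases spoken by each:

```latex
t_{\text{LRW}} = \frac{29\ \text{frames}}{25\ \text{frames/s}} = 1.16\ \text{s},
\qquad
N_{\text{GRID}} = 34\ \text{speakers} \times 1000\ \text{phrases} = 34{,}000\ \text{utterances}
```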

Survey

Year	Title	Conference/Journal
2024	Deepfake Generation and Detection: A Benchmark and Survey Github	arXiv 2024
2024	A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos Code	arXiv 2024
2024	How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey 3DGS+SLAM🔥🔥🔥	arXiv 2024
2024	3D Gaussian as a New Vision Era: A Survey 3DGS🔥🔥🔥	arXiv 2024
2024	Advances in 3D Generation: A Survey	arXiv 2024
2024	A Survey on 3D Gaussian Splatting 3DGS🔥🔥🔥on going	arXiv 2024
2024	Neural Radiance Fields: Past, Present, and Future NeRF🔥🔥🔥 Amazing 413 pages	arXiv 2024
2023	From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications	arXiv 2023
2023	Human-Computer Interaction System: A Survey of Talking-Head Generation	IEEE
2023	Talking human face generation: A survey	ACM
2022	Deep Learning for Visual Speech Analysis: A Survey	arXiv 2022
2020	What comprises a good talking-head video generation?: A Survey and Benchmark	arXiv 2020

Funny Work

Year	Title	Code	Project	Keywords
2024	[Audio2Photoreal] From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations	Code	Project	Photoreal
2024	[Animate Anyone] Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation	Code	Project	🔥Animate (Alibaba, drives the viral "Subject Three" dance)
2024	[3DGAN] What You See Is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs		Project	🔥Nvidia

Audio-driven

Year	Title	Conference/Journal	Code	Project	Keywords
2024	[EDTalk] EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis	arXiv 2024	Code	Project
2024	[FaceChain-ImagineID] FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio	arXiv 2024	Code
2024	[Talk3D] Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior	arXiv 2024	Code	Project
2024	[AniPortrait] AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation	arXiv 2024	Code		🔥🔥🔥Similar to EMO
2024	[Make-Your-Anchor] Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework	CVPR 2024	Code
2024	Adaptive Super Resolution For One-Shot Talking-Head Generation	ICASSP 2024	Code
2024	[VLOGGER] VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis	arXiv 2024		Project	Embodied
2024	[EmoVOCA] EmoVOCA: Speech-Driven Emotional 3D Talking Heads	arXiv 2024			3D, VOCA
2024	[ScanTalk] ScanTalk: 3D Talking Heads from Unregistered Scans	arXiv 2024			3D
2024	[Style2Talker] Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style	arXiv 2024
2024	[EMO] EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions	arXiv 2024	Code	Project	🔥🔥🔥Amazing, Diffusion
2024	[G4G] G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment	arXiv 2024			A Generic Framework
2024	Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis	CVPR 2024			High-Quality
2024	[DiffSpeaker] DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer	arXiv 2024	Code		3D
2024	[EmoSpeaker] EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation	arXiv 2024	Code	Project	Emotion
2024	[NeRF-AD] NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis	ICASSP 2024	Code	Project	AU
2024	[Real3D-Portrait] Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis	ICLR 2024	Code	Project	3D, One-Shot, Realistic
2024	[SyncTalk] SyncTalk: The Devil😈 is in the Synchronization for Talking Head Synthesis	CVPR 2024	Code	Project	😈Talking Head
2024	[AdaMesh] AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation	arXiv 2024	Code	Project	3D, Mesh
2024	[DREAM-Talk] DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation	arXiv 2024		Project	Emotion
2024	[AE-NeRF] AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis	AAAI 2024
2024	[R2-Talker] R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning	arXiv 2024	Code		Based on RAD-NeRF
2024	[DT-NeRF] DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis	ICASSP 2024	-	-	ER-NeRF
2023	[ER-NeRF] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis	ICCV 2023	Code	Project	Tri-plane
2023	[LipNeRF] LipNeRF: What is the right feature space to lip-sync a NeRF?	FG 2023	Code	Project	Wav2lip
2024	[VectorTalker] VectorTalker: SVG Talking Face Generation with Progressive Vectorisation	arXiv 2024			SVG
2024	[Mimic] Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation	AAAI 2024			3D
2024	[DreamTalk] DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models	arXiv 2024	Code	Project	Diffusion
2024	[FaceTalk] FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models	arXiv 2024	Code	Project
2024	[GSmoothFace] GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance	arXiv 2024			3D
2024	[GMTalker] GMTalker: Gaussian Mixture based Emotional talking video Portraits	arXiv 2024		Project	Emotion
2024	[VividTalk] VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior	arXiv 2024			Mesh
2024	[GAIA] GAIA: Zero-shot Talking Avatar Generation	arXiv 2024	Code (coming soon)	Project	😲😲😲
2023	Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation	ICCV 2023	Code	Project	-
2023	[ToonTalker] ToonTalker: Cross-Domain Face Reenactment	ICCV 2023	-	-	-
2023	Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation	ICCV 2023	Code	Project	-
2023	[EMMN] EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation	ICCV 2023	-	-	Emotion
2023	Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation	ICCV 2023	-	-	Emotion,LHG
2023	[MODA] MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions	ICCV 2023	-	-	-
2023	[FaceDiffuser] FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion	ACM SIGGRAPH MIG 2023	Code	Project	🔥Diffusion, 3D
2023	Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis	TCSVT 2023	-	-
2023	[SadTalker] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation	CVPR 2023	Code	Project	3D,Single Image
2023	[EmoTalk] EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation	ICCV 2023	Code		3D,Emotion
2023	Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks	InterSpeech 2023			Emotion
2023	[DINet] DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video	AAAI 2023	Code	-
2023	[StyleTalk] StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles	AAAI 2023	Code	-	Style
2023	High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning	CVPR 2023	-	-	Emotion
2023	[StyleSync] StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator	CVPR 2023	Code	Project	-
2023	[TalkLip] TalkLip: Seeing What You Said - Talking Face Generation Guided by a Lip Reading Expert	CVPR 2023	Code	-	-
2023	[CodeTalker] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior	CVPR 2023	Code	Project	3D,codebook
2023	[EmoGen] Emotionally Enhanced Talking Face Generation	Arxiv 2023	Code	-	Emotion
2023	[DAE-Talker] DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder	Arxiv 2023	-	Project	🔥Diffusion
2023	[READ] READ Avatars: Realistic Emotion-controllable Audio Driven Avatars	Arxiv 2023	-	-	-
2023	[DiffTalk] DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis	CVPR 2023	Code	Project	🔥Diffusion
2023	[Diffused Heads] Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation	Arxiv 2023	-	Project	🔥Diffusion
2022	VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild	SIGGRAPH Asia 2022	Code	Project
2022	[GC-AVT] Expressive Talking Head Generation with Granular Audio-Visual Control	CVPR 2022	-	-	-
2022	Talking Face Generation with Multilingual TTS	CVPR 2022 Demo Track	-	-	-
2022	[EAMM] EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model	SIGGRAPH 2022	-	-	Emotion
2022	[SPACEx] SPACEx 🚀: Speech-driven Portrait Animation with Controllable Expression	arXiv 2022	-	Project	-
2022	[AV-CAT] Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers	SIGGRAPH Asia 2022	-	-	-
2022	[MemFace] Memories are One-to-Many Mapping Alleviators in Talking Face Generation	arXiv 2022	-	-	-
2021	[PC-AVS] PC-AVS: Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation	CVPR 2021	Code	Project	-
2021	[IATS] Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis	ACM MM 2021	-	-	-
2021	[Speech2Talking-Face] Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation	IJCAI 2021	-	-	-
2021	[FAU] Talking Head Generation with Audio and Speech Related Facial Action Units	BMVC 2021	-	-	AU
2021	[EVP] Audio-Driven Emotional Video Portraits	CVPR 2021	Code	-	Emotion
2020	[Wav2Lip] A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild	ACM Multimedia 2020	Code	Project	-
2020	[RhythmicHead] Talking-head Generation with Rhythmic Head Motion	ECCV 2020	Code	-	-
2020	[MakeItTalk] Speaker-Aware Talking-Head Animation	SIGGRAPH Asia 2020	Code	Project	-
2020	[Neural Voice Puppetry] Audio-driven Facial Reenactment	ECCV 2020	-	Project	-
2020	[MEAD] A Large-scale Audio-visual Dataset for Emotional Talking-face Generation	ECCV 2020	Code	Project	-
2020	Realistic Speech-Driven Facial Animation with GANs	IJCV 2020	-	-	-
2019	[DAVS] Talking Face Generation by Adversarially Disentangled Audio-Visual Representation	AAAI 2019	Code	-	-
2019	[ATVGnet] Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss	CVPR 2019	Code	-	-
2018	Lip Movements Generation at a Glance	ECCV 2018	Code	-	-
2018	[VisemeNet] Audio-Driven Animator-Centric Speech Animation	SIGGRAPH 2018	-	-	-
2017	[Synthesizing Obama] Learning Lip Sync From Audio	SIGGRAPH 2017	-	Project	-
2017	[You Said That?] Synthesising Talking Faces From Audio	BMVC 2017	Code	-	-
2017	Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion	SIGGRAPH 2017	-	-	-
2017	A Deep Learning Approach for Generalized Speech Animation	SIGGRAPH 2017	-	-	-
2016	[LRW] Lip Reading in the Wild	ACCV 2016	-	-	-

Text-driven

Year	Title	Conference/Journal	Code/Project
2023	TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles	Arxiv
2021	Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation	AAAI	Code
2021	Txt2vid: Ultra-low bitrate compression of talking-head videos via text	Arxiv	Code

NeRF & 3D & Gaussian Splatting

Year	Title	Conference/Journal	Code	Project	Keywords
2024	[GeneAvatar] GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image	CVPR 2024	Code	Project	Editing
2024	Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes	CVPR 2024		Project	Blendshapes
2024	[MagicMirror] MagicMirror: Fast and High-Quality Avatar Generation with a Constrained Search Space	Arxiv 2024		Project
2024	[HAHA] HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior	Arxiv 2024			🔥Gaussian Splatting
2024	[UV Gaussians] UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling	Arxiv 2024		Project	🔥Gaussian Splatting
2024	[NECA] NECA: Neural Customizable Human Avatar	CVPR 2024	Code
2024	[V3D] V3D: Video Diffusion Models are Effective 3D Generators	Arxiv 2024	Code	Project	🔥Gaussian Splatting, Video
2024	[DNGaussian] DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization	CVPR 2024	Code	Project	🔥Gaussian Splatting, Sparse-View
2024	[GEA] GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video	Arxiv 2024		Project	🔥Gaussian Splatting, Avatar
2024	[Magic-Me] Magic-Me: Identity-Specific Video Customized Diffusion	Arxiv 2024	Code	Project
2024	[HeadStudio] HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting	Arxiv 2024			🔥Gaussian Splatting, Avatar
2024	[GaussianHair] GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians	Arxiv 2024			🔥Gaussian Splatting
2024	[ImplicitDeepfake] ImplicitDeepfake: Plausible Face-Swapping through Implicit Deepfake Generation using NeRF and Gaussian Splatting	Arxiv 2024			🔥Gaussian Splatting, Deepfake
2024	Consolidating Attention Features for Multi-view Image Editing	Arxiv 2024			🔥Gaussian Splatting, Edit
2024	[Rig3DGS] Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos	Arxiv 2024		Project	Portraits
2024	[4D Gaussian Splatting] 4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes	Arxiv 2024			Dynamic Scenes
2023	[ViCA-NeRF] ViCA-NeRF: View-Consistency-Aware 3D Editing of Neural Radiance Fields	NeurIPS 2023	Code	Project	3D Edit
2024	[Sketch2NeRF] Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation	Arxiv 2024			Text to 3D
2024	[CoSSegGaussians] CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians with Dual Feature Fusion	Arxiv 2024		Project	🔥Gaussian Splatting, Segmentation
2024	[UltrAvatar] UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures	Arxiv 2024		Project	Diffusion,Avatar
2024	[GaussianBody] GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting	Arxiv 2024			🔥Gaussian Splatting
2024	[FED-NeRF] FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF	Arxiv 2024	Code		4D face video editor
2024	[AGG] AGG: Amortized Generative 3D Gaussians for Single Image to 3D	Arxiv 2024		Project	🔥Gaussian Splatting
2024	Gaussian Shadow Casting for Neural Characters	Arxiv 2024			🔥Gaussian Splatting
2024	[Human101] Human101: Training 100+FPS Human Gaussians in 100s from 1 View	Arxiv 2024	Code	Project	🔥Gaussian Splatting
2024	Deformable 3D Gaussian Splatting for Animatable Human Avatars	Arxiv 2024			🔥Gaussian Splatting
2024	[4DGen] 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency	Arxiv 2024	Code	Project	🔥Gaussian Splatting
2024	[3DGAN] What You See Is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs	Arxiv 2024		Project
2024	[3DGS-Avatar] 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting	Arxiv 2024	Code	Project	🔥Gaussian Splatting
2024	Learning Dense Correspondence for NeRF-Based Face Reenactment	AAAI 2024			one-shot multi-view face reenactment
2024	[GaussianHead] GaussianHead: Impressive 3D Gaussian-based Head Avatars with Dynamic Hybrid Neural Field	Arxiv 2024	Code		🔥Gaussian Splatting
2024	[MonoGaussianAvatar] MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar	Arxiv 2024			🔥Gaussian Splatting
2024	[Gaussian Head Avatar] Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians	Arxiv 2024	Code	Project
2024	[HeadGaS] HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting	Arxiv 2024			🔥Gaussian Splatting
2024	[GaussianAvatars] GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians	CVPR 2024	Code	Project	🔥Gaussian Splatting
2023	[SD-NeRF] SD-NeRF: Towards Lifelike Talking Head Animation via Spatially-adaptive Dual-driven NeRFs	IEEE 2023	-	-
2023	[Instruct-NeuralTalker] Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions	Arxiv 2023
2023	[GeneFace++] Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation	Arxiv 2023	Code	Project	-
2023	[GeneFace] GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis	ICLR 2023	Code	Project	-
2022	[RAD-NeRF] RAD-NeRF: Real-time Neural Talking Portrait Synthesis	Arxiv 2022	Code	Project	InstantNGP
2022	[DFRF] DFRF: Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis	ECCV 2022	Code	Project
2022	[DialogueNeRF] DialogueNeRF: Towards Realistic Avatar Face-to-face Conversation Video Generation	Arxiv 2022	-	-	-
2022	[NeRFInvertor] NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation	Arxiv 2022	Code	Project	-
2022	[Next3D] Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars	Arxiv 2022	Code	Project	-
2022	[3DFaceShop] 3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation	Arxiv 2022	Code	Project	-
2022	[FNeVR] FNeVR: Neural Volume Rendering for Face Animation	Arxiv 2022	Code	-	-
2022	[ROME] ROME: Realistic One-shot Mesh-based Head Avatars	ECCV 2022	Code	Project	-
2022	[IMavatar] IMavatar: Implicit Morphable Head Avatars from Videos	CVPR 2022	Code	Project	-
2022	[HeadNeRF] HeadNeRF: A Real-time NeRF-based Parametric Head Model	CVPR 2022	Code	Project	-
2022	[SSP-NeRF] Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation	Arxiv 2022	Code	Project	-
2021	[AD-NeRF] AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis	ICCV 2021	Code	Project	-
2021	[NerFACE] NerFACE: Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction	CVPR 2021 Oral	Code	Project	-
2021	[DFA-NeRF] DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering	Arxiv 2021	Code	-	-

Metrics

Metrics	Paper	Link
PSNR (peak signal-to-noise ratio)	-
SSIM (structural similarity index measure)	Image quality assessment: from error visibility to structural similarity.
CPBD (cumulative probability of blur detection)	A no-reference image blur metric based on the cumulative probability of blur detection
LPIPS (Learned Perceptual Image Patch Similarity)	The Unreasonable Effectiveness of Deep Features as a Perceptual Metric	paper
NIQE (Natural Image Quality Evaluator)	Making a ‘Completely Blind’ Image Quality Analyzer	paper
FID (Fréchet inception distance)	GANs trained by a two time-scale update rule converge to a local Nash equilibrium
LMD (landmark distance error)	Lip Movements Generation at a Glance
LRA (lip-reading accuracy)	Talking Face Generation by Conditional Recurrent Adversarial Network	paper
WER (word error rate)	LipNet: End-to-End Sentence-level Lipreading
LSE-D (Lip Sync Error - Distance)	Out of time: automated lip sync in the wild
LSE-C (Lip Sync Error - Confidence)	Out of time: automated lip sync in the wild
ACD (average content distance)	FaceNet: A Unified Embedding for Face Recognition and Clustering
CSIM (cosine similarity)	ArcFace: Additive Angular Margin Loss for Deep Face Recognition
EAR (eye aspect ratio)	Real-Time Eye Blink Detection Using Facial Landmarks (Computer Vision Winter Workshop)
ESD (emotion similarity distance)	What comprises a good talking-head video generation?: A Survey and Benchmark
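The table above only names the metrics. As a rough illustration, a few of the simpler ones (PSNR, CSIM, LMD, EAR, WER) can be written in standard-library Python. This is a minimal sketch under stated assumptions, not the reference implementation from any paper listed: images are assumed to be flattened sequences of 8-bit pixel values, CSIM assumes identity embeddings were already extracted by a face recognition network such as ArcFace (not included here), LMD and EAR assume landmarks are already detected and aligned, and every function name is hypothetical. Published evaluations differ in details, for example restricting LMD to mouth landmarks or averaging PSNR over frames.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def psnr(reference: Sequence[float], generated: Sequence[float], max_value: float = 255.0) -> float:
    """PSNR in dB between two equally sized, flattened images (hypothetical helper)."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """CSIM: cosine similarity between two identity embeddings (assumed precomputed)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def landmark_distance(pred_frames: Sequence[Sequence[Point]],
                      target_frames: Sequence[Sequence[Point]]) -> float:
    """LMD: mean Euclidean distance over all frames and corresponding landmarks."""
    total, count = 0.0, 0
    for pred, target in zip(pred_frames, target_frames):
        for p, t in zip(pred, target):
            total += math.dist(p, t)
            count += 1
    return total / count


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR for six eye landmarks p1..p6: p1/p4 are the corners, p2/p3 upper lid, p5/p6 lower lid."""
    p1, p2, p3, p4, p5, p6 = eye
    return (math.dist(p2, p6) + math.dist(p3, p5)) / (2.0 * math.dist(p1, p4))


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and the first j hyp words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, free when the words match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Keeping the sketch dependency-free makes the formulas easy to read; for real video evaluation, vectorized implementations (for example with NumPy or scikit-image) are much faster. Metrics such as FID, LPIPS, LSE-D, and LSE-C depend on pretrained networks, so they are not sketched here.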

Tools & Software

Tool/Resource	Description
LUCIA	Development of an MPEG-4 Talking Head Engine. 💻
Yepic Studio	Create and dub talking head-style videos in minutes without expensive equipment. 🎥
Mel McGee's Talkbots	A complete multi-browser, multi-platform talking head application in SVG suitable for web sites or as an avatar. 🗣️
face3D_chung	Create 3D character avatar head objects with texture from a single photo for your games. 🎮
CrazyTalk	Exciting features for 3D head creation and automation. 🤪
tts avatar free download - SourceForge	Mel McGee's Talkbots is a complete multi-browser, multi-platform talking head application. (🔧👄)
Verbatim AI - Product Information, Latest Updates, and Reviews 2023	A simple yet powerful API to generate AI "talking head" videos in near real-time with Verbatim AI. Add interest, intrigue, and dynamism to your chatbots! (🔧👄)
Best Open Source BASIC 3D Modeling Software	Includes talk3D_chung, a small example using obj models created with face3D_chung, and speak3D_chung_dll, a dll to load and display face3D_chung talking avatars. (🛠️🎭)
DVDStyler / Discussion / Help: ffmpeg-vbr or internal	Discussion noting that talking-head videos can get an unnecessarily high bitrate in DVDStyler. (🛠️👄)
12 best AI video generators to use in 2023 Free and paid | Product ...	Whether you’re an entrepreneur, small business owner, or run a large company, AI video generators make it super easy to create high-quality videos from scratch. (🔧🎥)

Slides & Presentations

Presentation Title	Description
Few-Shot Adversarial Learning of Realistic Neural Talking Head Models	Presentation reviewing the few-shot adversarial learning of realistic neural talking head models.
Nethania Michelle's Character	PPT: Presentation discussing the improvement of a 3D talking head for use in an avatar of a virtual meeting room.
Presenting you: Top tips on presenting with Prezi Video – Prezi	Article providing top tips for presenting with Prezi Video.
Research Presentation	PPT: Resident Research Presentation Slide Deck.
Adding narration to your presentation (using Prezi Video) – Prezi	Learn how to add narration to your Prezi presentation with Prezi Video.

References

Website	Description
arXiv	Provides preprints in various academic fields, serving as an important platform for accessing the latest research findings.
CVF Open Access	The Computer Vision Foundation's open-access platform, offering open-access papers from top conferences such as CVPR, ICCV, ECCV, and more.
Papers with Code	A platform that aggregates research papers with accompanying code implementations, making it convenient to find the latest research findings and their corresponding implementations.
ICCV - International Conference on Computer Vision	The International Conference on Computer Vision, gathering the latest research findings in the field of computer vision.
ECCV - European Conference on Computer Vision	The European Conference on Computer Vision, providing the latest research results and related information in the field of computer vision.
CVPR - Conference on Computer Vision and Pattern Recognition	The Conference on Computer Vision and Pattern Recognition, one of the top conferences in computer vision, showcasing numerous important research findings.

yerfor/Awesome-Talking-Head-Synthesis