JuneoXIE's Stars
TencentARC/GFPGAN
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
lucidrains/vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
HumanAIGC/AnimateAnyone
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
OpenTalker/SadTalker
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
TencentARC/InstantMesh
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Zz-ww/SadTalker-Video-Lip-Sync
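This project builds on SadTalker to implement Wav2Lip-style lip synthesis for video. It takes a video file as input and generates speech-driven lip shapes, applies a configurable face-region enhancement to the synthesized lip (face) area to improve the clarity of the generated lips, and uses DAIN, a deep-learning frame-interpolation algorithm, to add frames to the generated video, filling in the motion transitions between frames so that the synthesized lips look smoother, more realistic, and more natural.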
facebookresearch/denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.
ali-vilab/dreamtalk
Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
ramprs/grad-cam
[ICCV 2017] Torch code for Grad-CAM
Fictionarry/ER-NeRF
[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
lizhe00/AnimatableGaussians
Code of [CVPR 2024] "Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling"
YuelangX/Gaussian-Head-Avatar
[CVPR 2024] Official repository for "Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians"
HumanAIGC/VividTalk
VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior
xiezhy6/GP-VTON
Official Implementation for CVPR2023 paper "GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning"
aipixel/GaussianAvatar
[CVPR 2024] The official repo for "GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians"
chiehwangs/gaussian-head
Official repository for 'GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation'
Trinkle23897/Fast-Poisson-Image-Editing
A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.
ControlNet/MARLIN
[CVPR] MARLIN: Masked Autoencoder for facial video Representation LearnINg
YanzuoLu/CFLD
[CVPR 2024 Highlight] Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
mindspore-lab/minddiffusion
A collection of diffusion models based on MindSpore
mbzuai-metaverse/VOODOO3D-official
Official implementation for the paper "VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment"
kenwaytis/faster-SadTalker-API
The API server version of the SadTalker project. Runs in Docker, 10 times faster than the original!
yufan1012/MonoGaussianAvatar
yunik1004/SAiD
SAiD: Blendshape-based Audio-Driven Speech Animation with Diffusion
AaronComo/LipFD
[NeurIPS 2024] This is the official repo of the paper "Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-syncing DeepFakes".
chuangchuangtan/FreqNet-DeepfakeDetection
bchao1/fast-poisson-image-editing
Fast, scalable, and extensive implementations of Poisson image editing algorithms.
aiden200/2D3MF
Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"
raining-dev/AVT2-DWF
AVT2-DWF: Improving Deepfake Detection with Audio-Visual Fusion and Dynamic Weighting Strategies
AmirSh15/VAED_HeterGraph
Implementation of the Interspeech 2022 paper "Visually-aware Acoustic Event Detection using Heterogeneous Graphs"