fullstackpeng's Stars
microsoft/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
TadasBaltrusaitis/OpenFace
OpenFace – a state-of-the-art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.
Fictionarry/TalkingGaussian
[ECCV'24] TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
ZiqiaoPeng/SyncTalk
[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
ossrs/srs
SRS is a simple, high-efficiency, real-time media server supporting RTMP, WebRTC, HLS, HTTP-FLV, HTTP-TS, SRT, MPEG-DASH, and GB28181.
AlistGo/alist
🗂️A file list/WebDAV program that supports multiple storages, powered by Gin and Solidjs.
zzj1111/Preprocessed-CMLR-Dataset-For-Wav2Lip
Because the original Wav2Lip was trained on LRS2 and does not perform well on Chinese, I preprocessed the CMLR dataset to train Wav2Lip on it, in the hope that it will do better in Chinese.
Aruen24/wav2lip_288x288_test
mesolitica/malaya-speech
Speech Toolkit for Malaysian language, https://malaya-speech.readthedocs.io/
bmild/nerf
Code release for NeRF (Neural Radiance Fields)
facefusion/facefusion
Industry leading face manipulation platform
iperov/DeepFaceLab
DeepFaceLab is the leading software for creating deepfakes.
xinntao/Real-ESRGAN
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
TMElyralab/MuseTalk
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
myshell-ai/OpenVoice
Instant voice cloning by MIT and MyShell.
XPixelGroup/BasicSR
Open Source Image and Video Restoration Toolbox for Super-resolution, Denoising, Deblurring, etc. Currently, it includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR, ECBSR, etc. It also supports StyleGAN2 and DFDNet.
TencentARC/GFPGAN
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
jianchang512/pyvideotrans
Translate a video from one language to another and add dubbing, with support for speech recognition and transcription, speech synthesis, and subtitle translation.
jaywalnut310/vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
idiap/coqui-ai-TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
wooorm/franc
Natural language detection
fishaudio/fish-speech
SOTA Open Source TTS
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
jeessy2/ddns-go
Simple and easy-to-use DDNS. Supports Aliyun, Tencent Cloud, Dnspod, Cloudflare, Callback, Huawei Cloud, Baidu Cloud, Porkbun, GoDaddy, Namecheap, NameSilo...
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.