taichuai's Stars
hacksider/Deep-Live-Cam
real time face swap and one-click video deepfake with only a single image
1adrianb/face-alignment
2D and 3D face alignment library built using PyTorch
google-deepmind/alphafold3
AlphaFold 3 inference pipeline.
mseitzer/pytorch-fid
Compute FID scores with PyTorch.
Tencent/Hunyuan3D-1
Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
genmoai/models
The best OSS video generation models
deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
wenqsun/DimensionX
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
NVIDIA/Cosmos-Tokenizer
A suite of image and video neural tokenizers
edwko/OuteTTS
Interface for OuteTTS models.
mqz111a/virtual_human_stream
The "virtual_human_stream" project is a real-time digital human system supporting audio-video dialogue. It integrates models like ernerf, musetalk, and wav2lip for voice cloning, video stitching, and streaming via RTMP/WebRTC. It’s optimized for high performance and easy customization, with support for ChatGPT dialogue integration.
Henry-23/VideoChat
Real-time voice-interactive digital human, supporting an end-to-end voice solution (GLM-4-Voice - THG) and a cascaded solution (ASR-LLM-TTS-THG). Appearance and voice are customizable with no training required, voice cloning is supported, and first-packet latency is as low as 3s.
EnVision-Research/Lotus
Official Implementation of LOTUS: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
adobe-research/MakeItTalk
OleehyO/TexTeller
TexTeller can convert images to LaTeX formulas (image2latex, LaTeX OCR) with higher accuracy and exhibits superior generalization ability, enabling it to cover most usage scenarios.
xg-chu/GAGAvatar
[NeurIPS 2024] Generalizable and Animatable Gaussian Head Avatar
wdndev/tiny-llm-zh
Implementing a small-parameter Chinese large language model from scratch.
dongzhuoyao/awesome-flow-matching
A summary of related works about flow matching, stochastic interpolants
Femoon/tts-azure-web
TTS Azure Web is an Azure Text-to-Speech (TTS) web application. You can run it locally or deploy it to the cloud with a single click using your Azure Key.
VideoVerses/VideoTuna
Let's finetune video generation models!
VITA-MLLM/Freeze-Omni
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
ddlBoJack/Awesome-Speech-Language-Model
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
shaopengw/Awesome-Music-Generation
Awesome music generation model: MG²
cantabile-kwok/vec2wav2.0
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
LeCAR-Lab/wococo
XingliangJin/MCM-LDM
[CVPR 2024] Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model
GhostCai/PortraitRelighting
Official PyTorch implementation of the CVPR 2024 Highlight Paper "Real-time 3D-aware Portrait Video Relighting"
audeering/w2v2-age-gender-how-to
How to use our public wav2vec2 age and gender model
liuhuang31/Megatts2_HierSpeechpp
Megatts2 using HierSpeechpp's vocoder
PeizhiYan/mediapipe-blendshapes-to-flame
Mapping Mediapipe's 52 blendshapes to FLAME's expression coefficients and poses.