actuy's Stars
nomic-ai/gpt4all
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
RVC-Project/Retrieval-based-Voice-Conversion-WebUI
Easily train a good VC model with voice data <= 10 mins!
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
THUDM/CogVideo
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
OpenTalker/video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
microsoft/DeepSpeedExamples
Example models using DeepSpeed
pengsida/learning_research
本人的科研经验
rtqichen/torchdiffeq
Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.
openai-translator/bob-plugin-openai-translator
基于 OpenAI API 的文本翻译、文本润色、语法纠错 Bob 插件,让我们一起迎接不需要巴别塔的新时代!Licensed under CC BY-NC-SA 4.0
Zejun-Yang/AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
baichuan-inc/Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
Tencent/HunyuanDiT
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
richzhang/PerceptualSimilarity
LPIPS metric. pip install lpips
yoyo-nb/Thin-Plate-Spline-Motion-Model
[CVPR 2022] Thin-Plate Spline Motion Model for Image Animation.
GuyTevet/motion-diffusion-model
The official PyTorch implementation of the paper "Human Motion Diffusion Model"
IDEA-Research/DWPose
"Effective Whole-body Pose Estimation with Two-stages Distillation" (ICCV 2023, CV4Metaverse Workshop)
jik876/hifi-gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
snap-research/articulated-animation
Code for Motion Representations for Articulated Animation paper
Weizhi-Zhong/IP_LAP
CVPR2023 talking face implementation for Identity-Preserving Talking Face Generation With Landmark and Appearance Priors
auspicious3000/contentvec
speech self-supervised representations
FlagAI-Open/Aquila2
The official repo of Aquila2 series proposed by BAAI, including pretrained & chat large language models.
maxrmorrison/torchcrepe
Pytorch implementation of the CREPE pitch tracker
facebookresearch/muavic
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
revsic/torch-nansypp
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
CODEJIN/Glow_TTS
An implement of GlowTTS model. Several modes are added: speaker embedding, prosody encoder(GST), and gradient reversal.
k-m-irfan/simplified_mediapipe_face_landmarks
Extracts essential Mediapipe face landmarks and arranges them in a sequenced order.