actuy

actuy's Stars

nomic-ai/gpt4all
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
Language: C++ 71.6k 647 2k 7.8k
RVC-Boss/GPT-SoVITS
1 min of voice data can also be used to train a good TTS model! (few-shot voice cloning)
Language: Python 38.3k 223 1.4k 4.3k
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language: Python 36.1k 348 2.9k 4.2k
RVC-Project/Retrieval-based-Voice-Conversion-WebUI
Easily train a good VC model with voice data <= 10 mins!
Language: Python 25.8k 181 1.7k 3.7k
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Language: Python 13.3k 125 7781.1k
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Language: Python 13.2k 140 7431.4k
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
Language: Python 11k 168 8142.5k
THUDM/CogVideo
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Language: Python 10.1k 127 491950
OpenTalker/video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Language: Python 6.8k 75 248995
microsoft/DeepSpeedExamples
Example models using DeepSpeed
Language: Python 6.2k 75 5441.1k
pengsida/learning_research
My research experience
6.2k 71 31368
rtqichen/torchdiffeq
Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.
Language: Python 5.7k 127 219942
openai-translator/bob-plugin-openai-translator
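A Bob plugin for text translation, text polishing, and grammar correction, based on the OpenAI API. Let's welcome together a new era that needs no Tower of Babel! Licensed under CC BY-NC-SA 4.0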
Language: TypeScript 5.6k 32 98256
Zejun-Yang/AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Language: Python 4.8k 61 193591
baichuan-inc/Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
Language: Python 4.1k 40 395297
Tencent/HunyuanDiT
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Language: Jupyter Notebook 3.8k 42 186321
richzhang/PerceptualSimilarity
LPIPS metric. pip install lpips
Language: Python 3.7k 53 109502
yoyo-nb/Thin-Plate-Spline-Motion-Model
[CVPR 2022] Thin-Plate Spline Motion Model for Image Animation.
Language: Jupyter Notebook 3.5k 63 91555
GuyTevet/motion-diffusion-model
The official PyTorch implementation of the paper "Human Motion Diffusion Model"
Language: Python 3.2k 71 220353
IDEA-Research/DWPose
"Effective Whole-body Pose Estimation with Two-stages Distillation" (ICCV 2023, CV4Metaverse Workshop)
Language: Python 2.3k 30 98146
jik876/hifi-gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Language: Python 2k 31 165512
snap-research/articulated-animation
Code for Motion Representations for Articulated Animation paper
Language: Jupyter Notebook 1.2k 41 74354
Weizhi-Zhong/IP_LAP
CVPR2023 talking face implementation for Identity-Preserving Talking Face Generation With Landmark and Appearance Priors
Language: Python 712 19 5887
auspicious3000/contentvec
speech self-supervised representations
Language: Python 475 11 3238
FlagAI-Open/Aquila2
The official repo of Aquila2 series proposed by BAAI, including pretrained & chat large language models.
Language: Python 440 5 6830
maxrmorrison/torchcrepe
Pytorch implementation of the CREPE pitch tracker
Language: Python 418 8 2863
facebookresearch/muavic
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Language: Python 372 13 2332
revsic/torch-nansypp
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
Language: Python 145 28 411
CODEJIN/Glow_TTS
An implementation of the GlowTTS model. Several modes are added: speaker embedding, prosody encoder (GST), and gradient reversal.
Language: Python 53 8 512
k-m-irfan/simplified_mediapipe_face_landmarks
Extracts essential Mediapipe face landmarks and arranges them in a sequenced order.
Language: Python 27 1 12