bfs18's Stars
THUDM/CogVideo
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
janhq/ichigo
Local realtime voice AI
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
LTH14/mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
pytorch/torchtitan
A native PyTorch Library for large model training
AnswerDotAI/gpu.cpp
A lightweight library for portable low-level GPU computation using WebGPU.
necludov/wl-mechanics
EurekaLabsAI/micrograd
The Autograd Engine
EurekaLabsAI/ngram
The n-gram Language Model
pixeli99/SVD_Xtend
Stable Video Diffusion Training Code and Extensions.
facebookresearch/Qinco
Residual Quantization with Implicit Neural Codebooks
UbiquitousLearning/mllm
Fast Multimodal LLM on Mobile Devices
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
microsoft/T-MAC
Low-bit LLM inference on CPU with lookup table
lucidrains/voicebox-pytorch
Implementation of Voicebox, a new SOTA text-to-speech network from Meta AI, in PyTorch
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
test-time-training/ttt-lm-pytorch
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Delgan/loguru
Python logging made (stupidly) simple
DarkAutumn/triforce
A deep learning agent for The Legend of Zelda (NES)
winddori2002/DEX-TTS
DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability
KwaiVGI/LivePortrait
Bring portraits to life!
spyoungtech/FreeSimpleGUI
The free-forever GUI library
FunAudioLLM/CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
gcorso/disco-diffdock
Code for the paper DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents, ICML 2024
fishaudio/fish-speech
SOTA Open Source TTS
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
NUS-HPC-AI-Lab/VideoSys
VideoSys: An easy and efficient system for video generation
karpathy/LLM101n
LLM101n: Let's build a Storyteller
bojone/FSQ
Keras implementation of Finite Scalar Quantization