keonlee9420's Stars
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
chenfei-wu/TaskMatrix
google-research/tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
mlfoundations/open_clip
An open source implementation of CLIP.
facebookresearch/ImageBind
ImageBind: One Embedding Space to Bind Them All
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
haoheliu/AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
archinetai/audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.
microsoft/NeuralSpeech
LAION-AI/CLAP
Contrastive Language-Audio Pretraining
lucidrains/soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
LAION-AI/audio-dataset
Audio Dataset for training CLAP and other models
facebookresearch/AudioMAE
This repo hosts the code and models of "Masked Autoencoders that Listen".
microsoft/CLAP
Learning audio concepts from natural language supervision
arpitbansal297/Universal-Guided-Diffusion
NVlabs/DiffiT
[ECCV 2024] Official Repository for DiffiT: Diffusion Vision Transformers for Image Generation
liusongxiang/Large-Audio-Models
Keep track of big models in the audio domain, including speech, singing, music, etc.
MasayaKawamura/MB-iSTFT-VITS
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Rongjiehuang/GenerSpeech
PyTorch Implementation of GenerSpeech (NeurIPS'22): a text-to-speech model towards zero-shot style transfer of OOD custom voice.
NVIDIA/radtts
Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low Dimensional (F0 and Energy) Speech Attributes.
yangdongchao/SoundStorm
The reproduced code for Google's SoundStorm
chomeyama/SiFiGAN
Official implementation of the source-filter HiFiGAN vocoder
voidful/Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
keonlee9420/DailyTalk
Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech, ICASSP 2023
tetrzim/diffusion-human-feedback
Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback
krafton-ai/mini-batch-cl