trailblazer-bit

trailblazer-bit's Stars

facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Language: Python 20.7k 205 3772.1k
fishaudio/fish-speech
Brand new TTS solution
Language: Python 12.9k 91 366961
state-spaces/mamba
Mamba SSM architecture
Language: Python 12.7k 101 5131.1k
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Language: Python 11.7k 135 6931.2k
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Language: Python 5.2k 52 397538
crowsonkb/k-diffusion
Karras et al. (2022) diffusion models for PyTorch
Language: Python 2.3k 42 65374
Alpha-VLLM/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
Language: Python 2k 31 8486
ming024/FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
Language: Python 1.8k 28 213530
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Language: Python 1.1k 27 75106
marl/crepe
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
Language: Python 1.1k 34 78158
NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
Language: Python 854 71 097
LTH14/mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Language: Python 826 17 5241
csteinmetz1/auraloss
Collection of audio-focused loss functions in PyTorch
Language: Python 723 17 3566
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Language: Python 686 18 3539
facebookresearch/speech-resynthesis
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
Language: Python 383 19 1953
KdaiP/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
Language: Python 347 23 2039
v-iashin/SpecVQGAN
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
Language: Jupyter Notebook 341 7 3538
metame-ai/awesome-audio-plaza
Daily tracking of awesome audio papers, including music generation, zero-shot tts, asr, audio generation
315 30 211
LSimon95/megatts2
Unofficial implementation of Megatts2
Language: Python 260 22 2035
fishaudio/audio-preprocess
Preprocess Audio for training
Language: Python 233 8 945
descriptinc/audiotools
Object-oriented handling of audio data, with GPU-powered augmentations, and more.
Language: Python 222 28 1837
OpenT2S/LlamaVoice
LlamaVoice is a llama-based large voice generation model, providing inference and training ability.
Language: Python 214 22 311
hayeong0/DDDM-VC
Official Pytorch Implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)
Language: Python 179 15 1719
Plachtaa/FAcodec
Training code for FAcodec presented in NaturalSpeech3
Language: Python 162 9 2418
jishengpeng/TextrolSpeech
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)
Language: Python 122 6 15
yukara-ikemiya/friendly-stable-audio-tools
Refactored / updated version of `stable-audio-tools`, the open-source code for audio/music generative models originally by Stability AI.
Language: Python 116 3 410
CNChTu/FCPE
Language: Python 95 5 618
Text-to-Audio/Make-An-Audio-3
Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers
Language: Python 69 5 33
scutcsq/Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction
Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)
Language: Python 58 8 24
WangHelin1997/Speech-paper-crawl
My Python scripts for crawling papers related to speech processing.
Language: Python 5 2 0
