Pinned Repositories
AutoVocoder
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models, a dark horse in the field of Generative Models
diffusion-audio-restoration-nvidia-SR
Audio-to-Audio Schrodinger Bridges is a diffusion-based audio restoration model for bandwidth extension and inpainting.
F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open-Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
golf_diff_Glottal_Flow_LPC_synthesis
A DDSP-based neural vocoder.
MB-iSTFT-VITS2_super-monotonic-align
Application of MB-iSTFT-VITS components to vits2_pytorch
tacospawn
PyTorch implementation of TacoSpawn, Speaker Generation
unconditional-diff-STFT
Unconditional music synthesis using a diffusion model in the STFT domain
WaveletAttention
Wavelet-Attention CNNs for Image Classification
SynthAether's Repositories
SynthAether/snac_Multi-Scale-Neural-Audio-Codec
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
SynthAether/Bert-VITS2
VITS2 backbone with BERT
SynthAether/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
SynthAether/AAREfficient-Autoregressive-Audio-Modeling-via-Next-Scale-Prediction
[Official Implementation] Acoustic Autoregressive Modeling 🔥
SynthAether/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
SynthAether/f5-tts-mlx
Implementation of F5-TTS in MLX
SynthAether/fish-speech
Brand new TTS solution
SynthAether/GeneFace
Official PyTorch Implementation of GeneFace (ICLR 2023)
SynthAether/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
SynthAether/GPT-SoVITS
One minute of voice data is enough to train a good TTS model (few-shot voice cloning)
SynthAether/hertz-dev
first base model for full-duplex conversational audio
SynthAether/highway_SIMD
Performance-portable, length-agnostic SIMD with runtime dispatch
SynthAether/lpc_vocoder
LPC vocoder for speech signals
SynthAether/MagVITS
VITS with phoneme-level prosody modeling based on MaskGIT
SynthAether/Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
SynthAether/moshi
SynthAether/OuteTTS
SynthAether/piper_larynx2_vits_TTS_cpp_onnx
A fast, local neural text to speech system
SynthAether/PyTorch-Wavelet-Toolbox
Differentiable fast wavelet transforms in PyTorch with GPU support.
SynthAether/rfwave_vocoder
SynthAether/rotary-embedding-torch
Implementation of Rotary Embeddings, from the RoFormer paper, in PyTorch
SynthAether/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity and Number Detector
SynthAether/simple-tts
(WIP)
SynthAether/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
SynthAether/sttatts
SynthAether/super-monotonic-align_MAS
SynthAether/tortoise-tts
A multi-voice TTS system trained with an emphasis on quality
SynthAether/ultravox
SynthAether/wavefit-pytorch
PyTorch implementation of WaveFit [2022, Google], one of the SOTA lightweight, fast speech vocoders.
SynthAether/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling