Pinned Repositories
AutoVocoder
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models, a dark horse in the field of Generative Models
diffusion-audio-restoration-nvidia-SR
Audio-to-Audio Schrödinger Bridges is a diffusion-based audio restoration model for bandwidth extension and inpainting.
F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
golf_diff_Glottal_Flow_LPC_synthesis
A DDSP-based neural vocoder.
MB-iSTFT-VITS2_super-monotonic-align
Application of MB-iSTFT-VITS components to vits2_pytorch
tacospawn
PyTorch implementation of TacoSpawn, Speaker Generation
unconditional-diff-STFT
Unconditional music synthesis using a diffusion model in the STFT domain
WaveletAttention
Wavelet-Attention CNNs for Image Classification
SynthAether's Repositories
SynthAether/emotion-annotations
SynthAether/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
SynthAether/spear-tts-pytorch
An unofficial PyTorch implementation of SPEAR-TTS.
SynthAether/swift-f0_pitch
Fast and accurate fundamental frequency (F0) detector using convolutional neural networks
SynthAether/Bert-VITS2
vits2 backbone with bert
SynthAether/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
SynthAether/aimet_quant
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
SynthAether/ARC-Encoder
SynthAether/BlaGPT
Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible experimentation and exploration.
SynthAether/CFrame_ZX
Simple C framework for the ZX Spectrum Next
SynthAether/Chatterbox-TTS-Extended
Modified version of Chatterbox that accepts text files as input and has no character restrictions
SynthAether/ComfyUI-VibeVoice
ComfyUI custom node for the VibeVoice TTS. Expressive, long-form, multi-speaker conversational audio
SynthAether/CosyVoice_TTS
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
SynthAether/DiaMoE-TTS
Official code for "DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation"
SynthAether/FireRedTTS2
Long-form streaming TTS system for multi-speaker dialogue generation
SynthAether/fish-speech
Brand new TTS solution
SynthAether/flash-attention
SynthAether/index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
SynthAether/mair-hub
SynthAether/nanochat
The best ChatGPT that $100 can buy.
SynthAether/NeMo
NeMo: a toolkit for conversational AI
SynthAether/NeMoTTS
SynthAether/next-3D
A 3D library for the ZX Spectrum Next
SynthAether/ParaStyleTTS
This is the official code for ACM CIKM 2025 Paper: ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive Text-to-Speech Generation
SynthAether/RWKV_TTS
This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts).
SynthAether/SAC_semantic
Training, inference, and testing of the SAC speech codec model.
SynthAether/stylish-tts
High quality text-to-speech based on StyleTTS 2.
SynthAether/TTS-WebUI
A single Gradio + React WebUI with extensions for ACE-Step, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet, AudioGen, MusicGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, and Bark!
SynthAether/UTMOSv2
SynthAether/yt-dlp
A youtube-dl fork with additional features and fixes