Pinned Repositories
a3t
Code for paper A3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing
abylouw.github.io
alignflow
APNet2
Source code of APNet2, a vocoder
audioseal
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
audiowmark
Audio Watermarking
autovc
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
Best-README-Template
An awesome README template to jumpstart your projects!
Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
World
A high-quality speech analysis, manipulation and synthesis system
abylouw's Repositories
abylouw/APNet2
Source code of APNet2, a vocoder
abylouw/audioseal
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
abylouw/audiowmark
Audio Watermarking
abylouw/Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
abylouw/ConsistencyVC-voive-conversion
Uses a jointly trained speaker encoder with consistency loss to achieve cross-lingual and expressive voice conversion
abylouw/convnext_tts
Unofficial implementation of ConvNeXt-TTS powered by Lightning and Rye
abylouw/dectalk
Modern builds for the 90s/00s DECtalk text-to-speech application.
abylouw/descript-audio-vae
VAE GAN modified from Descript Audio Codec, which replaces the RVQ with VAE
abylouw/DiscreteSpeechMetrics
Reference-aware automatic speech evaluation toolkit
abylouw/istft-onnx
Export an ONNX graph that performs ISTFT. Designed for TTS models.
abylouw/LipSick
🤢 LipSick: Fast, High Quality, Low Resource Lipsync Tool 🤮
abylouw/Matcha-TTS
🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
abylouw/MB-iSTFT-VITS2
Application of MB-iSTFT-VITS components to vits2_pytorch
abylouw/Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction
Unofficial PyTorch reproduction of the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)
abylouw/onnx-simplifier
Simplify your onnx model
abylouw/pflow-encodec
Implementation of a TTS model based on the NVIDIA P-Flow TTS paper
abylouw/pflowtts_pytorch
Unofficial implementation of the NVIDIA P-Flow TTS paper
abylouw/Real3DPortrait
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis; ICLR 2024 Spotlight; Official code
abylouw/RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
abylouw/snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
abylouw/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
abylouw/TiCodec
abylouw/TTS-arxiv-daily
Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
abylouw/UniCATS-CTX-vec2wav
Code for CTX-vec2wav in UniCATS
abylouw/VoiceFlow-TTS
This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
abylouw/wavenext_pytorch
Unofficial implementation of the WaveNeXt vocoder
abylouw/wavmark
AI-based Audio Watermarking Tool
abylouw/X-E-Speech-code
X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion
abylouw/ZEST
Zero-Shot Emotion Style Transfer
abylouw/ZMM-TTS
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations