Pinned Repositories
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
athena
An open-source implementation of a sequence-to-sequence based speech processing engine
bark
🔊 Text-Prompted Generative Audio Model
Bert-VITS2
VITS2 backbone with BERT
DenoiseNet
An implementation of DenoiseNet https://arxiv.org/pdf/1701.01687.pdf
DiffSinger
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
FaceSwap
3D face swapping implemented in Python
ForwardTacotron
⏩ Generating speech in a single forward pass without any attention!
GPT-SoVITS
1 min of voice data can also be used to train a good TTS model!
PaddleSpeech
Easy-to-use Speech Toolkit including SOTA/Streaming ASR with punctuation, influential TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
nangongmujd's Repositories
nangongmujd/GPT-SoVITS
1 min of voice data can also be used to train a good TTS model!
nangongmujd/PaddleSpeech
Easy-to-use Speech Toolkit including SOTA/Streaming ASR with punctuation, influential TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
nangongmujd/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
nangongmujd/bark
🔊 Text-Prompted Generative Audio Model
nangongmujd/Bert-VITS2
VITS2 backbone with BERT
nangongmujd/DenoiseNet
An implementation of DenoiseNet https://arxiv.org/pdf/1701.01687.pdf
nangongmujd/DiffSinger
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
nangongmujd/FaceSwap
3D face swapping implemented in Python
nangongmujd/ForwardTacotron
⏩ Generating speech in a single forward pass without any attention!
nangongmujd/FullSubNet
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
nangongmujd/GenerSpeech
PyTorch Implementation of GenerSpeech (NeurIPS'22): a text-to-speech model towards zero-shot style transfer of OOD custom voice.
nangongmujd/hifi-gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
nangongmujd/iSTFTNet-pytorch
iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform
nangongmujd/OpenTransformer
A No-Recurrence Sequence-to-Sequence Model for Speech Recognition
nangongmujd/megatts2
Unofficial implementation of Megatts2
nangongmujd/MiniGPT-4
MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models
nangongmujd/NATSpeech
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
nangongmujd/polyglot
Multilingual text (NLP) processing toolkit
nangongmujd/so-vits-svc
SoftVC VITS Singing Voice Conversion
nangongmujd/sru
Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)
nangongmujd/StyleSpeech
Official implementation of Meta-StyleSpeech and StyleSpeech
nangongmujd/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
nangongmujd/SyntaSpeech
SyntaSpeech: Syntax-aware Generative Adversarial Text-to-Speech; IJCAI 2022; Official code
nangongmujd/TransformerTTS
🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.
nangongmujd/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
nangongmujd/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E, WIP
nangongmujd/VITS-fast-fine-tuning
This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
nangongmujd/vits2_pytorch
Unofficial VITS2-TTS implementation in PyTorch
nangongmujd/VocGAN
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
nangongmujd/voxelmorph
Unsupervised Learning for Image Registration