zyj008

zyj008's Stars

Neph0s/awesome-llm-role-playing-with-persona
Awesome-llm-role-playing-with-persona: a curated list of resources for large language models for role-playing with assigned personas
47623
Plachtaa/seed-vc
zero-shot voice conversion with in context learning
Language:Python13515
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Language:Python5.9k644
Anjok07/ultimatevocalremovergui
GUI for a Vocal Remover that uses Deep Neural Networks.
Language:Python17.5k1.3k
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Language:Python4.8k478
huggingface/dataspeech
Language:Python27035
Appen/UHV-OTS-Speech
A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
Language:Forth10019
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language:Python4.4k379
huggingface/parler-tts
Inference and training library for high-quality TTS models.
Language:Python4.2k411
NATSpeech/NATSpeech
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
Language:Python96399
Plachtaa/VALL-E-X
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/
Language:Python7.6k753
microsoft/DeepSpeedExamples
Example models using DeepSpeed
Language:Python6k1k
fishaudio/fish-diffusion
An easy to understand TTS / SVS / SVC framework
Language:Python62379
facebookresearch/denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain. In which, we present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip-connections. It is optimized on both time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise including stationary and non-stationary noises, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly on the raw waveform which further improve model performance and its generalization abilities.
Language:Python1.6k301
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Language:Python10k856
PaddlePaddle/PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Language:Python10.9k1.8k
microsoft/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Language:Python1.2k114
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Language:Python30.2k6.4k
richardbaihe/a3t
Code for paper A3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing
Language:Python8510
w4123/vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Language:Python47169
eriklindernoren/PyTorch-GAN
PyTorch implementations of Generative Adversarial Networks.
Language:Python16.2k4.1k
dunky11/voicesmith
[WIP] VoiceSmith makes training text to speech models easy.
Language:Python21732
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Language:Python33.4k4.1k
cnlinxi/book-text-to-speech
A book about Text-to-Speech (TTS) in Chinese.
Language:TeX57378
r9y9/pysptk
A python wrapper for Speech Signal Processing Toolkit (SPTK).
Language:Python43679
jaakkopasanen/ABX
Web app for AB and ABX listening tests
Language:JavaScript501
YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics(e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Language:Python1k111
ranchlai/mandarin-tts
Chinese Mandarin tts text-to-speech 中文 (普通话) 语音合成 , by fastspeech 2 , implemented in pytorch, using waveglow as vocoder, with biaobei and aishell3 datasets
Language:Python460109
Jackiexiao/MTTS
A Demo of Mandarin/Chinese TTS frontend
Language:Python275126
kan-bayashi/ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
Language:Jupyter Notebook1.5k339