LiuMingYy's Stars
CyC2018/CS-Notes
:books: Essential fundamentals for technical interviews: Leetcode, computer operating systems, computer networks, system design
svc-develop-team/so-vits-svc
SoftVC VITS Singing Voice Conversion
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
Plachtaa/VALL-E-X
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
NVIDIA/tacotron2
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
Plachtaa/VITS-fast-fine-tuning
This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
PlayVoice/whisper-vits-svc
Core Engine of Singing Voice Conversion & Singing Voice Clone
ming024/FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
lucidrains/naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
bootphon/phonemizer
Simple text to phones converter for multiple languages
auspicious3000/autovc
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
wenet-e2e/speech-synthesis-paper
List of speech synthesis papers.
yeyupiaoling/MASR
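A PyTorch implementation of a streaming and non-streaming automatic speech recognition framework, compatible with both online and offline recognition. It currently supports the Conformer, Squeezeformer, and DeepSpeech2 models, along with multiple data augmentation methods.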
OlaWod/FreeVC
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
p0p4k/vits2_pytorch
unofficial vits2-TTS implementation in pytorch
heatz123/naturalspeech
A fully working pytorch implementation of NaturalSpeech (Tan et al., 2022)
modelscope/FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
wesbz/SoundStream
This repository is an implementation of this article: https://arxiv.org/pdf/2107.03312.pdf
wenet-e2e/speech-recognition-papers
Towards hot directions in industrial end to end speech recognition
kaituoxu/Listen-Attend-Spell
A PyTorch implementation of Listen, Attend and Spell (LAS), an End-to-End ASR framework.
glory20h/VoiceLDM
VoiceLDM: Text-to-Speech with Environmental Context
ConsistencyVC/ConsistencyVC-voive-conversion
Uses a jointly trained speaker encoder with a consistency loss to achieve cross-lingual voice conversion and expressive voice conversion
thu-ml/Bridge-TTS
Official codebase for "Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis" (https://arxiv.org/abs/2312.03491).
XuelianCheng/SLT-Net
Implicit Motion Handling for Video Camouflaged Object Detection (CVPR 2022)
ChunmingHe/WS-SAM
double22a/asr_nlp_paper_code
Papers of ASR, Tools of ASR
Ash-one/ch_vits
Chinese version of VITS, the end-to-end TTS model for speech synthesis (VITS Mandarin)
cnlinxi/blog
personal blog
qwen-audio/Qwen-Audio