Pinned Repositories
AEC_DeepModel
Baseline code for deep-learning-based acoustic echo cancellation
AI_beatmap_generator
An attempt to generate beatmaps for the rhythm game Malody using neural networks.
Audio-to-midi
An application of vocal melody extraction.
AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
autovc
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
AutoVC-WavRNN
voice conversion system
Avocodo
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
BeatNet
Implementation of "BeatNet", an AI-based joint beat, downbeat, tempo, and meter tracking system using a CRNN and particle filtering. State-of-the-art online model as of 2021 (ISMIR 2021).
Bert-VITS2
vits2 backbone with multilingual-bert
Cross-Lingual-Voice-Cloning
Tacotron 2 - PyTorch implementation with faster-than-realtime inference modified to enable cross lingual voice cloning.
markyouyuren's Repositories
markyouyuren/AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
markyouyuren/Avocodo
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
markyouyuren/Bert-VITS2
vits2 backbone with multilingual-bert
markyouyuren/DiffGAN-TTS
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
markyouyuren/DiffSinger-1
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
markyouyuren/e2e_dnn_ad_control_for_lin_aec
End-To-End Deep Learning-based Adaptation Control for Linear Acoustic Echo Cancellation
markyouyuren/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
markyouyuren/GenerSpeech
PyTorch Implementation of GenerSpeech (NeurIPS'22): a text-to-speech model towards zero-shot style transfer of OOD custom voice.
markyouyuren/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
markyouyuren/Languagecodec
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
markyouyuren/Linly-Talker
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates various technologies like Whisper, Linly, Microsoft Speech Services, and SadTalker talking head generation system. 🌟🔬
markyouyuren/MixGAN-TTS
MixGAN-TTS: End-to-End Speech Synthesis Based on Diffusion Model
markyouyuren/MSMC-TTS
Official implementation of Multi-Stage Multi-Codebook (MSMC) TTS
markyouyuren/NATSpeech
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
markyouyuren/naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
markyouyuren/NeuralSVB
Learning the Beauty in Songs: Neural Singing Voice Beautifier; ACL 2022 (Main conference); Official code
markyouyuren/SadTalker-Video-Lip-Sync
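This project implements Wav2lip-style video lip synthesis based on SadTalker. Lip shapes are generated by driving a video file with speech, and a configurable enhancement mode for the face region sharpens the synthesized lip (face) area to improve the clarity of the generated lips. The DAIN deep-learning frame-interpolation algorithm is applied to the generated video to fill in the motion transitions between synthesized lip frames, making the result smoother, more realistic, and more natural.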
markyouyuren/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
markyouyuren/speechgpt
💬 SpeechGPT is a web application that enables you to converse with ChatGPT.
markyouyuren/TransformerTTS
🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.
markyouyuren/ttsmms
TTS with The Massively Multilingual Speech (MMS) project
markyouyuren/ultimatevocalremovergui
GUI for a Vocal Remover that uses Deep Neural Networks.
markyouyuren/VAEJETS
Conditional Variational Auto-Encoder with Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
markyouyuren/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
markyouyuren/VI-SVS
Use VITS and Opencpop to develop singing voice synthesis; Different from VISinger.
markyouyuren/vispeech
A TTS model based on VITS, FastSpeech2, and VISinger
markyouyuren/VITS-fast-fine-tuning
A VITS fine-tuning pipeline for fast speaker-adaptation TTS and many-to-many voice conversion
markyouyuren/vits-with-pith
vits
markyouyuren/VITSinger
Singing Voice Speech modeling test
markyouyuren/whisperer
Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper.