zhrtvc

Chinese Real Time Voice Cloning

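Directory overview

zhrtvc

Code modules, including the model training and model demo modules.

pretrained

Pretrained models, including the encoder, synthesizer, and vocoder models.

article

Related literature.

sample

Data samples.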

Real-Time Voice Cloning

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real time. Feel free to check my thesis if you're curious or if you're looking for info I haven't documented yet (don't hesitate to open an issue for that too). In particular, I recommend taking a quick look at the figures beyond the introduction.

SV2TTS is a three-stage deep learning framework that creates a numerical representation of a voice from a few seconds of audio and uses it to condition a text-to-speech model trained to generalize to new voices.
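The data flow between the three stages can be sketched as follows. This is only a toy illustration: `toy_encoder`, `toy_synthesizer`, `toy_vocoder` and `clone_voice` are hypothetical stand-ins made up for this example, not functions from this repository, and the arithmetic inside them is a placeholder for the real neural networks. What the sketch is meant to show is the shape of the pipeline: reference audio becomes a fixed-size embedding, the embedding conditions the text-to-spectrogram step, and the vocoder turns frames into a waveform.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def toy_encoder(reference_audio: Sequence[float], dim: int = 4) -> Vector:
    """Stage 1 stand-in: fold a waveform of any length into a
    fixed-size, unit-length 'speaker embedding'."""
    acc = [0.0] * dim
    for i, sample in enumerate(reference_audio):
        acc[i % dim] += sample
    return l2_normalize(acc)


def toy_synthesizer(text: str, embedding: Sequence[float]) -> List[Vector]:
    """Stage 2 stand-in: emit one 'spectrogram frame' per character,
    shifted by the speaker embedding (the conditioning step)."""
    return [[(ord(ch) % 32) / 32.0 + e for e in embedding] for ch in text]


def toy_vocoder(frames: Sequence[Sequence[float]], hop: int = 2) -> Vector:
    """Stage 3 stand-in: expand every frame into `hop` waveform samples."""
    samples: Vector = []
    for frame in frames:
        mean = sum(frame) / len(frame)
        samples.extend([mean] * hop)
    return samples


def clone_voice(reference_audio: Sequence[float], text: str) -> Vector:
    """Chain the three stages: audio -> embedding -> frames -> waveform."""
    embedding = toy_encoder(reference_audio)
    frames = toy_synthesizer(text, embedding)
    return toy_vocoder(frames)


if __name__ == "__main__":
    reference = [0.1, -0.2, 0.3, 0.05, 0.4]  # a few made-up audio samples
    waveform = clone_voice(reference, "hi")
    print(len(waveform), "samples generated")
```

Note that the embedding is computed once per reference speaker and can be reused for any number of sentences; only the second and third stages need to run again for new text.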

Papers implemented

| URL | Designation | Title | Implementation source |
|---|---|---|---|
| 1806.04558 | SV2TTS | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | This repo |
| 1802.08435 | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | fatchord/WaveRNN |
| 1712.05884 | Tacotron 2 (synthesizer) | Natural TTS Synthesis by Conditioning Wavenet on Mel Spectrogram Predictions | Rayhane-mamah/Tacotron-2 |
| 1710.10467 | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | This repo |