zhrtvc

Zhongwen Real Time Voice Cloning

Version

v1.1.3

See the readme for details.

Samples comparing original and cloned speech

Link: https://pan.baidu.com/s/1TQwgzEIxD2VBrVZKCblN1g

Extraction code: 8ucd

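Changes
- Dict2Obj is now imported from the aukit.audio_io module.
- The toolbox visualizes the embed, alignment, and spectrogram of the synthesized speech.
- Fixed a toolbox recording bug that caused inconsistent formats.
- Added the command-line tool demo_cli.
- Added a Preprocess button to the toolbox for audio preprocessing (noise reduction and silence removal).
- Fixed a toolbox bug that truncated the end of synthesized speech.
- The sample texts now include both long and short sentences.
- Added a Compare button that synthesizes the text of the reference audio, so the reference speech and the synthesized speech can be compared.

toolbox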

Synthesis samples

aliaudio-Aibao-004113.wav

aliaudio-Aimei-007261.wav

aliaudio-Aina-000819.wav

aliaudio-Aiqi-009619.wav

aliaudio-Aitong-003149.wav

aliaudio-Aiwei-009461.wav

Note

When running the provided models, the Griffin-Lim vocoder is recommended; MelGAN and WaveRNN are not yet fully adapted.

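Directory overview

zhrtvc

Code, comprising the encoder, synthesizer, vocoder, and toolbox modules. It covers both model training and visualized speech synthesis.

Scripts must be run from inside the zhrtvc directory.

For details on the code, see the readme file in the zhrtvc directory.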

models

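Pretrained models for the encoder, synthesizer, and vocoder.

Download the pretrained models from Baidu Netdisk, unpack them, and replace the models folder with the result.

Sample models

Link: https://pan.baidu.com/s/14hmJW7sY5PYYcCFAbqV0Kw

Extraction code: zl9i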
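
The download-unpack-replace step can be scripted. The following is only a minimal sketch: it assumes the downloaded archive is a zip file (the archive format is not stated here), and the function name `replace_folder_from_zip` and the staging folder name are hypothetical helpers, not part of zhrtvc.

```python
import shutil
import zipfile
from pathlib import Path


def replace_folder_from_zip(archive: Path, repo_root: Path, folder: str = "models") -> Path:
    """Unpack `archive` and use its contents to replace repo_root/folder."""
    target = repo_root / folder
    staging = repo_root / f"_{folder}_staging"  # hypothetical temporary folder

    # Start from a clean staging area, then unpack the archive into it.
    if staging.exists():
        shutil.rmtree(staging)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(staging)

    # Assumption: the archive either has a top-level `folder` directory
    # or holds that folder's contents directly.
    source = staging / folder if (staging / folder).is_dir() else staging

    # Remove the old folder and move the unpacked one into place.
    if target.exists():
        shutil.rmtree(target)
    shutil.move(str(source), str(target))

    # Clean up whatever is left of the staging area.
    if staging.exists():
        shutil.rmtree(staging)
    return target
```

For example, `replace_folder_from_zip(Path("models.zip"), Path("."))` would replace the models folder from an archive named `models.zip` (a placeholder name). Passing `folder="data"` would handle the data folder the same way.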

data

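Sample corpus, including aligned speech-and-text data and processed sample data ready for training the synthesizer.

You can run synthesizer_preprocess_audio.py and synthesizer_preprocess_embeds.py directly to convert the aligned speech-and-text corpus in samples into the SV2TTS data used to train the synthesizer.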
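
As a rough sketch, the two preprocessing scripts can be chained from Python. This assumes the scripts run with their default arguments against the sample data and that the working directory must be the zhrtvc code directory; the helper names `build_commands` and `run_preprocessing` are hypothetical. Check the readme in the zhrtvc directory for the actual arguments.

```python
import subprocess
import sys
from pathlib import Path

# Script names as given in this readme; audio preprocessing runs first,
# then the embeds step.
PREPROCESS_SCRIPTS = [
    "synthesizer_preprocess_audio.py",
    "synthesizer_preprocess_embeds.py",
]


def build_commands(python: str = sys.executable) -> list:
    """Return one command per script, with no extra arguments (assumption)."""
    return [[python, script] for script in PREPROCESS_SCRIPTS]


def run_preprocessing(code_dir: Path = Path("zhrtvc")) -> None:
    """Run each script from inside the code directory, in order."""
    for cmd in build_commands():
        # check=True raises CalledProcessError at the first failing step.
        subprocess.run(cmd, cwd=code_dir, check=True)
```

Calling `run_preprocessing()` from the repository root would run both steps in order and stop if the first one fails.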

Download the sample corpus from Baidu Netdisk, unpack it, and replace the data folder with the result.

Sample data

Link: https://pan.baidu.com/s/1Q_WUrmb7MW_6zQSPqhX9Vw

Extraction code: bivr

Real-Time Voice Cloning

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real-time. Feel free to check my thesis if you're curious or if you're looking for info I haven't documented yet (don't hesitate to open an issue for that too). I would mostly recommend taking a quick look at the figures beyond the introduction.

SV2TTS is a three-stage deep learning framework that creates a numerical representation of a voice from a few seconds of audio and uses it to condition a text-to-speech model trained to generalize to new voices.
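
The data flow through the three stages can be illustrated with a small sketch. Everything below is hypothetical: `clone_voice` and the `toy_*` functions are placeholders that only show how the encoder, synthesizer, and vocoder hand data to each other, and they do no real signal processing. They are not this repository's API.

```python
from typing import Callable, List

Waveform = List[float]
Embedding = List[float]
MelSpectrogram = List[List[float]]


def clone_voice(
    reference_audio: Waveform,
    text: str,
    encoder: Callable[[Waveform], Embedding],
    synthesizer: Callable[[str, Embedding], MelSpectrogram],
    vocoder: Callable[[MelSpectrogram], Waveform],
) -> Waveform:
    embed = encoder(reference_audio)  # stage 1: voice -> fixed-size vector
    mel = synthesizer(text, embed)    # stage 2: text + vector -> spectrogram
    return vocoder(mel)               # stage 3: spectrogram -> audio


# Toy stand-ins so the sketch runs end to end.
def toy_encoder(audio: Waveform) -> Embedding:
    return [sum(audio) / len(audio)]  # a single-number "embedding"


def toy_synthesizer(text: str, embed: Embedding) -> MelSpectrogram:
    return [[embed[0]] for _ in text]  # one one-bin frame per character


def toy_vocoder(mel: MelSpectrogram) -> Waveform:
    return [frame[0] for frame in mel]  # one audio sample per frame
```

The point of the structure is that the embedding is computed once from the reference audio and then reused to condition synthesis of any text.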

Papers implemented

arXiv ID	Designation	Title	Implementation source
1806.04558	SV2TTS	Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis	This repo
1802.08435	WaveRNN (vocoder)	Efficient Neural Audio Synthesis	fatchord/WaveRNN
1712.05884	Tacotron 2 (synthesizer)	Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions	Rayhane-mamah/Tacotron-2
1710.10467	GE2E (encoder)	Generalized End-To-End Loss for Speaker Verification	This repo
