
Voice Conversion by CycleGAN (voice cloning/voice conversion): CycleGAN-VC2

Primary language: Python. License: MIT.

CycleGAN-VC2-PyTorch




This code is a PyTorch implementation of the paper CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion, a nice piece of work on voice conversion and voice cloning.


Update

2020.11.17: Fixed issues; re-implemented the second-step adversarial loss.

2020.08.27: Added the second-step adversarial loss (contributed by Jeffery-zhang-nfls).

CycleGAN-VC2

To advance the research on non-parallel VC, we propose CycleGAN-VC2, which is an improved version of CycleGAN-VC incorporating three new techniques: an improved objective (two-step adversarial losses), an improved generator (2-1-2D CNN), and an improved discriminator (PatchGAN).
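The two-step adversarial loss can be illustrated with a short sketch. It assumes a least-squares GAN (LSGAN) objective and represents discriminator outputs as plain lists of floats; `lsgan_d_loss`, `lsgan_g_loss`, and `generator_adv_loss_x_to_y` are hypothetical names, not this repository's API. For the X→Y direction, the one-step term asks D_Y to accept the converted feature G_X→Y(x), and the second-step term asks an additional discriminator D'_X to accept the cyclically reconstructed feature G_Y→X(G_X→Y(x)).

```python
from statistics import fmean


def lsgan_d_loss(real_scores, fake_scores):
    """LSGAN discriminator loss (assumed objective): real outputs -> 1, fake outputs -> 0."""
    return fmean((1.0 - s) ** 2 for s in real_scores) + fmean(s ** 2 for s in fake_scores)


def lsgan_g_loss(fake_scores):
    """LSGAN generator loss (assumed objective): push outputs on fakes toward 1."""
    return fmean((1.0 - s) ** 2 for s in fake_scores)


def generator_adv_loss_x_to_y(d_y_on_converted, d_x2_on_cycled, use_second_step=True):
    """Adversarial part of the generator objective for the X -> Y direction (sketch).

    d_y_on_converted: outputs of D_Y on G_XY(x)         (one-step loss)
    d_x2_on_cycled:   outputs of D'_X on G_YX(G_XY(x))  (second-step loss)
    """
    loss = lsgan_g_loss(d_y_on_converted)
    if use_second_step:
        loss += lsgan_g_loss(d_x2_on_cycled)
    return loss


# Hypothetical discriminator outputs, only to show how the two terms add up.
print(generator_adv_loss_x_to_y([0.5, 0.5], [0.0, 0.0]))  # prints 1.25 (0.25 + 1.0)
```

Under the same assumptions, D'_X itself would be trained with `lsgan_d_loss` on real X features versus the cycled features, and the Y→X direction mirrors this with D_X and D'_Y.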

Figure: CycleGAN-VC2 network architecture
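The shape bookkeeping behind the 2-1-2D generator and the PatchGAN discriminator can be traced without a deep-learning framework. In this sketch the 36 × 128 input (feature bins × frames), the kernel sizes, strides, and channel counts are illustrative assumptions rather than values taken from this repository, and `trace_212d_generator` and `patchgan_grid` are hypothetical helpers.

```python
def conv_out_len(length, kernel, stride, padding):
    """Output length of a convolution along one axis (no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1


def trace_212d_generator(n_bins=36, n_frames=128, channels_2d=256, channels_1d=256, n_down=2):
    """Trace shapes (no batch dim) through a hypothetical 2-1-2D generator.

    2D convs downsample both the feature and time axes; the feature axis is then
    folded into the channels so the 1D residual blocks convolve over time only.
    Upsampling is modelled as doubling both axes per stage.
    """
    h, w = n_bins, n_frames
    shapes = {"input": (1, h, w)}
    for i in range(n_down):
        h = conv_out_len(h, kernel=5, stride=2, padding=2)
        w = conv_out_len(w, kernel=5, stride=2, padding=2)
        shapes[f"down2d_{i + 1}"] = (channels_2d, h, w)
    shapes["reshape_to_1d"] = (channels_2d * h, w)  # fold feature axis into channels
    shapes["res1d"] = (channels_1d, w)              # after 1x1 conv, through 1D residual blocks
    shapes["reshape_to_2d"] = (channels_2d, h, w)   # 1x1 conv back, then unfold
    for i in range(n_down):
        h, w = h * 2, w * 2
        shapes[f"up2d_{i + 1}"] = (channels_2d, h, w)
    shapes["output"] = (1, h, w)
    return shapes


def patchgan_grid(n_bins, n_frames, layers):
    """Spatial size of a PatchGAN output: one real/fake score per input patch.

    `layers` is a list of (kernel, stride, padding) tuples applied to both axes.
    """
    h, w = n_bins, n_frames
    for kernel, stride, padding in layers:
        h = conv_out_len(h, kernel, stride, padding)
        w = conv_out_len(w, kernel, stride, padding)
    return h, w


for name, shape in trace_212d_generator().items():
    print(name, shape)
# Hypothetical discriminator: one stride-1 conv, three stride-2 convs, a final 1x1 conv.
print(patchgan_grid(36, 128, [(3, 1, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1), (1, 1, 0)]))
```

With two stride-2 stages, the reshape yields 256 × 9 = 2304 channels over 32 time steps, which a 1×1 convolution reduces before the 1D residual blocks; the example discriminator returns a 5 × 16 grid of patch scores instead of a single scalar. The doubling in the upsampling loop only restores the input size when both axes are divisible by `2 ** n_down`.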


This repository contains:

  1. Model code implementing the paper.
  2. An audio preprocessing script that creates a cache of training data.
  3. Training scripts to train the model.
  4. Voice-conversion examples: converted results after training.



Requirements

pip install -r requirements.txt

Usage

preprocess

python preprocess_training.py

is shorthand for the following command, with every default argument spelled out:

python preprocess_training.py --train_A_dir ./data/S0913/ --train_B_dir ./data/gaoxiaosong/ --cache_folder ./cache/
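The cache written by preprocessing holds normalization statistics. The sketch below shows what such statistics typically look like, assuming (not confirmed from the source) that `logf0s_normalization.npz` stores the mean and standard deviation of log-F0 over voiced frames and that `mcep_normalization.npz` stores per-coefficient MCEP means and standard deviations used to z-score `coded_sps_*_norm`. Plain Python lists stand in for NumPy arrays, and the function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def logf0_statistics(f0_utterances):
    """Mean and std of log-F0 over voiced frames (F0 > 0) across all utterances."""
    log_f0 = [math.log(f) for utt in f0_utterances for f in utt if f > 0]
    return fmean(log_f0), pstdev(log_f0)


def mcep_statistics(mcep_utterances):
    """Per-coefficient mean/std; each utterance is a list of frames (lists of floats)."""
    frames = [frame for utt in mcep_utterances for frame in utt]
    dims = list(zip(*frames))  # transpose: one tuple of values per coefficient
    return [fmean(d) for d in dims], [pstdev(d) for d in dims]


def normalize(mcep_utterance, mean, std):
    """Z-score every coefficient of every frame with speaker-level statistics."""
    return [[(x - m) / s for x, m, s in zip(frame, mean, std)] for frame in mcep_utterance]


# Toy data: one utterance with two frames of two coefficients each.
means, stds = mcep_statistics([[[1.0, 10.0], [3.0, 30.0]]])
print(means, stds, normalize([[1.0, 10.0]], means, stds))
```

Under this assumption the statistics are computed separately for speaker A and speaker B, since each speaker's features are normalized with that speaker's own mean and standard deviation.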

train

python train.py

is shorthand for the following command, with every default argument spelled out:

python train.py \
  --logf0s_normalization ./cache/logf0s_normalization.npz \
  --mcep_normalization ./cache/mcep_normalization.npz \
  --coded_sps_A_norm ./cache/coded_sps_A_norm.pickle \
  --coded_sps_B_norm ./cache/coded_sps_B_norm.pickle \
  --model_checkpoint ./model_checkpoint/ \
  --resume_training_at ./model_checkpoint/_CycleGAN_CheckPoint \
  --validation_A_dir ./data/S0913/ \
  --output_A_dir ./converted_sound/S0913 \
  --validation_B_dir ./data/gaoxiaosong/ \
  --output_B_dir ./converted_sound/gaoxiaosong/
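The shorthand works through command-line defaults. As a sketch of how such defaults are commonly declared with `argparse`, the flag names and values below are copied from the expanded training command; `TRAIN_DEFAULTS` and `build_train_parser` are hypothetical, and `train.py` may declare its arguments differently.

```python
import argparse

# Defaults copied from the expanded `python train.py ...` command (illustrative only).
TRAIN_DEFAULTS = {
    "logf0s_normalization": "./cache/logf0s_normalization.npz",
    "mcep_normalization": "./cache/mcep_normalization.npz",
    "coded_sps_A_norm": "./cache/coded_sps_A_norm.pickle",
    "coded_sps_B_norm": "./cache/coded_sps_B_norm.pickle",
    "model_checkpoint": "./model_checkpoint/",
    "resume_training_at": "./model_checkpoint/_CycleGAN_CheckPoint",
    "validation_A_dir": "./data/S0913/",
    "output_A_dir": "./converted_sound/S0913",
    "validation_B_dir": "./data/gaoxiaosong/",
    "output_B_dir": "./converted_sound/gaoxiaosong/",
}


def build_train_parser():
    """Hypothetical parser: every flag is optional and falls back to its default."""
    parser = argparse.ArgumentParser(description="CycleGAN-VC2 training (sketch)")
    for name, default in TRAIN_DEFAULTS.items():
        parser.add_argument(f"--{name}", type=str, default=default)
    return parser


# Override one flag; all others keep their defaults.
args = build_train_parser().parse_args(["--model_checkpoint", "./ckpt/"])
print(args.model_checkpoint, args.validation_A_dir)
```

With defaults declared this way, passing a single flag such as `--model_checkpoint ./ckpt/` would override only that value and keep every other default.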

Pretrained

A pretrained model that converts between speakers S0913 and GaoXiaoSong.

Download from Google Drive (735 MB).


Demo

Samples:

Reference speaker A: S0913 (./data/S0913/BAC009S0913W0351.wav)

Reference speaker B: GaoXiaoSong (./data/gaoxiaosong/gaoxiaosong_1.wav)

Speaker A's speech converted to speaker B's voice, from S0913 to GaoXiaoSong (./converted_sound/S0913/BAC009S0913W0351.wav)
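Pitch is converted separately from the spectral features. A common approach in CycleGAN-VC pipelines, assumed here rather than confirmed from this repository, is the log-Gaussian normalized transformation: log F0_B = (log F0_A - μ_A) / σ_A × σ_B + μ_B, where μ and σ are each speaker's log-F0 mean and standard deviation, and unvoiced frames (F0 = 0) stay at 0. `convert_f0` is a hypothetical name.

```python
import math


def convert_f0(f0, mean_a, std_a, mean_b, std_b):
    """Map an F0 contour from speaker A to speaker B (log-Gaussian normalized transform).

    mean_*/std_* are log-F0 statistics; unvoiced frames (F0 == 0) are kept at 0.
    """
    converted = []
    for value in f0:
        if value > 0:
            z = (math.log(value) - mean_a) / std_a         # standardize in A's log-F0 space
            converted.append(math.exp(z * std_b + mean_b))  # rescale into B's log-F0 space
        else:
            converted.append(0.0)
    return converted


# Toy statistics: A centred on 100 Hz, B centred on 200 Hz, equal spread.
print(convert_f0([100.0, 0.0], math.log(100.0), 0.2, math.log(200.0), 0.2))
```

Because the mapping works in log space, a frame one standard deviation above A's mean lands one standard deviation above B's mean, so the shape of the intonation contour is preserved while its register moves to the target speaker.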


Reference

  1. CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion.
  2. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks.
  3. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.
  4. Image-to-Image Translation with Conditional Adversarial Nets.

Donation

If this project helps you reduce development time, you can buy me a cup of coffee :)

AliPay (支付宝)

WeChat Pay (微信)

PayPal


License

MIT © Kun