PitchVC: Pitch Conditioned Any-to-Many Voice Conversion

🎧 Audio Samples. 🤗 Play Online.

Description

A simple VC framework.

Figure panels: (a) Training, (b) Inference, (c) Training (w/ optional properties), (d) Inference (w/ optional properties).

Detailed description.

Pre-requisites

Clone this repo: git clone https://github.com/OlaWod/PitchVC.git
Change into the repo directory: cd PitchVC
Install the Python requirements: pip install -r requirements.txt
Download files on demand (e.g. pretrained checkpoint) (download link)

Inference Example

Files on demand:

Pretrained checkpoint (e.g. exp/default/g_00700000)
Source wavs (e.g. src1.wav) and target wavs and embeddings (e.g. p244_008.wav and p244_008.npy) listed in convert.txt
Utils/JDC/bst.t7
(Optional) speakerlab/pretrained/speech_eres2net_sv_en_voxceleb_16k/pretrained_eres2net.ckpt and speakerlab/pretrained/speech_eres2net_sv_zh-cn_16k-common/pretrained_eres2net_aug.ckpt
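
Before running a conversion, it can help to confirm that the files listed above are in place. The following is a minimal sketch, not part of the repo: the helper name `missing_files` and the `REQUIRED` list are hypothetical, and the checkpoint path is only the example name used in this README.

```python
from pathlib import Path
from typing import Iterable, List

# Paths copied from the list above. The checkpoint name is only the
# README's example; replace it with the checkpoint you downloaded.
REQUIRED = [
    "exp/default/g_00700000",
    "Utils/JDC/bst.t7",
]


def missing_files(paths: Iterable[str], root: str = ".") -> List[str]:
    """Return the entries of `paths` that are not existing files under `root`."""
    base = Path(root)
    return [p for p in paths if not (base / p).is_file()]


if __name__ == "__main__":
    missing = missing_files(REQUIRED)
    if missing:
        print("Missing files:")
        for p in missing:
            print("  " + p)
    else:
        print("All listed files are present.")
```

Run it from the repo root so the relative paths resolve the same way they do for convert_sp.py and convert_mp.py.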

# single process
CUDA_VISIBLE_DEVICES=0 python convert_sp.py --hpfile config_v1_16k.json --ptfile exp/default/g_00700000 --txtpath convert.txt --outdir outputs/test

# single process; automatically fine-tune the input f0
CUDA_VISIBLE_DEVICES=0 python convert_sp.py --hpfile config_v1_16k.json --ptfile exp/default/g_00700000 --txtpath convert.txt --outdir outputs/test --search

# multi process
CUDA_VISIBLE_DEVICES=0 python convert_mp.py --hpfile config_v1_16k.json --ptfile exp/default/g_00700000 --txtpath convert.txt --outdir outputs/test --n_processes 6

# multi process; automatically fine-tune the input f0
CUDA_VISIBLE_DEVICES=0 python convert_mp.py --hpfile config_v1_16k.json --ptfile exp/default/g_00700000 --txtpath convert.txt --outdir outputs/test --n_processes 6 --search

convert.txt:

{title}|{source_wav_path}|{target_spk_reference_wav_path}|{target_spk_id}|{target_spk_reference_embedding_path}
e.g.
title1|src1.wav|dataset/audio/p244/p244_008.wav|p244|dataset/spk/p244/p244_008.npy
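
The five-field, pipe-separated format above can be validated before a long conversion run. This is a small sketch under the assumption that fields never contain a literal `|`; the names `ConvertEntry`, `parse_convert_line`, and `parse_convert_text` are hypothetical and do not come from the repo.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvertEntry:
    # Field order follows the convert.txt format line shown above.
    title: str
    source_wav_path: str
    target_spk_reference_wav_path: str
    target_spk_id: str
    target_spk_reference_embedding_path: str


def parse_convert_line(line: str) -> ConvertEntry:
    """Split one convert.txt line into its five '|'-separated fields."""
    fields = line.strip().split("|")
    if len(fields) != 5:
        raise ValueError(
            f"expected 5 '|'-separated fields, got {len(fields)}: {line!r}"
        )
    return ConvertEntry(*fields)


def parse_convert_text(text: str) -> List[ConvertEntry]:
    """Parse every non-empty line of a convert.txt file's contents."""
    return [parse_convert_line(l) for l in text.splitlines() if l.strip()]


entry = parse_convert_line(
    "title1|src1.wav|dataset/audio/p244/p244_008.wav|p244|dataset/spk/p244/p244_008.npy"
)
print(entry.target_spk_id)  # p244
```

A malformed line raises `ValueError` with the offending text, which is usually easier to debug than a failure partway through conversion.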

Training Example

Files on demand:

VCTK dataset
speaker_encoder/ckpt/pretrained_bak_5805000.pt
Utils/JDC/bst.t7

Preprocess:

export PYTHONPATH=.

python preprocess/1_downsample.py --in_dir </path/to/VCTK/wavs> # dataset/vctk-16k/{spk}/{xx}.wav
python preprocess/2_get_flist.py    # filelists/{situation}.txt
python preprocess/3_get_spk2id.py   # filelists/spk2id.json
python preprocess/4_get_spk_emb.py  # dataset/spk/{spk}/{xx}.npy
python preprocess/5_get_spk_emb_best.py # filelists/spk_stats.json
python preprocess/6_get_f0.py       # dataset/f0/{spk}/{xx}.pt
python preprocess/7_get_f0_stats.py # filelists/f0_stats.json

cd dataset
ln -s vctk-16k audio
cd ..
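
After preprocessing, a quick consistency check can catch utterances whose features were not produced. The sketch below relies only on the output path patterns given in the comments above (dataset/vctk-16k/{spk}/{xx}.wav, dataset/spk/{spk}/{xx}.npy, dataset/f0/{spk}/{xx}.pt) and assumes that `{xx}` is the same file stem in all three directories; that assumption and the helper name `find_missing_features` are mine, not the repo's.

```python
from pathlib import Path
from typing import List, Tuple


def find_missing_features(dataset_root: str = "dataset") -> List[Tuple[str, str, str]]:
    """List (speaker, stem, kind) for every wav lacking a feature file.

    kind is "spk" for a missing speaker embedding (.npy) and
    "f0" for a missing f0 file (.pt).
    Assumes feature files share the wav's stem (see the lead-in).
    """
    root = Path(dataset_root)
    missing = []
    for wav in sorted((root / "vctk-16k").glob("*/*.wav")):
        spk, stem = wav.parent.name, wav.stem
        if not (root / "spk" / spk / f"{stem}.npy").is_file():
            missing.append((spk, stem, "spk"))
        if not (root / "f0" / spk / f"{stem}.pt").is_file():
            missing.append((spk, stem, "f0"))
    return missing


if __name__ == "__main__":
    for spk, stem, kind in find_missing_features():
        print(f"{spk}/{stem}: missing {kind} file")
```

An empty result means every downsampled wav has both feature files; it does not verify the contents of those files.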

Training:

CUDA_VISIBLE_DEVICES=0 python train.py --config config_v1_16k.json --checkpoint_path exp/test

Test Example

python test/1_select_tgt.py # test/TEST_TGT/{xx}.wav
python test/2_select_src.py # test/TEST_SRC_{CORPUS}/{xx}.wav
python test/3_get_txts.py   # test/txts/{scenario}.txt

CUDA_VISIBLE_DEVICES=0 python convert_mp.py --hpfile config_v1_16k.json --ptfile exp/default/g_00700000 --txtpath test/txts/<scenario>.txt --outdir outputs/<scenario> --n_processes 6 --search

cd metrics/<metrics>
bash run.sh
