Non-parallel-rhythm-flexible-VC

PyTorch implementation of: Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences

  • This repo is NOT completed yet
  • This repo is NOT completed yet
  • This repo is NOT completed yet
  • Please open new issues if you find something weird or not working, thanks!

Samples

Samples can be found here; the corresponding experiment is described in Section 5.3 of the paper. Only the conventional and proposed methods are compared.

Python and Toolkit Version

Python:   '3.5.2'
Numpy:    '1.16.2'
PyTorch:  '0.4.1'
Montreal-Forced-Aligner: '1.1.0'

Data Preprocessing (frame-level phoneme boundary segmentation included)

  1. Download and decompress the VCTK corpus
  2. Put the text files and audio files under the same directory, then run rename.sh
  3. Run align_VCTK.sh to get the alignment results
  4. Set the path info in config/config.yaml
  5. Run preprocess.py to generate acoustic features with the corresponding phone labels (a sketch of the frame-labeling idea follows this list)
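
The frame-level phoneme boundary segmentation amounts to projecting the aligner's phone intervals onto acoustic-feature frames. The sketch below only illustrates that idea under assumed frame parameters; it is not the actual preprocess.py code, and the names (intervals_to_frame_labels, HOP_LENGTH, phone2id) are hypothetical.

import numpy as np

# Minimal sketch only: frame parameters are assumptions, not the values
# used by preprocess.py; the real ones should come from config/config.yaml.
SAMPLE_RATE = 16000
HOP_LENGTH = 256  # frame shift in samples (assumed)

def intervals_to_frame_labels(intervals, n_frames, phone2id, unk_id=0):
    """Map aligner phone intervals (start_sec, end_sec, phone) to per-frame ids."""
    labels = np.full(n_frames, unk_id, dtype=np.int64)
    frame_times = np.arange(n_frames) * HOP_LENGTH / SAMPLE_RATE
    for start, end, phone in intervals:
        mask = (frame_times >= start) & (frame_times < end)
        labels[mask] = phone2id.get(phone, unk_id)
    return labels

# Toy usage with a made-up alignment
phone2id = {"sp": 0, "spn": 0, "AH": 5, "T": 12}
intervals = [(0.00, 0.10, "sp"), (0.10, 0.25, "AH"), (0.25, 0.40, "T")]
print(intervals_to_frame_labels(intervals, n_frames=30, phone2id=phone2id))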

Configuration and Usage

  1. All hyperparameters are listed in this .yaml file
  2. All modules can be trained by calling main.py with different arguments.

usage: main.py [-h] [--config CONFIG] 
               [--seed SEED] [--train | --test]
               [--ppr | --ppts | --uppt] 
               [--spk_id SPK_ID] [--A_id A_ID] [--B_id B_ID] 
               [--pre_train]
  1. The detailed usage of each module is listed below.
  2. The paths for logging and model saving should be specified in the config file first (a config-loading sketch follows this list).
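
A minimal sketch of reading that config before launching main.py, assuming a standard YAML file; the key names "log_dir" and "save_dir" are placeholders, not the repo's real keys.

import os
import yaml

# Load the hyperparameter / path configuration (PyYAML assumed available).
with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

# Hypothetical keys: create the logging and model-saving directories up front.
for key in ("log_dir", "save_dir"):
    path = config.get(key)
    if path is not None:
        os.makedirs(path, exist_ok=True)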

PPR

Example script

Training

python3 main.py --config [path-to-config] --train --ppr

Evaluation

python3 main.py --config [path-to-config] --test --ppr
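
For orientation, the PPR stage maps per-frame acoustic features to phoneme posteriorgrams (a per-frame distribution over phone classes) that the later stages consume. The snippet below only illustrates that shape contract with a throwaway classifier; it is not the network defined in this repo, and the sizes are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

N_MELS, N_PHONES = 80, 72  # assumed feature dimension and phone-set size

# Throwaway frame-wise classifier standing in for the real PPR network.
ppr = nn.Sequential(nn.Linear(N_MELS, 256), nn.ReLU(), nn.Linear(256, N_PHONES))

feats = torch.randn(4, 100, N_MELS)          # (batch, frames, mel bins)
ppg = F.softmax(ppr(feats), dim=-1)          # phoneme posteriorgram
assert ppg.shape == (4, 100, N_PHONES)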

PPTS

Example script

Training

python3 main.py --config [path-to-config] --train --ppts \\
                --spk_id [which-speaker-to-train]

Evaluation

python3 main.py --config [path-to-config] --test --ppts \\
                --spk_id [which-speaker-to-train]

UPPT (CycleGAN ver.)

Example script

AE Pre-Training

python3 main.py --config [path-to-config] --train --uppt \\
    --pre_train --A_id [src-speaker] --B_id [tgt-speaker]
  • If A_id and B_id are both set to "all", data from two groups of fast and slow speakers will be used for pre-training instead of data from two single speakers.
  • Ex.
     ... --A_id all --B_id all

Training

python3 main.py --config [path-to-config] --train --uppt \\
    --A_id [src-speaker] --B_id [tgt-speaker]

Evaluation

python3 main.py --config [path-to-config] --test --uppt \\
    --A_id [src-speaker] --B_id [tgt-speaker]
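
As the paper title says, the UPPT is trained as a CycleGAN over phoneme posteriorgram sequences, so its objective combines adversarial terms with cycle-consistency reconstruction. The snippet below sketches only the generic cycle-consistency part; G_AB and G_BA are hypothetical placeholders, not the generators implemented in this repo.

import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G_AB, G_BA, ppg_A, ppg_B):
    # A -> B -> A and B -> A -> B round trips should reconstruct the inputs.
    recon_A = G_BA(G_AB(ppg_A))
    recon_B = G_AB(G_BA(ppg_B))
    return l1(recon_A, ppg_A) + l1(recon_B, ppg_B)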

UPPT (StarGAN ver.)

Example script

AE Pre-Training

python3 star_main.py --config [path-to-config] --train --uppt \\
    --pre_train

Training

python3 star_main.py --config [path-to-config] --train --uppt

Evaluation

python3 star_main.py --config [path-to-config] --test --uppt \\
    --tgt_id [tgt-speaker]

Notes

  1. The phoneme 'spn' means unknown in MFA, so it is currently mapped, together with 'sp', to id 0.
  2. Is padding with 'sp' a good choice, or would 'sil' be better?
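
In code, note 1 simply collapses both silence-like labels onto the same id; only these two entries are shown here, the rest of the phone inventory comes from the alignment output.

phone2id = {"sp": 0, "spn": 0}  # 'spn' (unknown) shares id 0 with 'sp'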

TODO

  • Add a logging method to the solver, removing the redundant summary-writing in both train and eval
  • Whole conversion pipeline: add functions to load from a specified path at inference time