Implementation of Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
In this project we use Mozilla Common Voice Corpus 3.0 for the Russian language.
The dataset consists of 31 hours of recorded speech in .mp3 format with a sample rate of 48000 Hz.
To download the dataset, open the Common Voice webpage with the Russian speech datasets, select Common Voice Version 3.0, and make sure the language is Russian. Then enter your email, right-click the download link, and copy its URL. Finally, run the download.sh script from the root of the cloned repository with the copied URL as its argument. The script creates a directory in the root of the repository containing .wav files converted from the original dataset at the same sample rate.
There are two reasons for converting the audio format:
- the HiFi-GAN implementation is used as a baseline model for the experiment, and its authors use the .wav format in their implementation;
- at least with librosa, loading .wav files is a bit faster than loading .mp3 files.
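The conversion step described above can be sketched in Python as follows. This is only an illustration, not the contents of download.sh: it assumes ffmpeg is installed and on the PATH, and the function names and directory arguments are hypothetical. The `-ar` option sets the output sample rate, so passing 48000 keeps the rate of the original clips.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(mp3_path, wav_dir, sample_rate=48000):
    """Return the ffmpeg argument list that converts one .mp3 clip to .wav."""
    mp3_path = Path(mp3_path)
    wav_path = Path(wav_dir) / (mp3_path.stem + ".wav")
    return [
        "ffmpeg",
        "-y",                      # overwrite an existing output file
        "-i", str(mp3_path),       # input clip
        "-ar", str(sample_rate),   # output sample rate in Hz
        str(wav_path),
    ]


def convert_directory(mp3_dir, wav_dir, sample_rate=48000):
    """Convert every .mp3 file in mp3_dir (hypothetical layout) to .wav."""
    Path(wav_dir).mkdir(parents=True, exist_ok=True)
    for clip in sorted(Path(mp3_dir).glob("*.mp3")):
        # check=True raises CalledProcessError if ffmpeg fails on a clip
        subprocess.run(build_ffmpeg_command(clip, wav_dir, sample_rate), check=True)
```

Calling `convert_directory("clips", "wav_out")` would convert every clip in a placeholder `clips` folder; both folder names are examples and are not taken from the repository.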
After running the download.sh script, the folders should be laid out as follows:
Fre-GAN
|- data
|- audio
| __init__.py
| train.tsv
| test.tsv
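Once the files are in place, a quick sanity check is to confirm that the converted clips kept the 48000 Hz sample rate. The sketch below uses only the standard-library `wave` module and assumes the files are plain PCM .wav; the helper names are hypothetical and not part of this repository.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM .wav file header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def find_mismatched(paths, expected_rate=48000):
    """Return the paths whose sample rate differs from expected_rate."""
    return [p for p in paths if wav_sample_rate(p) != expected_rate]
```

An empty list from `find_mismatched` means every checked file has the expected rate.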
Step 1: Adjust training parameters in config.yaml
Step 2: To train the model in Docker, run from the root of this repository:
docker build --network=host -t fre-gan:train .
Step 3: After the build is complete, run with a GPU:
docker run --gpus 1 -ti fre-gan:train
For CPU-only:
docker run -ti fre-gan:train
Step 4: From the repository root run:
python3 -m src.train
If you are not using Docker, just skip Steps 2 and 3 :)
Step 1: Download the model weights: Dummy Weights
Step 2: From the repository root run:
python3 -m src.inference -w <model_weights_path> -p <reference_wav_path>
The script generates an output .wav file in the data/generated_samples directory. You can set a custom path with the -o flag and find the other flags in the src/inference.py script.
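To run inference over several reference files, the command above can be wrapped in a small helper. This is a sketch under stated assumptions: it uses only the -w, -p, and -o flags mentioned here, the function names are hypothetical, and whether -o expects a file or a directory is not stated in this document, so the value is passed through unchanged.

```python
import subprocess
from pathlib import Path


def build_inference_command(weights_path, reference_wav, output_path=None):
    """Return the argument list for one src.inference run."""
    cmd = [
        "python3", "-m", "src.inference",
        "-w", str(weights_path),    # model weights
        "-p", str(reference_wav),   # reference .wav file
    ]
    if output_path is not None:
        cmd += ["-o", str(output_path)]  # custom output path (optional)
    return cmd


def run_batch(weights_path, reference_dir, output_path=None):
    """Run inference once per .wav file found in reference_dir."""
    for wav in sorted(Path(reference_dir).glob("*.wav")):
        # Must be started from the repository root so that src.inference resolves
        subprocess.run(build_inference_command(weights_path, wav, output_path), check=True)
```

Without `output_path`, the script falls back to its default location, data/generated_samples.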
To run evaluation in Docker, complete Step 3 from the training block before running the script.
To train the HiFi-GAN vocoder on the Common Voice Version 3.0 dataset:
- follow the dataset loading instructions in the first block;
- clone the forked HiFi-GAN repository;
- run train.py from common-voice-branch.
Instructions for evaluation are the same as in the original repository. Dummy Weights for inference.