Test-task for VK-research internship 2022 🕊️

Implementation of Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

1. Dataset

In this project we use Mozilla Common Voice Corpus 3.0 for the Russian language.

The dataset consists of 31 hours of recorded speech in .mp3 format with a sample rate of 48000 Hz. To download the dataset, open the Common Voice webpage with Russian speech datasets. Select Common Voice Version 3.0 and make sure the language is Russian. Then enter your email, right-click the download link, and copy the URL. Next, run the download.sh script from the root of the cloned repository, passing the copied URL as an argument. The script creates a directory in the root of the repository containing .wav files converted from the original dataset at the same sample rate.

There are two reasons for the audio-format conversion:

the HiFi-GAN implementation is used as the baseline model for the experiment, and its authors use the .wav audio format in their implementation;
at least with librosa, loading .wav files is a bit faster than loading .mp3 files.
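
The conversion step can be sketched in Python as follows. This is only an illustration, not the contents of download.sh: it assumes ffmpeg is installed and used for the conversion, and the function name build_ffmpeg_command and the example paths are hypothetical.

```python
from pathlib import Path
from typing import List


def build_ffmpeg_command(src: Path, dst_dir: Path, sample_rate: int = 48000) -> List[str]:
    """Build an ffmpeg command that converts one .mp3 clip to .wav.

    Assumption: ffmpeg does the conversion; download.sh may work differently.
    """
    dst = dst_dir / (src.stem + ".wav")
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", str(src),           # input .mp3 clip
        "-ar", str(sample_rate),  # output sample rate, kept equal to the source rate
        str(dst),                 # output .wav path
    ]
```

Each returned command can be passed to subprocess.run(cmd, check=True), for example for every .mp3 file found in the unpacked dataset directory.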

After running the download.sh script, the directory layout should be as follows:

  Fre-GAN
      |- data
          |- audio
          | __init__.py
          | train.tsv
          | test.tsv
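
A quick way to confirm this layout before training is a small check like the one below. It is a minimal sketch: the helper name missing_entries is hypothetical, and the entry names are taken from the tree above.

```python
from pathlib import Path
from typing import List

# Entries expected inside the data directory, as listed in the tree above.
EXPECTED_ENTRIES = ("audio", "__init__.py", "train.tsv", "test.tsv")


def missing_entries(data_dir: Path) -> List[str]:
    """Return the expected entries that are absent from data_dir."""
    return [name for name in EXPECTED_ENTRIES if not (data_dir / name).exists()]
```

Calling missing_entries(Path("data")) from the repository root returns an empty list when everything is in place.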

2. Training setup

Step 1: Adjust training parameters in config.yaml

Step 2: To train the model in Docker, run from the root of this repository:

  docker build --network=host -t fre-gan:train .

Step 3: After the build is complete, run the container with a GPU:

  docker run --gpus 1 -ti fre-gan:train

For CPU-only:

  docker run -ti fre-gan:train

Step 4: From the repository root run:

  python3 -m src.train

If you are not using Docker, just skip steps 2 and 3 :)

3. Evaluation setup

Step 1: Download the model weights: Dummy Weights

Step 2: From the repository root run:

  python3 -m src.inference -w <model_weights_path> -p <reference_wav_path>

This script generates an output .wav file in the data/generated_samples directory. You can set a custom output path with the -o flag; see the inference.py script (src/inference.py) for the other flags.

To run evaluation in Docker, complete Step 3 of the training block before running the script.
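
To run inference over several reference files, the command from Step 2 can be wrapped in a small script. The sketch below only builds the command lines and uses just the -w and -p flags documented above; the function names and the idea of a directory of reference .wav files are assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def build_inference_command(weights: Path, reference_wav: Path) -> List[str]:
    """Build the documented inference command for one reference file."""
    return [
        "python3", "-m", "src.inference",
        "-w", str(weights),        # path to the model weights
        "-p", str(reference_wav),  # path to the reference .wav file
    ]


def build_batch(weights: Path, wav_dir: Path) -> List[List[str]]:
    """Build one inference command per .wav file in wav_dir, in sorted order."""
    return [build_inference_command(weights, wav) for wav in sorted(wav_dir.glob("*.wav"))]
```

Each command can then be executed from the repository root with subprocess.run(cmd, check=True).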

4. Baseline

To train the HiFi-GAN vocoder on the Common Voice Version 3.0 dataset:

follow the dataset loading instructions in the first block;
clone the forked HiFi-GAN repository;
run train.py from common-voice-branch.

Instructions for evaluation are the same as in the original repository. Dummy Weights are available for inference.

dariadiatlova/Fre-GAN