VocBench: A Neural Vocoder Benchmark for Speech Synthesis

PyTorch implementation of the VocBench framework.

Installation

Python >= 3.6
Get the VocBench code

$ git clone https://github.com/facebookresearch/vocoder-benchmark.git
$ cd vocoder-benchmark

Install dependencies

$ python3 -m venv vocbench
# activate the virtualenv
$ source vocbench/bin/activate
# Upgrade pip
$ python -m pip install --upgrade pip
# Install dependencies
$ pip install -e .

To use the VocBench CLI, set the following paths in your .bashrc or .bash_profile:

VOCODER_BENCHMARK=/path/to/vocoder-benchmark
export PATH=$VOCODER_BENCHMARK/bin:$PATH
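
As a quick sanity check, the sketch below repeats the two settings and then reports whether the bin directory ended up as an entry in PATH. The path is the same placeholder as above, and the variable name path_status is only illustrative.

```shell
# Placeholder path from the instructions above; replace it with your clone.
VOCODER_BENCHMARK=/path/to/vocoder-benchmark
export PATH=$VOCODER_BENCHMARK/bin:$PATH

# Wrap PATH in colons so the match works for first, middle, and last entries.
case ":$PATH:" in
  *":$VOCODER_BENCHMARK/bin:"*) path_status="on PATH" ;;
  *) path_status="missing from PATH" ;;
esac
echo "$VOCODER_BENCHMARK/bin is $path_status"
```

If the check reports that the directory is missing, reload your shell configuration or open a new terminal.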

Make the CLI script executable and test your installation

$ chmod +x $VOCODER_BENCHMARK/bin/vocoder
$ vocoder --help
Usage: cli.py [OPTIONS] COMMAND [ARGS]...

  Vocoder benchmarking CLI.

Options:
  --help  Show this message and exit.

Commands:
  dataset           Dataset processing.
  diffwave          Create, train, or use diffwave models.
  parallel_wavegan  Create, train, or use parallel_wavegan models.
  wavegrad          Create, train, or use wavegrad models.
  wavenet           Create, train, or use wavenet models.
  wavernn           Create, train, or use wavernn models.

Usage

Download dataset

$ vocoder dataset --help # For more information on how to download/split dataset

# e.g. download and split LJ Speech
$ vocoder dataset download --dataset ljspeech --path ~/local/datasets/lj # Download and unzip dataset files
$ vocoder dataset split --dataset ljspeech --path ~/local/datasets/lj  # Create train / validation / test splits

Training

$ vocoder [model-cmd] train --help

# e.g. train wavenet on LJ Speech dataset
$ vocoder wavenet train --path ~/local/models/wavenet --dataset ~/local/datasets/lj --config $VOCODER_BENCHMARK/config/wavenet_mulaw_normal.yaml

*MelGAN and Parallel WaveGAN share the same model command (parallel_wavegan). Choose the matching configuration file for each of them.

# MelGAN
$ vocoder parallel_wavegan train --path ~/local/models/melgan --dataset ~/local/datasets/lj --config $VOCODER_BENCHMARK/config/melgan.v1.yaml

# Parallel WaveGAN
$ vocoder parallel_wavegan train --path ~/local/models/parallel_wavegan --dataset ~/local/datasets/lj --config $VOCODER_BENCHMARK/config/parallel_wavegan.yaml

Example configuration files for each model are provided in the config directory.

Synthesize

$ vocoder [model-cmd] synthesize --help
Usage: cli.py [model-cmd] synthesize [OPTIONS] INPUT_FILE OUTPUT_FILE

  Synthesize with the model.

Options:
  --path TEXT     Directory for the model  [required]
  --length TEXT   The length of the output sample in seconds
  --offset FLOAT  Offset in seconds of the sample
  --help          Show this message and exit.
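
The options above could be combined as in the following sketch. It only assembles and prints the command so that you can check it before running it. The model directory reuses the WaveNet path from the training example; the file names input.wav and output.wav, the values for --length and --offset, and the assumption that INPUT_FILE is an audio file are all illustrative and not taken from the VocBench documentation.

```shell
# Hypothetical paths; adjust them to your own setup.
MODEL_DIR="$HOME/local/models/wavenet"   # directory given to --path during training
INPUT_FILE="input.wav"                   # assumed to be an audio file
OUTPUT_FILE="output.wav"                 # where the synthesized audio would be written

# Synthesize 5 seconds starting 1 second into the input (illustrative values).
cmd="vocoder wavenet synthesize --path $MODEL_DIR --length 5 --offset 1.0 $INPUT_FILE $OUTPUT_FILE"

# Print the command instead of running it.
echo "$cmd"
```

Once the paths point at real files, paste the printed command into your shell to run it.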

Evaluate

$ vocoder [model-cmd] evaluate --help
Usage: cli.py [model-cmd] evaluate [OPTIONS]

  Evaluate a given vocoder.

Options:
  --path TEXT        Directory for the model  [required]
  --dataset TEXT     Name of the dataset to use  [required]
  --checkpoint TEXT  Checkpoint path (default: load latest checkpoint)
  --help             Show this message and exit.
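
A minimal sketch of an evaluation call, again printed rather than executed. The help text describes --dataset as a dataset name, so ljspeech is assumed here; the training example passes a dataset directory instead, so adjust the value if your installation expects a path. The variable CHECKPOINT is hypothetical: leaving it empty relies on the documented default of loading the latest checkpoint.

```shell
# Hypothetical values; adjust them to your own setup.
MODEL_DIR="$HOME/local/models/wavenet"   # directory given to --path during training
DATASET="ljspeech"                       # assumed dataset name
CHECKPOINT=""                            # empty: let the CLI load the latest checkpoint

eval_cmd="vocoder wavenet evaluate --path $MODEL_DIR --dataset $DATASET"

# Add --checkpoint only when a specific checkpoint path was set.
if [ -n "$CHECKPOINT" ]; then
  eval_cmd="$eval_cmd --checkpoint $CHECKPOINT"
fi

# Print the command instead of running it.
echo "$eval_cmd"
```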

*Frechet Audio Distance (FAD) is not currently implemented in VocBench. We use the Google Research open-source repository to obtain FAD results.

Reference Repositories

PyTorch, PyTorch.
Audio, PyTorch.
FAD, Google Research.
WaveNet, Ryuichi Yamamoto.
Parallel WaveGAN, Tomoki Hayashi.
WaveGrad, Ivan Vovk.
DiffWave, LMNT.
Flops counter, Vladislav Sovrasov.

License

The majority of VocBench is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: WaveNet, Parallel WaveGAN, and the flops counter are licensed under the MIT license; DiffWave is licensed under the Apache 2.0 license; WaveGrad is licensed under the BSD-3 license.

Used by

List of papers that used our work (feel free to add your own paper by making a pull request).