This repository contains the evaluation tool used in "BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network" (arXiv 2309.02836). Please cite [1] in your work when using this code in your experiments.
First, prepare an environment:
pip install -r requirements.txt
Then, perform an evaluation:
python evaluate.py <gt_dir 1> <synth_dir 1> <gt_dir 2> <synth_dir 2> ... <gt_dir N> <synth_dir N>
`gt_dir n` is a directory that contains ground-truth audio files, and `synth_dir n` is a directory that contains synthesized audio files. Each file in `synth_dir n` must have a corresponding file with the same name in `gt_dir n`. Each corresponding pair must also be time-aligned in advance.
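Before starting a long run, it can help to confirm the file-name pairing requirement described above. The following is a minimal sketch and not part of this tool; the helper name `find_unpaired` is hypothetical, and it only compares file names (it does not check time alignment).

```python
from pathlib import Path


def find_unpaired(gt_dir, synth_dir):
    """Return sorted names of files in synth_dir with no same-named file in gt_dir.

    Hypothetical helper for illustration; evaluate.py may handle this differently.
    """
    gt_names = {p.name for p in Path(gt_dir).iterdir() if p.is_file()}
    synth_names = {p.name for p in Path(synth_dir).iterdir() if p.is_file()}
    # Every synthesized file needs a ground-truth counterpart, so report
    # the names that exist only on the synthesized side.
    return sorted(synth_names - gt_names)
```

An empty list returned for a given pair of directories means every synthesized file has a ground-truth file with the same name.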
`evaluate.py` outputs the calculated metrics for each `gt_dir n`-`synth_dir n` pair, along with their macro averages across all pairs. An evaluation may take some time to complete.
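As a rough illustration of the macro averages mentioned above, the sketch below assumes that "macro average" means the unweighted mean of the per-pair scores, so each directory pair counts equally regardless of how many files it contains. The function name, the dictionary layout, and the numbers are made up for this example and do not reflect the actual output format of `evaluate.py`.

```python
from statistics import mean


def macro_average(per_pair_metrics):
    """Average each metric over directory pairs, weighting every pair equally.

    per_pair_metrics: list of dicts, one per gt_dir/synth_dir pair, all with
    the same metric names as keys (an assumed layout, for illustration only).
    """
    metric_names = per_pair_metrics[0].keys()
    return {
        name: mean(pair[name] for pair in per_pair_metrics)
        for name in metric_names
    }


# Illustrative values only, not real evaluation results.
example = [
    {"MCD": 1.0, "PESQ": 3.0},
    {"MCD": 3.0, "PESQ": 4.0},
]
averages = macro_average(example)  # {"MCD": 2.0, "PESQ": 3.5}
```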
This toolbox supports the following metrics:
- M-STFT: Multi-resolution short-time Fourier transform
- PESQ: Perceptual evaluation of speech quality
- MCD: Mel-cepstral distortion
- Periodicity: Periodicity error
- V/UV F1: F1 score of voiced/unvoiced classification
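
To make the last metric in the list concrete, here is a minimal sketch of the F1 arithmetic for voiced/unvoiced classification. It assumes per-frame boolean voicing decisions are already available for both signals and treats "voiced" as the positive class; how this tool extracts pitch and voicing is not described here, so the function name `vuv_f1` and its inputs are hypothetical.

```python
def vuv_f1(gt_voiced, synth_voiced):
    """F1 score of per-frame voiced/unvoiced decisions ("voiced" is positive).

    Both inputs are equal-length sequences of booleans, one per frame.
    Illustrative only; not the implementation used by evaluate.py.
    """
    if len(gt_voiced) != len(synth_voiced):
        # Frame-wise comparison only makes sense for time-aligned signals.
        raise ValueError("inputs must have the same number of frames")

    pairs = list(zip(gt_voiced, synth_voiced))
    true_pos = sum(1 for g, s in pairs if g and s)
    false_pos = sum(1 for g, s in pairs if not g and s)
    false_neg = sum(1 for g, s in pairs if g and not s)

    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0  # no voiced frames in either signal
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * true_pos / denominator


score = vuv_f1([True, True, False, False], [True, False, True, False])  # 0.5
```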
If you find this tool useful, please consider citing:
[1] Shibuya, T., Takida, Y., Mitsufuji, Y., "BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network," ICASSP 2024.
@inproceedings{shibuya2024bigvsan,
title={{BigVSAN}: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network},
author={Shibuya, Takashi and Takida, Yuhta and Mitsufuji, Yuki},
booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2024}
}