A tool that transforms audio into mel-spectrograms for speech datasets.
You can use it to prepare training data for acoustic models (e.g. Tacotron 2) and vocoders (e.g. MelGAN, HiFi-GAN).
- Download or prepare your speech datasets.
- Transform the audio into mel-spectrograms (saved as NumPy files):

  python preprocess.py --dataset=DataBaker --indir=path/BZNSYP --outdir=./training_data
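The command above drives the repo's own extraction code. As a rough, hypothetical illustration of what the audio-to-mel step computes, here is a minimal sketch using librosa; the parameter values and file names are assumptions, not the repo's actual implementation:

```python
# Hypothetical sketch of the audio -> mel step, not the repo's actual code.
import numpy as np
import librosa

def audio_to_mel(wav_path, sample_rate=22050, n_fft=1024,
                 hop_length=256, win_length=1024, n_mels=80):
    """Load a wav file and return a log-mel-spectrogram as a float32 NumPy array."""
    wav, _ = librosa.load(wav_path, sr=sample_rate)
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sample_rate, n_fft=n_fft,
        hop_length=hop_length, win_length=win_length, n_mels=n_mels)
    # Convert power to dB so values are on a log scale, as most TTS recipes expect.
    log_mel = librosa.power_to_db(mel, ref=np.max)
    return log_mel.astype(np.float32)

# Example: np.save("./training_data/mels/mel-000001.npy", audio_to_mel("000001.wav"))
```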
- To support more dataset types, you can add a processing script under "./datasets/"; refer to "ljspeech.py" and "databaker.py" as examples (contributions are welcome). A hypothetical skeleton is sketched below.
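A dataset script typically walks the corpus, extracts features per utterance, and returns metadata rows. The skeleton below is only a hedged sketch under that assumption; mirror "ljspeech.py" / "databaker.py" from the repo for the real interface (the function names and fields here are illustrative, not the repo's API):

```python
# Hypothetical skeleton for a new script under ./datasets/ (names are illustrative).
import os
from concurrent.futures import ProcessPoolExecutor

def build_from_path(indir, outdir, num_workers=1):
    """Walk the dataset, process each utterance in parallel, and collect metadata rows."""
    executor = ProcessPoolExecutor(max_workers=num_workers)
    futures = []
    wav_dir = os.path.join(indir, "wavs")  # adjust to the dataset's actual layout
    for fname in sorted(os.listdir(wav_dir)):
        if fname.endswith(".wav"):
            wav_path = os.path.join(wav_dir, fname)
            text = ""  # look up the transcript for this utterance here
            futures.append(executor.submit(_process_utterance, outdir, wav_path, text))
    return [f.result() for f in futures]

def _process_utterance(outdir, wav_path, text):
    """Extract audio/mel/linear features, save them as .npy files, return one metadata row."""
    ...
```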
- After processing, the output directory is organized as follows (the metadata format can be modified in preprocess.py, in def write_metadata; a hedged illustration follows the tree):

  outdir/
  |--train.txt
  |--audio/
  |--mels/
  |--linear/
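The exact columns of train.txt are defined by write_metadata in preprocess.py. As a hedged illustration of the kind of pipe-separated metadata such preprocessors often write (the fields shown are assumptions, not necessarily this repo's format):

```python
# Hypothetical illustration; the real format is whatever write_metadata() in
# preprocess.py emits, and the fields below are assumed.
import os

def write_metadata(metadata, outdir):
    """Write one pipe-separated line per utterance, e.g.
    audio-000001.npy|mel-000001.npy|linear-000001.npy|<n_frames>|<text>"""
    with open(os.path.join(outdir, "train.txt"), "w", encoding="utf-8") as f:
        for row in metadata:
            f.write("|".join(str(x) for x in row) + "\n")
```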
- "MultiSets" is used for multi-speaker or multilingual datasets.
- "config.json" is used to extract mel-spectrograms under different acoustic parameters; 16 kHz and 22 kHz configurations are provided as references (e.g. "./datasets/config16k.json"). A sketch of typical parameters is shown below.
- Linear spectrograms require a lot of memory; if you do not need them, you can delete the "linear/" directory from the output directory.
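If the directory was already written, it can simply be removed afterwards (the path below assumes the output layout shown above):

```python
# Remove the saved linear spectrograms if they are not needed (path is assumed).
import shutil
shutil.rmtree("./training_data/linear", ignore_errors=True)
```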
- Currently supported datasets:
  - DataBaker (BZNSYP): https://www.data-baker.com/#/data/index/source
  - AIShell-3: https://www.openslr.org/resources/93/data_aishell3.tgz
  - LJSpeech: https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2