This repository contains the code and samples for our paper "Unconditional Audio Generation with GAN and Cycle Regularization", accepted by INTERSPEECH 2020. The goal is to unconditionally generate singing voices, speech, and instrument sounds with GAN.
The model is implemented with PyTorch.
Install the dependencies:

```bash
pip install -r requirements.txt
```
The pretrained parameters can be downloaded here: Pretrained parameters

Unzip the archive so that the `models` folder is in the current folder, or use the following script:

```bash
bash download_and_unzip_models.sh
```
Display the options:

```bash
python generate.py -h
```

The following commands are equivalent:

```bash
python generate.py
python generate.py -data_type singing -arch_type hc --duration 10 --num_samples 5
python generate.py -d singing -a hc --duration 10 -ns 5
```

Generate speech, piano, or violin:

```bash
python generate.py -d speech
python generate.py -d piano
python generate.py -d violin
```
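The equivalent commands above imply that the long and short flags share defaults. The following is a hypothetical `argparse` sketch of such a CLI, reconstructed only from the commands shown here; it is not the repository's actual parser, and the choices and defaults are inferred:

```python
import argparse

def build_parser():
    # Hypothetical reconstruction of generate.py's options, inferred from
    # the "equivalent commands" example above.
    p = argparse.ArgumentParser(description="Unconditional audio generation")
    p.add_argument("-d", "-data_type", dest="data_type", default="singing",
                   choices=["singing", "speech", "piano", "violin"])
    p.add_argument("-a", "-arch_type", dest="arch_type", default="hc")
    p.add_argument("--duration", type=float, default=10)
    p.add_argument("-ns", "--num_samples", dest="num_samples", type=int, default=5)
    return p

# Running with no flags falls back to the documented defaults.
args = build_parser().parse_args([])
print(args.data_type, args.arch_type, args.num_samples)
```

Registering both `-d` and `-data_type` on one argument is what makes the short and long spellings interchangeable.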
We use MelGAN as the vocoder; the trained vocoders are included in `models.zip`. For singing, piano, and violin, we modified MelGAN to include a GRU in the vocoder architecture, and we found that this modification improves audio quality. For speech, we directly use the pretrained LJ Speech vocoder from MelGAN.
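To illustrate the kind of modification described, here is a minimal PyTorch sketch of a MelGAN-style upsampling generator with a GRU run over the mel frames. This is an assumption-laden toy, not the released architecture: the layer sizes, kernel sizes, and upsampling factors are made up and do not match the actual models.

```python
import torch
import torch.nn as nn

class GRUMelGANGenerator(nn.Module):
    """Illustrative sketch only: a (much shortened) MelGAN-style generator
    with a GRU over the time axis of the mel-spectrogram, mirroring the
    modification described above. Sizes are hypothetical."""

    def __init__(self, n_mel=80, hidden=256):
        super().__init__()
        # The GRU processes the mel frames sequentially along time.
        self.gru = nn.GRU(n_mel, hidden, batch_first=True)
        # Transposed convolutions upsample the frame rate toward the
        # audio sample rate (here only 64x, far less than a real vocoder).
        self.upsample = nn.Sequential(
            nn.ConvTranspose1d(hidden, 128, kernel_size=16, stride=8, padding=4),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose1d(128, 64, kernel_size=16, stride=8, padding=4),
            nn.LeakyReLU(0.2),
            nn.Conv1d(64, 1, kernel_size=7, padding=3),
            nn.Tanh(),
        )

    def forward(self, mel):                      # mel: (batch, n_mel, frames)
        h, _ = self.gru(mel.transpose(1, 2))     # (batch, frames, hidden)
        return self.upsample(h.transpose(1, 2))  # (batch, 1, samples)
```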
You can use the following steps to train your own models:

- (Singing only) Separate the singing voices from the audio you collect. We used a separation model we developed; open-source alternatives such as Open-Unmix or Spleeter also work.
- Collect audio clips with `scripts/collect_audio_clips.py`.
- Extract mel-spectrograms with `scripts/extract_mel.py`.
- Make the dataset with `scripts/make_dataset.py`.
- Compute the mean and standard deviation of the mel-spectrograms with `scripts/compute_mean_std.mel.py`.
- Train the model with the corresponding `scripts/train.*.py` script.
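The mean/std step normalizes the mel features before training. As a rough sketch of what `scripts/compute_mean_std.mel.py` likely computes (this function is hypothetical, not the script's actual code), the statistics are taken per mel bin over all frames of all clips:

```python
import numpy as np

def compute_mean_std(mel_list):
    """Hypothetical sketch: per-mel-bin mean and standard deviation over
    the frames of every clip, for normalizing features before training.
    Each element of mel_list has shape (n_mel, frames)."""
    all_frames = np.concatenate(mel_list, axis=1)  # (n_mel, total_frames)
    return all_frames.mean(axis=1), all_frames.std(axis=1)
```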
You can replace the path in the `param_fp` variable in `generate.py` with either `params.Generator.best_Convergence.torch` or `params.Generator.latest.torch` from the folder of the trained model. Files with the extensions `.torch` and `.pt` both contain saved parameters.
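Such parameter files can be loaded the usual PyTorch way. A minimal sketch, assuming the files hold a `state_dict` (the helper name is mine, not the repository's):

```python
import torch

def load_generator_params(generator, param_fp):
    """Hypothetical helper: load saved generator parameters into a model.
    torch.load does not care whether the file ends in .torch or .pt."""
    state = torch.load(param_fp, map_location="cpu")
    generator.load_state_dict(state)
    generator.eval()  # inference mode for generation
    return generator
```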
Some generated audio samples can be found in `samples/`.