e6691-2022spring-project-WAVE-an3078-bmh2168-gs3160

Adversarial Audio Synthesis

This report summarizes the findings of the original Adversarial Audio Synthesis paper and shows a reproduction of the results with PyTorch [1]. SpecGAN and WaveGAN have been implemented.

We also experimented with the GANSynth model [2]. All the required helper functions for GANSynth to complete the audio processing are from [3].

[1] JDonahue, C., McAuley, J. and Puckette, M., 2018. Adversarial Audio Synthesis.

[2] Engel J, Agrawal KK, Chen S, Gulrajani I, Donahue C, Roberts A. Gansynth: Adversarial neural audio synthesis. arXiv preprint arXiv:1902.08710. 2019

[3 ] https://github.com/magenta/magenta/tree/main/magenta/models/gansynth/lib

Demo site

Examples of generated audio clips can be found on the demo page: https://ecbme6040.github.io/e6691-2022spring-project-WAVE-an3078-bmh2168-gs3160/

Libraries for SpecGAN & WaveGAN

librosa (pip install librosa) (sudo apt-get install libsndfile1)
torchaudio (conda install torchaudio -c pytorch)

Download the models

Model weights are ordered by dataset folders (link below)

Drive directory tree
Root/
├── WaveGAN/
│   ├── drum/
|   |   ├── examples_samples.pt
│   │   ├── generator.pt
│   │   └── discriminator.pt
│   ├── piano/
|   |   ├── examples_samples.pt
│   │   ├── generator.pt
│   │   └── discriminator.pt
│   └── sc09/
|   |   ├── examples_samples.pt
│   │   ├── generator.pt
│   │   └── discriminator.pt
├── SpecGAN/
│   ├── drum/
|   |   ├── examples_samples.pt
│   │   ├── generator.pt
│   │   └── discriminator.pt
│   ├── piano/
|   |   ├── examples_samples.pt
│   │   ├── generator.pt
│   │   └── discriminator.pt
│   ├── sc09/
|   |   ├── examples_samples.pt
│   │   ├── generator.pt
│   │   └── discriminator.pt
├── GANSynth/
│   ├── NSynth/
|   |   ├── checkpoint
|   |   ├── model.ckpt-11000000.meta
│   │   ├── model.ckpt-11000000.index
│   │   └── model.ckpt-11000000.data-00000-of-00001

Lion drive link (https://drive.google.com/drive/folders/1CPD3boEK5Dw2LmLcUIzUJOnStPdkuBL5?usp=sharing)

Download the data sets

Drums dataset http://deepyeti.ucsd.edu/cdonahue/wavegan/data/drums.tar.gz
Piano Bach dataset http://deepyeti.ucsd.edu/cdonahue/mancini_piano.tar.gz
Speech sc09 dataset http://deepyeti.ucsd.edu/cdonahue/sc09.tar.gz
NSynth dataset https://magenta.tensorflow.org/datasets/nsynth

Organization of this directory

'./WaveGan and SpecGAN' folders contain relevant code for the GANSynth model

'./GANSynth' folder contains relevant code for the GANSynth model

'./docs' folder contains website code, and audio examples

│   .gitignore
│   E6691.2022Spring.WAVE.report.an3078.bmh2168.gs3160.pdf
│   README.md
│
├───docs
│   │   README.md
│   │
│   └───examples
│       │   README.md
│       │
│       ├───GANSynth
│       │       generated_1.mp3
│       │       generated_2.mp3
│       │       generated_3.mp3
│       │       real_1.mp3
│       │       real_2.mp3
│       │       real_3.mp3
│       │
│       ├───paper
│       │       specgan_drums.mp3
│       │       specgan_piano.mp3
│       │       specgan_sc09.mp3
│       │       wavegan_drums.mp3
│       │       wavegan_piano.mp3
│       │       wavegan_sc09.mp3
│       │
│       ├───specgan
│       │       drum denoised.mp3
│       │       drum.mp3
│       │       piano.mp3
│       │       sc09.mp3
│       │
│       └───wavegan
│               drum n=0.mp3
│               drum n=2.mp3
│               piano.mp3
│               sc09.mp3
│
├───GANSynth
│   │   gansynth_generate.py
│   │   gansynth_train.py
│   │   README.md
│   │   __init__.py
│   │
│   ├───configs
│   │       mel_prog_hires.py
│   │       __init__.py
│   │
│   └───lib
│           datasets.py
│           data_helpers.py
│           data_normalizer.py
│           flags.py
│           generate_util.py
│           layers.py
│           model.py
│           networks.py
│           network_functions.py
│           specgrams_helper.py
│           specgrams_helper_test.py
│           spectral_ops.py
│           spectral_ops_test.py
│           train_util.py
│           util.py
│           __init__.py
│
└───WaveGan and SpecGAN
    │   Generate audio.ipynb
    │   Inception score.ipynb
    │   Inception training.ipynb
    │   README.md
    │   SpecGan Training.ipynb
    │   Wavegan Training.ipynb
    │
    └───utils
        │   generate_show_audio.py
        │   README.md
        │   specgan.py
        │   split_data.py
        │   utils.py
        │   wavegan.py

ecbme6040/e6691-2022spring-project-WAVE-an3078-bmh2168-gs3160