Music Style Transfer With CycleGANs
Built with TensorFlow 2.3.1.
Special thanks to sumuzhao and the paper Symbolic Music Genre Transfer with CycleGAN for providing the resources and inspiration required to complete this project for NTU's CZ4042 Neural Networks and Deep Learning module.
Major Changes
- sumuzhao's implementation did not run for two reasons:
  - Lambda layers, which were used to implement the instance normalization layers and residual blocks, were unsupported.
    - New classes InstanceNormalization and ResNetBlock, extending from keras.layers.Layer, were created to replace them.
    - I raised a GitHub issue about this error on the original code repository. If the author approves, I will contribute a pull request with the above changes onto the repository. (Update: The author approved my pull request here! 😄)
  - Minor bugs in parsing command line arguments.
    - Made a few alterations to the command line parsing logic.
- SGD and RMSprop were added as optimizer choices.
- During CycleGAN training, discriminator, generator and cycle losses and accuracies over epochs are pickled for later examination.
- During classifier training, test losses and accuracies over epochs are pickled for later examination.
- During classifier testing, the test accuracies on the origin, cycle and transfer datasets are sorted and written to a CSV file for further examination.
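The replacement of lambda layers with subclasses of keras.layers.Layer can be sketched roughly as follows. This is a minimal illustration and not the code from this repository: the constructor arguments, the channels-last layout, the "same" padding and the ReLU placement are all assumptions made for the example.

```python
import tensorflow as tf


class InstanceNormalization(tf.keras.layers.Layer):
    """Normalizes every sample per channel over its spatial axes.

    Assumes channels-last 4D input: (batch, height, width, channels).
    """

    def __init__(self, epsilon=1e-5, **kwargs):
        super().__init__(**kwargs)
        self.epsilon = epsilon

    def build(self, input_shape):
        channels = input_shape[-1]
        # Learnable per-channel scale and offset.
        self.scale = self.add_weight(
            name="scale", shape=(channels,), initializer="ones", trainable=True)
        self.offset = self.add_weight(
            name="offset", shape=(channels,), initializer="zeros", trainable=True)

    def call(self, x):
        # Statistics are computed per sample and per channel.
        mean, variance = tf.nn.moments(x, axes=[1, 2], keepdims=True)
        normalized = (x - mean) * tf.math.rsqrt(variance + self.epsilon)
        return self.scale * normalized + self.offset


class ResNetBlock(tf.keras.layers.Layer):
    """Two conv + instance-norm stages with a skip connection.

    Assumes `filters` equals the number of input channels so that the
    residual addition is shape-compatible.
    """

    def __init__(self, filters, kernel_size=3, **kwargs):
        super().__init__(**kwargs)
        self.conv1 = tf.keras.layers.Conv2D(
            filters, kernel_size, padding="same", use_bias=False)
        self.norm1 = InstanceNormalization()
        self.conv2 = tf.keras.layers.Conv2D(
            filters, kernel_size, padding="same", use_bias=False)
        self.norm2 = InstanceNormalization()

    def call(self, x):
        y = tf.nn.relu(self.norm1(self.conv1(x)))
        y = self.norm2(self.conv2(y))
        return tf.nn.relu(x + y)


# Quick shape check on random data.
x = tf.random.normal([2, 8, 8, 4])
print(InstanceNormalization()(x).shape)
print(ResNetBlock(filters=4)(x).shape)
```

Keeping the learnable scale and offset inside build() lets Keras track them as layer weights, which a lambda layer cannot do.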
Additional Scripts
- /notebooks/visualization.ipynb was created to visualize pickled files containing the losses and accuracies over epochs during CycleGAN and classifier training.
- /notebooks/tuning.ipynb was created to tune the hyperparameters for the CycleGAN and classifier training. The tuned hyperparameters are as follows:
  - Standard deviation of Gaussian noise (sigma_d)
  - Number of filters in convolutional layers (ndf and ngf)
  - Optimizer choice (optimizer)
  - Optimizer momentum term (beta1)
  - Optimizer learning rate (lr)
- /scripts/classify.py was created to test the classifier on a specified directory containing .npy music arrays.
- /scripts/tomidi.py was created to convert a .npy music array to a .mid file.
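As a rough illustration of the pickled training history that the visualization notebook reads, the sketch below saves and reloads per-epoch metrics and finds the best epoch for one of them. It assumes the history is a dict mapping a metric name to a list of per-epoch values. The file name, the metric keys and the numbers are hypothetical and are not taken from this project.

```python
import pickle
import tempfile
from pathlib import Path


def save_history(history, path):
    """Pickle a dict of per-epoch metric lists to `path`."""
    with open(path, "wb") as f:
        pickle.dump(history, f)


def load_history(path):
    """Load a history dict previously written by save_history."""
    with open(path, "rb") as f:
        return pickle.load(f)


def best_epoch(history, metric, mode="min"):
    """Return (1-based epoch, value) at which `metric` is best."""
    values = history[metric]
    pick = min if mode == "min" else max
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Made-up values for illustration only, not results from this project.
example = {"g_loss": [2.1, 1.7, 1.9], "d_acc": [0.55, 0.61, 0.58]}
path = Path(tempfile.mkdtemp()) / "history.pkl"
save_history(example, path)
loaded = load_history(path)

print(best_epoch(loaded, "g_loss"))             # (2, 1.7)
print(best_epoch(loaded, "d_acc", mode="max"))  # (2, 0.61)
```

The same loaded dict can be passed straight to a plotting library, one curve per key.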
Usage
# Train CycleGAN model
python main.py --dataset_A_dir=JC_J --dataset_B_dir=JC_C --phase=train --type=cyclegan --sigma_d=0
# Generate origin, cycle and transfer outputs with the trained CycleGAN model
python main.py --dataset_A_dir=JC_J --dataset_B_dir=JC_C --phase=test --type=cyclegan --sigma_d=0
# Train classifier model
python main.py --dataset_A_dir=JC_J --dataset_B_dir=JC_C --phase=train --type=classifier --sigma_c=0
# Test classifier model on origin, cycle and transfer outputs
python main.py --dataset_A_dir=JC_J --dataset_B_dir=JC_C --phase=test --type=classifier --sigma_c=0
# Test classifier model on a specified directory containing .npy arrays
python scripts/classify.py --classify_dir=JC_J/test
# Convert a .npy array to a MIDI file
python scripts/tomidi.py --npy_filepath=JC_J/test/jazz_piano_test_1.npy
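The flags in the commands above could be parsed along the following lines. This is only a sketch using argparse: the actual parsing code in main.py may be structured differently, and the default values and choice lists shown here are assumptions inferred from the example commands.

```python
import argparse


def build_parser():
    """Build a parser for the flags that appear in the usage examples."""
    parser = argparse.ArgumentParser(
        description="Hypothetical sketch of the main.py command line")
    # Defaults below are illustrative assumptions, not the project's defaults.
    parser.add_argument("--dataset_A_dir", default="JC_J")
    parser.add_argument("--dataset_B_dir", default="JC_C")
    parser.add_argument("--phase", choices=["train", "test"], default="train")
    parser.add_argument("--type", choices=["cyclegan", "classifier"],
                        default="cyclegan")
    parser.add_argument("--sigma_d", type=float, default=0.0)
    parser.add_argument("--sigma_c", type=float, default=0.0)
    return parser


# Parse the same arguments as the first usage example.
args = build_parser().parse_args([
    "--dataset_A_dir=JC_J",
    "--dataset_B_dir=JC_C",
    "--phase=train",
    "--type=cyclegan",
    "--sigma_d=0",
])
print(args.phase, args.type, args.sigma_d)  # train cyclegan 0.0
```

Restricting --phase and --type with choices makes a mistyped value fail at parse time instead of partway through a run.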
Datasets
The jazz, classical and pop datasets can be downloaded from the zip file here.