An AI for Music Generation

Primary language: Python. License: MIT.

MuseGAN

MuseGAN is a project on music generation. In essence, we aim to generate polyphonic, multi-track (multi-instrument) music with harmonic and rhythmic structure, multi-track interdependency and temporal structure. To our knowledge, our work is the first approach that deals with all of these issues together.

The models are trained on the Lakh Pianoroll Dataset (LPD), a new multi-track piano-roll dataset, in an unsupervised manner. The proposed models can generate music either from scratch or as an accompaniment to a track given by the user. Specifically, we use the models to generate pop song phrases consisting of bass, drums, guitar, piano and strings tracks.

Sample results are available here.

BinaryMuseGAN

BinaryMuseGAN is a follow-up project of the MuseGAN project.

In this project, we first investigate how the real-valued piano-rolls produced by the generator may lead to difficulties in training the discriminator for CNN-based models. To overcome this binarization issue, we propose appending to the generator an additional refiner network, which tries to refine the real-valued predictions of the pretrained generator into binary-valued ones. The proposed model is able to directly generate binary-valued piano-rolls at test time.
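
To make the binarization issue concrete, the sketch below shows two simple post-hoc ways of turning a real-valued piano-roll into a binary one: hard thresholding and Bernoulli sampling. This is only an illustration of the problem the refiner addresses, not the refiner network itself. The function names, the 0.5 threshold and the toy array are assumptions made for this example.

```python
import numpy as np

def binarize_by_threshold(pianoroll, threshold=0.5):
    """Mark a note as 'on' wherever the real-valued output exceeds the threshold."""
    return pianoroll > threshold

def binarize_by_sampling(pianoroll, rng):
    """Treat each value as a probability and draw one Bernoulli sample per entry."""
    return rng.random(pianoroll.shape) < pianoroll

# Toy real-valued "piano-roll" with values in [0, 1] (time steps x pitches).
real_valued = np.array([[0.9, 0.4],
                        [0.51, 0.05]])

hard = binarize_by_threshold(real_valued)
sampled = binarize_by_sampling(real_valued, np.random.default_rng(0))
```

Both strategies are applied only after generation, so the discriminator never sees binary-valued inputs from the generator during training. That gap is what the refiner described above is meant to close.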

We train the network on the Lakh Pianoroll Dataset (LPD) and use the model to generate four-bar musical phrases consisting of eight tracks: Drums, Piano, Guitar, Bass, Ensemble, Reed, Synth Lead and Synth Pad. Audio samples are available here.

Run the code

Prepare Training Data

  • Prepare your own data or download our training data

    The array will be reshaped to (-1, num_bar, num_timestep, num_pitch, num_track). These variables are defined in config.py.

    • lastfm_alternative_5b_phrase.npy (2.1 GB) contains 12,444 four-bar phrases from 2,074 songs with alternative tags. The shape is (2074, 6, 4, 96, 84, 5). The five tracks are Drums, Piano, Guitar, Bass and Strings.
    • lastfm_alternative_8b_phrase.npy (3.6 GB) contains 13,746 four-bar phrases from 2,291 songs with alternative tags. The shape is (2291, 6, 4, 96, 84, 8). The eight tracks are Drums, Piano, Guitar, Bass, Ensemble, Reed, Synth Lead and Synth Pad.
    • Download the data with this script.
  • (optional) Save the training data to shared memory with this script.

  • Specify the training data path and location in config.py (see below).
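
The reshape described above can be sketched as follows. This is a minimal illustration, assuming the values of num_bar, num_timestep, num_pitch and num_track match the five-track data listed above. A small zero-filled array stands in for the real file so that the example runs without the 2.1 GB download.

```python
import numpy as np

# Values assumed to mirror config.py for the five-track data:
# 4 bars, 96 time steps per bar, 84 pitches, 5 tracks.
num_bar, num_timestep, num_pitch, num_track = 4, 96, 84, 5

# Stand-in for np.load('lastfm_alternative_5b_phrase.npy'), whose shape is
# (2074, 6, 4, 96, 84, 5): here 3 songs with 6 phrases each.
data = np.zeros((3, 6, num_bar, num_timestep, num_pitch, num_track), dtype=bool)

# Merge the song and phrase axes so that each row is one four-bar phrase.
phrases = data.reshape(-1, num_bar, num_timestep, num_pitch, num_track)

print(phrases.shape)  # (18, 4, 96, 84, 5)
```

Applied to the full files, the same reshape yields 2,074 × 6 = 12,444 phrases for the five-track data and 2,291 × 6 = 13,746 phrases for the eight-track data, matching the counts given above.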

Configuration

Modify config.py for configuration.

  • Quick setup

    Change the values in the dictionary SETUP for a quick setup. Documentation is provided right after each key.

  • More configuration options

    Four dictionaries EXP_CONFIG, DATA_CONFIG, MODEL_CONFIG and TRAIN_CONFIG define experiment-, data-, model- and training-related configuration variables, respectively.

    The automatically determined experiment name is based only on the values defined in the dictionary SETUP. If you change values in the other dictionaries, remember to provide the experiment name manually so that you won't overwrite a trained model.
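
    The following sketch illustrates why this matters. It is a hypothetical example: the key names, the naming function and the values are assumptions made for illustration and do not reflect the actual contents of config.py.

```python
# Hypothetical illustration; none of these keys or helpers are taken from config.py.

def auto_exp_name(setup):
    """Build an experiment name from the SETUP values only."""
    return "_".join(str(setup[key]) for key in sorted(setup))

def resolve_exp_name(setup, manual_name=None):
    """Prefer a manually provided name; fall back to the automatic one."""
    return manual_name if manual_name else auto_exp_name(setup)

setup = {"model": "musegan", "training_data": "lastfm_alternative_5b_phrase"}

# Two runs that differ only in a training-related option outside SETUP.
train_config_a = {"learning_rate": 0.001}
train_config_b = {"learning_rate": 0.0001}

name_a = resolve_exp_name(setup)
name_b = resolve_exp_name(setup)
collides = name_a == name_b  # same name, so run B would overwrite run A

# Giving the second run an explicit name avoids the collision.
name_b_manual = resolve_exp_name(setup, manual_name="musegan_lr1e-4")
```

    Because the automatic name ignores everything outside SETUP, the two runs above would share one experiment name unless one of them is named by hand.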

Run

python main.py

Papers

  • Hao-Wen Dong and Yi-Hsuan Yang, "Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation," to appear at International Society for Music Information Retrieval Conference (ISMIR), 2018. [website] [arxiv] [slides]

  • Hao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang and Yi-Hsuan Yang, "MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment," in AAAI Conference on Artificial Intelligence (AAAI), 2018. [website] [arxiv] [slides]

  • Hao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang and Yi-Hsuan Yang, "MuseGAN: Demonstration of a Convolutional GAN Based Model for Generating Multi-track Piano-rolls," in ISMIR Late-Breaking and Demo Session, 2017. (non-peer-reviewed two-page extended abstract) [paper] [poster]

* These authors contributed equally to this work.