Generate 8-bit chiptunes with deep learning

LakhNES: Generate 8-bit music with machine learning

LakhNES (paper, music examples) is a deep neural network capable of generating music that can be played by the audio synthesis chip on the Nintendo Entertainment System (NES). It was trained on music composed for the NES by humans. Our model takes advantage of transfer learning: we pre-train on the heterogeneous Lakh MIDI dataset before fine-tuning on the NES Music Database target domain.

Using this codebase

Generating new chiptunes

The primary use of this codebase is generating music with the pre-trained LakhNES model. LakhNES outputs sequences of musical events, which must be separately synthesized into 8-bit audio. The steps required are as follows:

  1. Set up your model environment
  2. Set up your audio synthesis environment
  3. Download a pre-trained checkpoint
  4. Generate and listen to chiptunes

Evaluating pre-trained checkpoints

This codebase also allows you to evaluate pre-trained models to reproduce the paper results. The steps required for this use case are as follows:

  1. Set up your model environment
  2. Download the pre-trained checkpoints
  3. Run the eval script

Training new checkpoints

With this codebase you can also train a new model (though the documentation for this is still being improved):

  1. Set up your model environment
  2. Download the data
  3. Train a new model

Model environment

The model environment requires Python 3 and PyTorch. Development used PyTorch 1.0.1.post2; newer versions will hopefully continue to work (see the Reproduce paper results section for a sanity check).

We recommend using virtualenv as you will need a separate environment to perform audio synthesis.

cd LakhNES
virtualenv -p python3 --no-site-packages LakhNES-model
source LakhNES-model/bin/activate
pip install torch==1.0.1.post2 torchvision==0.2.2.post3
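
Before moving on, it can help to confirm which versions the activated environment actually provides. The following is a minimal sketch, not part of this codebase; the function name `describe_environment` is illustrative, and a version mismatch is only reported, not treated as an error.

```python
import importlib.util
import sys

# Version named in this README as the one used during development.
DEV_TORCH_VERSION = "1.0.1.post2"


def describe_environment():
    """Return a two-line report on the Python and PyTorch versions in use."""
    report = ["Python %d.%d" % sys.version_info[:2]]
    if importlib.util.find_spec("torch") is None:
        report.append("PyTorch: not installed in this environment")
    else:
        import torch

        relation = "matches" if torch.__version__ == DEV_TORCH_VERSION else "differs from"
        report.append(
            "PyTorch %s (%s development version %s)"
            % (torch.__version__, relation, DEV_TORCH_VERSION)
        )
    return report


# Example: print("\n".join(describe_environment()))
```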

Synthesis environment

LakhNES requires the Python package nesmdb to synthesize chiptune audio. Unfortunately, nesmdb does not support Python 3 (which the rest of this codebase depends on).

We strongly recommend using virtualenv to install nesmdb and run it as a local RPC server. To do this, run the following commands from this repository:

cd LakhNES
virtualenv -p python2.7 --no-site-packages LakhNES-synth
source LakhNES-synth/bin/activate
pip install nesmdb
pip install pretty_midi
python data/synth_server.py 1337

This will expose an RPC server on port 1337 with two methods: tx1_to_wav and tx2_to_wav. Both take a TX1/TX2 input file path, a WAV output file path, and optionally a MIDI downsampling rate. A lower rate speeds up synthesis but degrades rhythmic accuracy; if no rate is specified, no downsampling occurs.
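
A call to this server from Python 3 might look like the sketch below. It assumes the server speaks XML-RPC over HTTP on localhost and accepts its arguments positionally (input path, output path, optional rate); neither detail is stated above, and the helper name `synthesize` is illustrative. The bundled data/synth_client.py remains the intended way to talk to the server.

```python
import xmlrpc.client


def synthesize(tx_path, wav_path, rate=None, fmt="tx1", port=1337):
    """Ask a running synthesis server to render a TX1/TX2 file to WAV.

    Assumption: the server is an XML-RPC endpoint on localhost that takes
    (input path, output path[, MIDI downsampling rate]) positionally.
    """
    if fmt not in ("tx1", "tx2"):
        raise ValueError("fmt must be 'tx1' or 'tx2'")
    proxy = xmlrpc.client.ServerProxy("http://localhost:%d" % port)
    method = getattr(proxy, fmt + "_to_wav")  # tx1_to_wav or tx2_to_wav
    if rate is None:
        # Omitting the rate leaves downsampling off, as described above.
        return method(tx_path, wav_path)
    return method(tx_path, wav_path, rate)


# Example (requires the server from the previous step to be running):
# synthesize("song.tx1.txt", "song.wav", rate=48)
```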

(Optional) Test your synthesis environment on human-composed music

If you wish to test your synthesis environment on human-composed music, you first need to download the data. Then, if you have both your model and synthesis environments ready, you can synthesize a chiptune from Kirby's Adventure:

source LakhNES-model/bin/activate
python data/synth_client.py data/nesmdb_tx1/train/191_Kirby_sAdventure_02_03PlainsLevel.tx1.txt plains_tx1.wav 48
aplay plains_tx1.wav
python data/synth_client.py data/nesmdb_tx2/train/191_Kirby_sAdventure_02_03PlainsLevel.tx2.txt plains_tx2.wav 48
aplay plains_tx2.wav

Download checkpoints

Here we provide all of the Transformer-XL checkpoints used for the results in our paper. We recommend using the LakhNES checkpoint, which was pre-trained on Lakh MIDI for 400k batches before fine-tuning on NES-MDB. However, the others can also produce interesting results (in particular NESAug).

  • (147 MB) (Recommended) Download LakhNES (400k steps Lakh pre-training)
  • (147 MB) Download Lakh200k (200k steps Lakh pre-training)
  • (147 MB) Download Lakh100k (100k steps Lakh pre-training)
  • (147 MB) Download NESAug (No Lakh pre-training but uses data augmentation)
  • (147 MB) Download NES (No Lakh pre-training or data augmentation)
  • (147 MB) Download Lakh400kPretrainOnly (LakhNES model without NES-MDB finetuning)

Generate new chiptunes

To generate new chiptunes, first set up your model environment, download a checkpoint, and start your synthesis server. Then, run the following:

source LakhNES-model/bin/activate
python generate.py \
	<MODEL_DIR> \
	--out_dir ./generated \
	--num 1
python data/synth_client.py ./generated/0.tx1.txt ./generated/0.tx1.wav
aplay ./generated/0.tx1.wav
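
When --num is larger than 1, each generated file needs its own synthesis call. The sketch below builds one data/synth_client.py command per file and runs them in turn. It assumes every output follows the `<index>.tx1.txt` naming seen above for 0.tx1.txt; the helper names are illustrative and not part of this codebase.

```python
import subprocess
import sys
from pathlib import Path


def synth_commands(out_dir, client="data/synth_client.py"):
    """Build one synth_client.py command line per generated TX1 file."""
    commands = []
    for tx_path in sorted(Path(out_dir).glob("*.tx1.txt")):
        # 0.tx1.txt -> 0.tx1.wav, matching the naming used above.
        wav_path = tx_path.with_suffix(".wav")
        commands.append([sys.executable, client, str(tx_path), str(wav_path)])
    return commands


def synthesize_all(out_dir):
    """Run every command; the synthesis server must already be running."""
    for command in synth_commands(out_dir):
        subprocess.run(command, check=True)


# Example: synthesize_all("./generated")
```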

We've also included the IPython notebooks we used to create the continuations of human-composed chiptunes (continuations.ipynb) and rhythm accompaniment examples (accompany_rhythm.ipynb) as heard on our examples page.

Download data


Figure: To adapt music data to the Transformer architecture, we process MIDI files (top) into an event-based representation akin to language (bottom). Each event is musically meaningful, such as a note starting or time advancing.

LakhNES is first trained on Lakh MIDI and then fine tuned on NES-MDB. The MIDI files from these datasets are first converted into a list of musical events to adapt them to the Transformer architecture.

The NES-MDB dataset has been preprocessed into two event-based formats: TX1 and TX2. The TX1 format contains only composition information: the notes and their timings. The TX2 format contains expressive information: dynamics and timbre.
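
To make the idea of an event-based representation concrete, the sketch below flattens a list of notes into a sequence of text events, one per note start, note end, or advance in time. The token names (NOTEON, NOTEOFF, WAIT), the instrument labels, and the note tuple layout are hypothetical and chosen for illustration; they are not the actual TX1 vocabulary, which is defined by the preprocessed dataset files.

```python
def notes_to_events(notes):
    """Flatten (start, end, instrument, pitch) notes into text events.

    Times are integer ticks. Token names are illustrative only.
    """
    changes = []
    for start, end, instrument, pitch in notes:
        changes.append((start, 1, "%s_NOTEON_%d" % (instrument, pitch)))
        changes.append((end, 0, "%s_NOTEOFF" % instrument))
    # At equal times, note-offs (0) sort before note-ons (1), so a voice
    # is released before it is retriggered.
    changes.sort()

    events = []
    now = 0
    for time, _, token in changes:
        if time > now:
            events.append("WAIT_%d" % (time - now))  # time advancing
            now = time
        events.append(token)
    return events
```

For example, `notes_to_events([(0, 4, "P1", 60), (2, 6, "TR", 48)])` returns `["P1_NOTEON_60", "WAIT_2", "TR_NOTEON_48", "WAIT_2", "P1_NOTEOFF", "WAIT_2", "TR_NOTEOFF"]`. A flat token sequence like this is what lets a language-model architecture be applied to music.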

You can get the data in TX1 (used in our paper) and TX2 (not used in our paper) formats here:

Other instructions in this README assume that you have moved at least one of these bundles to the LakhNES/data folder and extracted it there with tar xvfz.

Reproduce paper results

If you download all of the above checkpoints and extract them with tar xvfz under LakhNES/model/pretrained, you can reproduce the exact numbers from our paper (Table 2 and Figure 3):

source LakhNES-model/bin/activate
cd model
./reproduce_paper_eval.sh

This should take a few minutes and yield validation perplexities (PPLs) of [4.099, 3.175, 2.911, 2.817, 2.800] and test PPLs of [3.501, 2.741, 2.545, 2.472, 2.460] in order.
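
For readers unfamiliar with the metric, perplexity is the exponential of the average per-event negative log-likelihood, so lower is better. The sketch below shows that standard definition; it is not taken from the eval script, and the function name is illustrative.

```python
import math


def perplexity(nll_per_event):
    """Perplexity from per-event negative log-likelihoods (in nats)."""
    if not nll_per_event:
        raise ValueError("need at least one event")
    return math.exp(sum(nll_per_event) / len(nll_per_event))
```

Under this definition, a model that spreads probability uniformly over four possible events at every step has a perplexity of 4, and a test PPL of 2.460 corresponds to an average loss of about 0.90 nats per event.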

Train LakhNES

I (Chris) admit it. My patch of the official Transformer-XL codebase (which lives under the model subdirectory) is among the ugliest code I've ever written. Instructions about how to use it are forthcoming, though the adventurous among you are welcome to try before then. For now, I focused on making the pretrained checkpoints easy to use, and I hope that will suffice in the meantime.

One asset of our training pipeline, the code which adapts Lakh MIDI to NES MIDI for transfer learning, is somewhat more polished. It can be found at LakhNES/data/adapt_lakh_to_nes.py.

User study

Information about how to use the code for our Amazon Mechanical Turk user study (under LakhNES/userstudy) is forthcoming.

Attribution

If you use this work in your research, please cite us via the following BibTeX:

@inproceedings{donahue2019lakhnes,
  title={LakhNES: Improving multi-instrumental music generation with cross-domain pre-training},
  author={Donahue, Chris and Mao, Huanru Henry and Li, Yiting Ethan and Cottrell, Garrison W. and McAuley, Julian},
  booktitle={ISMIR},
  year={2019}
}