LakhNES (paper, music examples) is a deep neural network capable of generating music that can be played by the audio synthesis chip on the Nintendo Entertainment System (NES). It was trained on music composed for the NES by humans. Our model takes advantage of transfer learning: we pre-train on the heterogeneous Lakh MIDI dataset before fine-tuning on the NES Music Database target domain.
This codebase is primarily for generating music with the pre-trained LakhNES model. LakhNES outputs sequences of musical events, which must be synthesized into 8-bit audio in a separate step. The steps required are as follows:
- Set up your model environment
- Set up your audio synthesis environment
- Download a pre-trained checkpoint
- Generate and listen to chiptunes
This codebase also allows you to evaluate pre-trained models to reproduce the paper results.
You can also use it to train a new model, though the documentation for this is still being improved.
The model environment requires Python 3 and PyTorch. The development version of PyTorch was 1.0.1.post2, but hopefully the newest version will continue to work (see this section for a sanity check). We recommend using virtualenv, as you will need a separate environment to perform audio synthesis.
cd LakhNES
virtualenv -p python3 --no-site-packages LakhNES-model
source LakhNES-model/bin/activate
pip install torch==1.0.1.post2 torchvision==0.2.2.post3
LakhNES requires the Python package nesmdb to synthesize chiptune audio. Unfortunately, nesmdb does not support Python 3 (which the rest of this codebase depends on). We strongly recommend using virtualenv to install nesmdb and run it as a local RPC server. To do this, run the following commands from this repository:
cd LakhNES
virtualenv -p python2.7 --no-site-packages LakhNES-synth
source LakhNES-synth/bin/activate
pip install nesmdb
pip install pretty_midi
python data/synth_server.py 1337
This will expose an RPC server on port 1337 with two methods: tx1_to_wav and tx2_to_wav. Both take a TX1/TX2 input file path, a WAV output file path, and optionally a MIDI downsampling rate. A lower rate speeds up synthesis but will distort the rhythms (if not specified, no downsampling will occur).
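As a rough illustration of how a Python 3 client might call these two methods, the sketch below picks the method from the input file's suffix and leaves the rate out when none is given. It assumes the server speaks XML-RPC and takes its arguments positionally in the order listed above; neither is confirmed here. The helpers synthesize and connect are hypothetical names, not part of this repository, and data/synth_client.py remains the supported way to call the server.

```python
import xmlrpc.client


def synthesize(proxy, in_path, out_path, midi_downsample_rate=None):
    """Call tx1_to_wav or tx2_to_wav on `proxy`, chosen by file suffix.

    `proxy` is any object exposing those two methods, e.g. an
    xmlrpc.client.ServerProxy (assumption: the server speaks XML-RPC).
    """
    if in_path.endswith(".tx1.txt"):
        method = proxy.tx1_to_wav
    elif in_path.endswith(".tx2.txt"):
        method = proxy.tx2_to_wav
    else:
        raise ValueError("expected a .tx1.txt or .tx2.txt file: " + in_path)

    args = [in_path, out_path]
    # Leave the rate out entirely when unset: XML-RPC cannot marshal None
    # by default, and an omitted rate means no downsampling.
    if midi_downsample_rate is not None:
        args.append(midi_downsample_rate)
    return method(*args)


def connect(port=1337, host="localhost"):
    """Build a proxy for the synthesis server (hypothetical helper)."""
    return xmlrpc.client.ServerProxy("http://{}:{}".format(host, port))
```

With the server running, synthesize(connect(), "song.tx1.txt", "song.wav", 48) would request synthesis at a downsampling rate of 48; the file names here are placeholders.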
If you wish to test your synthesis environment on human-composed music, you first need to download the data. Then, if you have both your model and synthesis environments ready, you can synthesize a chiptune from Kirby's Adventure:
source LakhNES-model/bin/activate
python data/synth_client.py data/nesmdb_tx1/train/191_Kirby_sAdventure_02_03PlainsLevel.tx1.txt plains_tx1.wav 48
aplay plains_tx1.wav
python data/synth_client.py data/nesmdb_tx2/train/191_Kirby_sAdventure_02_03PlainsLevel.tx2.txt plains_tx2.wav 48
aplay plains_tx2.wav
Here we provide all of the Transformer-XL checkpoints used for the results in our paper. We recommend using the LakhNES checkpoint, which was pre-trained on Lakh MIDI for 400k batches before fine-tuning on NES-MDB. However, the others can also produce interesting results (in particular NESAug).
- (147 MB) (Recommended) Download LakhNES (400k steps Lakh pre-training)
- (147 MB) Download Lakh200k (200k steps Lakh pre-training)
- (147 MB) Download Lakh100k (100k steps Lakh pre-training)
- (147 MB) Download NESAug (No Lakh pre-training but uses data augmentation)
- (147 MB) Download NES (No Lakh pre-training or data augmentation)
- (147 MB) Download Lakh400kPretrainOnly (LakhNES model without NES-MDB fine-tuning)
To generate new chiptunes, first set up your model environment, download a checkpoint, and start your synthesis server. Then, run the following:
source LakhNES-model/bin/activate
python generate.py \
<MODEL_DIR> \
--out_dir ./generated \
--num 1
python data/synth_client.py ./generated/0.tx1.txt ./generated/0.tx1.wav
aplay ./generated/0.tx1.wav
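If you generate more than one chiptune, the synthesis step above can be repeated for each output file. The sketch below builds one data/synth_client.py command per generated file and runs them in turn. It assumes generate.py writes its outputs as *.tx1.txt files directly inside the output directory (inferred from the 0.tx1.txt name above); build_synth_commands and synthesize_all are hypothetical helpers, not part of this repository.

```python
import subprocess
import sys
from pathlib import Path


def build_synth_commands(generated_dir, client="data/synth_client.py"):
    """Return one synth_client.py command per *.tx1.txt file, sorted by name."""
    commands = []
    for tx1_path in sorted(Path(generated_dir).glob("*.tx1.txt")):
        # 0.tx1.txt -> 0.tx1.wav, matching the naming used above.
        wav_path = tx1_path.with_suffix(".wav")
        commands.append([sys.executable, client, str(tx1_path), str(wav_path)])
    return commands


def synthesize_all(generated_dir):
    """Run each command in turn; requires the synthesis server to be running."""
    for command in build_synth_commands(generated_dir):
        subprocess.run(command, check=True)
```

Run synthesize_all("./generated") from the LakhNES directory with the model environment active, so that the client script path resolves and sys.executable is the Python 3 interpreter.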
We've also included the IPython notebooks we used to create the continuations of human-composed chiptunes (continuations.ipynb) and rhythm accompaniment examples (accompany_rhythm.ipynb) as heard on our examples page.
To adapt music data to the Transformer architecture, we process MIDI files (top) into an event-based representation akin to language (bottom). Each event is musically meaningful, such as a note starting or time advancing.
LakhNES is first trained on Lakh MIDI and then fine-tuned on NES-MDB. The MIDI files from these datasets are first converted into a list of musical events to adapt them to the Transformer architecture.
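To make the idea of an event-based representation concrete, here is a toy sketch that flattens a list of notes into note-start, note-end, and time-advance events. It is a simplified, single-instrument illustration only: the token names (NOTEON_*, NOTEOFF_*, WAIT_*) and the notes_to_events helper are invented for this example and are not the TX1/TX2 vocabulary or the conversion code used by this repository.

```python
def notes_to_events(notes):
    """Convert (start_tick, end_tick, pitch) notes into a flat event list.

    Token names are invented for this illustration and are not the
    TX1/TX2 vocabulary.
    """
    # Collect every state change as (tick, order, token); order 0 sorts
    # note-offs ahead of note-ons that share a tick.
    changes = []
    for start, end, pitch in notes:
        changes.append((start, 1, "NOTEON_{}".format(pitch)))
        changes.append((end, 0, "NOTEOFF_{}".format(pitch)))
    changes.sort()

    events = []
    now = 0
    for tick, _, token in changes:
        if tick > now:
            # Time advancing is itself an event.
            events.append("WAIT_{}".format(tick - now))
            now = tick
        events.append(token)
    return events


print(notes_to_events([(0, 4, 60), (4, 6, 64)]))
# ['NOTEON_60', 'WAIT_4', 'NOTEOFF_60', 'NOTEON_64', 'WAIT_2', 'NOTEOFF_64']
```

A sequence like this can be modeled one token at a time, in the same way a language model treats words.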
The NES-MDB dataset has been preprocessed into two event-based formats: TX1 and TX2. The TX1 format only has composition information: the notes and their timings. The TX2 format has expressive information: dynamics and timbre.
You can get the data in TX1 (used in our paper) and TX2 (not used in our paper) formats here:
- (10 MB) Download NES-MDB in TX1 Format
- (20 MB) Download NES-MDB in TX2 Format
Other instructions in this README assume that you have moved (at least one of) these bundles to the LakhNES/data folder and extracted them there with tar xvfz.
If you download all of the above checkpoints and extract them with tar xvfz under LakhNES/model/pretrained, you can reproduce the exact numbers from our paper (Table 2 and Figure 3):
source LakhNES-model/bin/activate
cd model
./reproduce_paper_eval.sh
This should take a few minutes and yield validation perplexities (PPLs) of [4.099, 3.175, 2.911, 2.817, 2.800] and test PPLs of [3.501, 2.741, 2.545, 2.472, 2.460], in order.
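For readers unfamiliar with the metric, the sketch below shows how a perplexity is conventionally derived from per-event losses. It assumes the reported PPL is the usual exponentiated mean negative log-likelihood (in nats) per event; the perplexity helper is illustrative only, and reproduce_paper_eval.sh performs the actual evaluation.

```python
import math


def perplexity(nlls):
    """Exponentiated mean of per-event negative log-likelihoods (in nats)."""
    if not nlls:
        raise ValueError("need at least one value")
    return math.exp(sum(nlls) / len(nlls))
```

Under that assumption, a PPL of 2.800 corresponds to an average loss of about 1.03 nats per event, since ln(2.800) is roughly 1.03. Lower is better.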
I (Chris) admit it. My patch of the official Transformer-XL codebase (which lives under the model subdirectory) is among the ugliest code I've ever written. Instructions on how to use it are forthcoming, though the adventurous among you are welcome to try before then. In the meantime, I have focused on making the pretrained checkpoints easy to use, and I hope that will suffice.
One asset of our training pipeline, the code which adapts Lakh MIDI to NES MIDI for transfer learning, is somewhat more polished. It can be found at LakhNES/data/adapt_lakh_to_nes.py.
Information about how to use the code for our Amazon Mechanical Turk user study (under LakhNES/userstudy) is forthcoming.
If you use this work in your research, please cite us via the following BibTeX:
@inproceedings{donahue2019lakhnes,
title={LakhNES: Improving multi-instrumental music generation with cross-domain pre-training},
author={Donahue, Chris and Mao, Huanru Henry and Li, Yiting Ethan and Cottrell, Garrison W. and McAuley, Julian},
booktitle={ISMIR},
year={2019}
}