
"Pop Music Transformer: Generating Music with Rhythm and Harmony", arXiv 2020

Primary language: Python | License: GNU General Public License v3.0 (GPL-3.0)

REMI

Authors: Yu-Siang Huang, Yi-Hsuan Yang

Paper (arXiv) | Blog | Audio demo (Google Drive) | Online interactive demo

REMI, which stands for REvamped MIDI-derived events, is a new event representation we propose for converting MIDI scores into text-like discrete tokens. Compared to the MIDI-like event representation adopted in existing Transformer-based music composition models, REMI provides sequence models with a metrical context for modeling the rhythmic patterns of music. Using REMI as the event representation, we train a Transformer-XL model to generate minute-long Pop piano music with an expressive, coherent, and clear structure of rhythm and harmony, without any post-processing to refine the result. The model also offers control over local tempo changes and chord progressions.

Citation

@article{huang2020pop,
  title={Pop music transformer: Generating music with rhythm and harmony},
  author={Huang, Yu-Siang and Yang, Yi-Hsuan},
  journal={arXiv preprint arXiv:2002.00212},
  year={2020}
}

Getting Started

Install Dependencies

  • python 3.6 (recommend using Anaconda)
  • tensorflow-gpu 1.14.0 (pip install tensorflow-gpu==1.14.0)
  • miditoolkit (pip install miditoolkit)

Download Pre-trained Checkpoints

We provide two pre-trained checkpoints for generating samples.

Obtain the MIDI Data

We provide the MIDI files, which include local tempo changes and estimated chords. (5 MB)

  • data/train: 775 files used for training models
  • data/evaluation: 100 files (prompts) used for the continuation experiments

Generate Samples

See main.py as an example:

from model import PopMusicTransformer
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

def main():
    # declare model
    model = PopMusicTransformer(
        checkpoint='REMI-tempo-checkpoint',
        is_training=False)
    # generate from scratch
    model.generate(
        n_target_bar=16,
        temperature=1.2,
        topk=5,
        output_path='./result/from_scratch.midi',
        prompt=None)
    # generate continuation
    model.generate(
        n_target_bar=16,
        temperature=1.2,
        topk=5,
        output_path='./result/continuation.midi',
        prompt='./data/evaluation/000.midi')
    # close model
    model.close()

if __name__ == '__main__':
    main()

Convert MIDI to REMI

See midi2remi.ipynb for how to convert MIDI messages into REMI events.
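To give a rough feel for the "metrical context" REMI adds, here is a minimal sketch that places notes on a bar/position grid. It is not the code in midi2remi.ipynb: the `Note` tuple, the `notes_to_events` helper, the token strings, and the defaults (480 ticks per beat, 4 beats per bar, 16 positions per bar) are all assumptions made for illustration, and Tempo and Chord events are left out.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    """Hypothetical note record; not the miditoolkit type."""
    start: int     # onset in MIDI ticks
    end: int       # offset in MIDI ticks
    pitch: int     # MIDI pitch number, 0-127
    velocity: int  # MIDI velocity, 0-127


def notes_to_events(notes: List[Note],
                    ticks_per_beat: int = 480,
                    beats_per_bar: int = 4,
                    positions_per_bar: int = 16) -> List[str]:
    """Turn notes into a flat list of bar/position-aware tokens (illustrative only)."""
    step = ticks_per_beat * beats_per_bar // positions_per_bar  # ticks per grid slot
    events: List[str] = []
    current_bar = -1
    last_slot = None
    for note in sorted(notes, key=lambda n: (n.start, n.pitch)):
        # Snap the onset to the nearest grid slot (integer arithmetic, rounds half up).
        slot = (note.start + step // 2) // step
        bar, position = divmod(slot, positions_per_bar)
        # Emit one Bar token for every bar line crossed, including empty bars.
        while current_bar < bar:
            events.append("Bar")
            current_bar += 1
        # Emit a Position token only when the onset slot changes.
        if slot != last_slot:
            events.append(f"Position_{position + 1}/{positions_per_bar}")
            last_slot = slot
        # Duration in grid slots, at least one slot long.
        duration = max(1, (note.end - note.start + step // 2) // step)
        events.append(f"Note Velocity_{note.velocity}")
        events.append(f"Note On_{note.pitch}")
        events.append(f"Note Duration_{duration}")
    return events


if __name__ == "__main__":
    demo = [Note(0, 480, 60, 80), Note(0, 480, 64, 80), Note(2160, 2400, 67, 90)]
    print(notes_to_events(demo))
```

Because every note is preceded by explicit Bar and Position tokens, a sequence model can read the position within the bar directly from the token stream instead of inferring it from accumulated time shifts.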

FAQ

1. How do I synthesize audio files (e.g., mp3)?

We strongly recommend using a DAW (e.g., Logic Pro) to open and play the generated MIDI files. Alternatively, you can use FluidSynth with a SoundFont, although it may not handle the tempo changes correctly (see fluidsynth/issues/141).
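If you do go the FluidSynth route, a minimal sketch of driving its command-line renderer from Python could look like the following. The helper names and all file paths are hypothetical; the script assumes a `fluidsynth` executable on your PATH and a SoundFont file you supply yourself. It writes a WAV file, so producing an mp3 would need a separate encoder, and the tempo caveat above still applies.

```python
import shutil
import subprocess
from typing import List


def fluidsynth_command(midi_path: str, soundfont_path: str, wav_path: str,
                       sample_rate: int = 44100) -> List[str]:
    """Build the argument list for an offline FluidSynth render."""
    return [
        "fluidsynth",
        "-ni",                    # no MIDI input driver, no interactive shell
        "-F", wav_path,           # render to this file instead of playing live
        "-r", str(sample_rate),   # output sample rate in Hz
        soundfont_path,
        midi_path,
    ]


def render(midi_path: str, soundfont_path: str, wav_path: str) -> None:
    """Run FluidSynth, raising if it is missing or exits with an error."""
    if shutil.which("fluidsynth") is None:
        raise RuntimeError("fluidsynth executable not found on PATH")
    subprocess.run(fluidsynth_command(midi_path, soundfont_path, wav_path), check=True)


if __name__ == "__main__":
    # Paths below are placeholders; replace them with your own files.
    print(" ".join(fluidsynth_command("./result/from_scratch.midi",
                                      "./piano.sf2",
                                      "./result/from_scratch.wav")))
```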

2. What is the function of the inputs "temperature" and "topk"?

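Both are parameters of temperature-controlled stochastic sampling, a method used for generating text from a trained language model: the temperature rescales the predicted distribution, and topk restricts sampling to the k most likely tokens. You can find more details in Section 4.1 (Sampling) of the CTRL paper.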

Note that the sampling method used for generation is critical to the quality of the output, and it remains a research topic worthy of further exploration.
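As a rough illustration of how the two inputs interact, the sketch below samples one token index from a list of logits using temperature scaling followed by top-k truncation. It is a generic, standard-library version of the technique, not necessarily how `model.generate` implements it; the `sample_top_k` name and its signature are assumptions for this example.

```python
import math
import random
from typing import List, Optional


def sample_top_k(logits: List[float], temperature: float = 1.2, topk: int = 5,
                 rng: Optional[random.Random] = None) -> int:
    """Sample an index from `logits` with temperature scaling and top-k truncation."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if topk < 1:
        raise ValueError("topk must be at least 1")
    rng = rng if rng is not None else random.Random()
    # Higher temperature flattens the distribution; lower temperature sharpens it.
    scaled = [x / temperature for x in logits]
    # Keep only the indices of the topk largest scaled logits.
    candidates = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:topk]
    # Softmax over the survivors; subtracting the maximum keeps exp() stable.
    peak = scaled[candidates[0]]
    weights = [math.exp(scaled[i] - peak) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [0.1, 5.0, 0.2, 4.0, -1.0]
    rng = random.Random(0)
    print([sample_top_k(logits, temperature=1.2, topk=2, rng=rng) for _ in range(10)])
```

With `topk=1` the choice is greedy regardless of temperature; raising `topk` or `temperature` makes the output more varied, at the risk of less coherent results.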

3. How do I fine-tune the model on my own MIDI data?

Please see issue/Training on custom MIDI corpus

Acknowledgement