
Multitrack Music Transformer

This repository contains the official implementation of "Multitrack Music Transformer" (ICASSP 2023).

Multitrack Music Transformer
Hao-Wen Dong, Ke Chen, Shlomo Dubnov, Julian McAuley, and Taylor Berg-Kirkpatrick
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
[homepage] [paper] [code] [reviews]

Content

  • Prerequisites
  • Preprocessing
  • Training
  • Generation (Inference)
  • Evaluation
  • Acknowledgment
  • Citation

Prerequisites

We recommend using Conda. You can create the environment with the following command.

conda env create -f environment.yml
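
Then activate the environment. A minimal sketch, assuming environment.yml names the environment mmt:

conda activate mmt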

Preprocessing

Preprocessed Datasets

The preprocessed datasets can be found here.

Extract the files to data/{DATASET_KEY}/processed/json and data/{DATASET_KEY}/processed/notes, where DATASET_KEY is sod, lmd, lmd_full or snd.
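
For example, a minimal extraction sketch for SOD (the archive name sod-processed.zip is hypothetical; substitute the file you actually downloaded):

mkdir -p data/sod/processed
unzip sod-processed.zip -d data/sod/processed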

Preprocessing Scripts

You can skip this section if you use the preprocessed datasets provided above.

Step 1 -- Download the datasets

Please download the Symbolic orchestral database (SOD), for example via the command line:

wget https://qsdfo.github.io/LOP/database/SOD.zip
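
Then extract the archive into the data directory (assuming it unpacks to an SOD folder, matching the paths used in Step 2):

unzip SOD.zip -d data/sod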

We also support two other datasets: the Lakh MIDI Dataset (LMD; dataset keys lmd and lmd_full) and the SymphonyNet Dataset (SND; dataset key snd).

Step 2 -- Prepare the name list

Get a list of filenames for each dataset.

find data/sod/SOD -type f \( -name "*.mid" -o -name "*.xml" \) | cut -c 14- > data/sod/original-names.txt

Note: Change the number in the cut command for different datasets. It is the length of the directory prefix to strip plus one; here data/sod/SOD/ is 13 characters, so the command keeps each path from character 14 onward.
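
To find the right number for another dataset, you can measure the prefix length directly. A minimal sketch:

echo -n "data/sod/SOD/" | wc -c    # prints 13, so cut keeps from character 14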

Step 3 -- Convert the data

Convert the MIDI and MusicXML files into MusPy files for processing.

python convert_sod.py

Note: You may enable multiprocessing with the -j option, for example, python convert_sod.py -j 10 for 10 parallel jobs.

Step 4 -- Extract the note list

Extract a list of notes from the MusPy JSON files.

python extract.py -d sod

Step 5 -- Split training/validation/test sets

Split the processed data into training, validation and test sets.

python split.py -d sod

Training

Pretrained Models

The pretrained models can be found here.

Training Scripts

Train a Multitrack Music Transformer model. The -g option selects the GPU to use.

  • Absolute positional embedding (APE):

    python mmt/train.py -d sod -o exp/sod/ape -g 0

  • Relative positional embedding (RPE):

    python mmt/train.py -d sod -o exp/sod/rpe --no-abs_pos_emb --rel_pos_emb -g 0

  • No positional embedding (NPE):

    python mmt/train.py -d sod -o exp/sod/npe --no-abs_pos_emb --no-rel_pos_emb -g 0
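
To reproduce all three variants sequentially on a single GPU, the commands above can be chained:

python mmt/train.py -d sod -o exp/sod/ape -g 0 && \
python mmt/train.py -d sod -o exp/sod/rpe --no-abs_pos_emb --rel_pos_emb -g 0 && \
python mmt/train.py -d sod -o exp/sod/npe --no-abs_pos_emb --no-rel_pos_emb -g 0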

Generation (Inference)

Generate new samples using a trained model.

python mmt/generate.py -d sod -o exp/sod/ape -g 0

Evaluation

Evaluate the trained model using objective evaluation metrics. The -ns option sets the number of samples to evaluate.

python mmt/evaluate.py -d sod -o exp/sod/ape -ns 100 -g 0

Acknowledgment

The code is based largely on the x-transformers library developed by lucidrains.

Citation

Please cite the following paper if you use the code provided in this repository.

Hao-Wen Dong, Ke Chen, Shlomo Dubnov, Julian McAuley, and Taylor Berg-Kirkpatrick, "Multitrack Music Transformer," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.

@inproceedings{dong2023mmt,
    author = {Hao-Wen Dong and Ke Chen and Shlomo Dubnov and Julian McAuley and Taylor Berg-Kirkpatrick},
    title = {Multitrack Music Transformer},
    booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    year = 2023,
}