CMT-pytorch

Source code and generated samples for the CMT (Chord Conditioned Melody Transformer) model introduced in "Chord Conditioned Melody Generation with Transformer Based Decoders".

Requirements

matplotlib >= 3.3.1
numpy >= 1.19.1
pretty_midi >= 0.2.9
torch >= 1.0.0
pyyaml >= 0.2.5
scipy
tensorboardX

File descriptions

hparams.yaml : specifies hyperparameters and paths to load data or save results.
preprocess.py : makes instance pkl files from two-track MIDI files
dataset.py : loads preprocessed pkl data
layers.py : self-attention block and relative multi-head attention layers
model.py : implementation of CMT
loss.py : defines loss functions
trainer.py : utilities for loading, training, and saving models
run.py : main code to train CMT
generated samples.zip : zip file containing the 15 generated samples used in the subjective listening test

Preparing data

To train CMT, you need MIDI files that contain both melody and chords. Each MIDI file should have exactly two instruments: the first plays the melody, and the second plays all notes of the current chord each time the chord changes.
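A quick way to sanity-check a file before preprocessing is to confirm it has two tracks and to group the chord track's notes by onset, since each chord change should appear as a set of notes starting together. This is a hypothetical sketch, not part of preprocess.py: `Note`, `chords_from_track`, and `check_two_tracks` are illustrative names. `Note` mirrors the `pitch`, `start`, and `end` attributes of pretty_midi notes, so the functions should also accept `pretty_midi.PrettyMIDI(path).instruments[i].notes` directly.

```python
from collections import defaultdict
from typing import NamedTuple


class Note(NamedTuple):
    # Hypothetical minimal note record: MIDI pitch plus onset/offset in seconds.
    pitch: int
    start: float
    end: float


def chords_from_track(chord_notes):
    """Group chord-track notes by onset; each distinct onset is one chord change.

    Assumes the layout described above, where all chordal notes are
    (re)played together whenever the chord changes.
    """
    by_onset = defaultdict(list)
    for note in chord_notes:
        by_onset[note.start].append(note.pitch)
    # Chords in time order, pitches within each chord from low to high.
    return [(onset, sorted(pitches)) for onset, pitches in sorted(by_onset.items())]


def check_two_tracks(tracks):
    """Return (melody notes sorted by onset, chords) from a [melody, chords] list."""
    if len(tracks) != 2:
        raise ValueError(f"expected 2 instruments, got {len(tracks)}")
    melody, chord_notes = tracks
    return sorted(melody, key=lambda n: n.start), chords_from_track(chord_notes)
```

For example, a C major chord at 0 s followed by G major at 2 s comes back as `[(0.0, [60, 64, 67]), (2.0, [67, 71, 74])]`.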

Instance pkl files are made from the two-track MIDI files with the following command:

$ python preprocess.py \
--root_dir [ROOT_DIR] \
--midi_dir [MIDI_DIR] \
--num_bars [NUMBER_OF_BARS] \
--frame_per_bar [FRAME_PER_BAR] \
--pitch_range [PITCH_RANGE]

MIDI files should be located under ROOT_DIR/MIDI_DIR.
NUMBER_OF_BARS: number of bars to generate. Default is 8.
FRAME_PER_BAR: number of time frames (unit notes) per bar. Default is 16, i.e. one frame per 16th note in 4/4 time.
PITCH_RANGE: number of MIDI pitches covered. Default is 48 (4 octaves).
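With the defaults, one instance spans 8 × 16 = 128 frames. The sketch below shows that arithmetic and one plausible way to quantize note times and pitches. It assumes a constant tempo and 4/4 time, and the lowest pitch of the 48-pitch window (MIDI 48 here) is a hypothetical choice; the helper names are illustrative and do not come from preprocess.py.

```python
def instance_frames(num_bars=8, frame_per_bar=16):
    """Total number of time frames in one instance (README defaults)."""
    return num_bars * frame_per_bar


def time_to_frame(seconds, bpm, frame_per_bar=16, beats_per_bar=4):
    """Quantize a time in seconds to the nearest frame index.

    Assumes constant tempo and 4/4 time, so with frame_per_bar=16
    one frame is a 16th note.
    """
    seconds_per_bar = beats_per_bar * 60.0 / bpm
    seconds_per_frame = seconds_per_bar / frame_per_bar
    return round(seconds / seconds_per_frame)


def pitch_to_index(pitch, lowest_pitch=48, pitch_range=48):
    """Map a MIDI pitch to [0, pitch_range), or None if it falls outside.

    lowest_pitch=48 is a hypothetical choice for illustration only.
    """
    idx = pitch - lowest_pitch
    return idx if 0 <= idx < pitch_range else None
```

At 120 BPM a bar lasts 2 s, so a frame is 0.125 s and a note at 0.5 s lands on frame 4.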

To transpose the melody and chords into 12 different keys, add the --shift argument to the command above.
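Transposing works by shifting the melody and the chords by the same number of semitones, so the harmonic relationship between them is preserved. The following is a minimal sketch with notes as `(pitch, start, end)` tuples. The offsets -6 to +5 are an assumption; the actual --shift option may use a different set of 12 offsets, and the helper names are hypothetical.

```python
def transpose(notes, semitones):
    """Shift every (pitch, start, end) tuple by the given number of semitones."""
    return [(pitch + semitones, start, end) for pitch, start, end in notes]


def all_key_shifts(melody, chords, shifts=range(-6, 6)):
    """Yield (shift, melody, chords) for 12 transpositions.

    Melody and chords are shifted together. The offset range is an
    assumption made for illustration.
    """
    for k in shifts:
        yield k, transpose(melody, k), transpose(chords, k)
```

Each source file therefore yields 12 instances that share the same rhythm but differ in key.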

Training CMT

$ python run.py \
--idx [EXPERIMENT_INDEX] \
--gpu_index [GPU_INDEX] \
--ngpu [NUMBER_OF_GPU] \
--optim_name [OPTIMIZER] \
--restore_epoch [RESTORE_EPOCH] \
--seed [RANDOM_SEED]

EXPERIMENT_INDEX: arbitrary index used to distinguish experiment settings
GPU_INDEX: index of the GPU to use
NUMBER_OF_GPU: number of GPUs to use. If not specified, only the CPU is used.
OPTIMIZER: optimizer to use; one of sgd, adam, or rmsprop. Default is adam.
RESTORE_EPOCH: epoch of the checkpoint to restore when continuing an experiment
RANDOM_SEED: random seed
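The flags above could be parsed roughly as follows. This is a hypothetical sketch, not run.py itself: only the flag names, the optimizer choices, the adam default, and the CPU fallback come from this README. The integer types, the remaining defaults, and the `select_device` helper are assumptions, and --load_rhythm (used in the 2nd phase) is modeled as an on/off flag, which is also an assumption.

```python
import argparse


def build_parser():
    """Sketch of a parser for the run.py flags (types/defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Train CMT (sketch)")
    parser.add_argument("--idx", type=int, required=True, help="experiment index")
    parser.add_argument("--gpu_index", type=int, default=0)    # assumed default
    parser.add_argument("--ngpu", type=int, default=None)      # None -> CPU only
    parser.add_argument("--optim_name", choices=["sgd", "adam", "rmsprop"],
                        default="adam")
    parser.add_argument("--restore_epoch", type=int, default=-1)  # assumed sentinel
    parser.add_argument("--seed", type=int, default=1)          # assumed default
    parser.add_argument("--load_rhythm", action="store_true")   # assumed flag form
    return parser


def select_device(args):
    """Use the CPU unless a GPU count was given (hypothetical helper)."""
    if args.ngpu is None:
        return "cpu"
    return f"cuda:{args.gpu_index}"
```

For instance, `--idx 1` alone selects adam on the CPU, while `--idx 1 --ngpu 1 --gpu_index 3` selects `cuda:3`. Restricting `--optim_name` with `choices` makes a typo fail at startup rather than partway through training.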

1st phase

Train the rhythm decoder (RD) on pitch-shifted data. In hparams.yaml, set the data_io path to the directory containing the pkl files transposed into 12 different keys (i.e., made with --shift).
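One way to see why transposed data suits the RD is that a rhythm sequence depends only on note timing, so all 12 keys share the same rhythm targets while their chords differ. The sketch below derives per-frame rhythm tokens from notes given as `(pitch, start_frame, end_frame)`. The three token values (rest, onset, hold) and the function name are assumptions for illustration and may not match the representation used in dataset.py.

```python
# Hypothetical rhythm token ids, for illustration only.
REST, ONSET, HOLD = 0, 1, 2


def rhythm_tokens(notes, num_frames):
    """Per-frame rhythm tokens from (pitch, start_frame, end_frame) notes.

    end_frame is exclusive. Pitch is ignored, so transposition does not
    change the result.
    """
    tokens = [REST] * num_frames
    for _pitch, start, end in notes:
        if start >= num_frames:
            continue  # note begins after this instance ends
        tokens[start] = ONSET
        for frame in range(start + 1, min(end, num_frames)):
            tokens[frame] = HOLD
    return tokens
```

A melody and its transposition by any number of semitones produce identical token sequences.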

2nd phase

Keep the RD trained in the 1st phase and train the pitch decoder (PD) on single-key data. In the experiment config of hparams.yaml, specify the experiment index and epoch from which to load the RD (for example, idx 1, epoch 100). Then execute run.py with the additional --load_rhythm argument.
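In PyTorch terms, this phase amounts to loading the pretrained RD weights, keeping them fixed, and optimizing only the PD. The sketch below assumes the RD stays frozen and that the model exposes separate `rhythm_decoder` and `pitch_decoder` submodules. Both assumptions, along with the `TinyCMT` stand-in and the `prepare_phase2` helper, are hypothetical and not taken from model.py or trainer.py. In run.py the RD weights would come from the checkpoint chosen by the experiment index and epoch; here they come from a freshly built model so the sketch runs on its own.

```python
import torch
from torch import nn


class TinyCMT(nn.Module):
    """Hypothetical stand-in with separate rhythm and pitch decoders."""

    def __init__(self, dim=16):
        super().__init__()
        self.rhythm_decoder = nn.Linear(dim, 3)
        self.pitch_decoder = nn.Linear(dim, 49)


def prepare_phase2(model, rhythm_state, lr=1e-4):
    """Load pretrained RD weights, freeze them, and return a PD-only optimizer."""
    model.rhythm_decoder.load_state_dict(rhythm_state)
    for param in model.rhythm_decoder.parameters():
        param.requires_grad_(False)  # RD is kept fixed (assumption)
    return torch.optim.Adam(model.pitch_decoder.parameters(), lr=lr)


# Usage: weights from a "phase 1" model stand in for a loaded checkpoint.
phase1 = TinyCMT()
phase2 = TinyCMT()
optimizer = prepare_phase2(phase2, phase1.rhythm_decoder.state_dict())
```

Passing only `pitch_decoder.parameters()` to the optimizer, in addition to disabling gradients, guarantees that an optimizer step cannot move the RD weights.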