
music_generation

Project Aim...

This was an independent, out-of-interest project aiming to build an AI model that can generate music.

Methodology...

here, "music" is taken to be a sequence of notes each with an associated pitch (C, C#, D,...) and duration (16th, quarter, half, ...).

The raw data used in this project was a dataset of 3,029 classical pieces in the form of MIDI files.

The music21 library was used to parse the MIDI files and extract the sequences of notes described by pitch and duration. For pitch, only the pitchClass was considered, with no octave information yet. Chords were converted to single notes by taking their root notes. Durations were reduced to the classes '16th', 'eighth', 'quarter', 'half', and 'whole', since other durations were uncommon. This is done in data_processing.ipynb with the help of functions from utils.py.
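
A minimal sketch of this parsing step, assuming a single MIDI file path and the duration classes above; uncommon durations are simply skipped here, and the actual implementation lives in data_processing.ipynb / utils.py:

```python
from music21 import converter, note, chord

ALLOWED_DURATIONS = {"16th", "eighth", "quarter", "half", "whole"}

def parse_midi(path):
    """Extract (pitchClass, duration type) pairs from one MIDI file."""
    pitches, durations = [], []
    score = converter.parse(path)
    for el in score.flat.notes:            # notes and chords, in score order
        if isinstance(el, chord.Chord):
            pc = el.root().pitchClass      # reduce a chord to its root note
        elif isinstance(el, note.Note):
            pc = el.pitch.pitchClass       # 0-11, no octave information
        else:
            continue
        dur = el.duration.type
        if dur not in ALLOWED_DURATIONS:   # drop uncommon duration classes
            continue
        pitches.append(pc)
        durations.append(dur)
    return pitches, durations
```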

Some basic EDA was performed to find the distributions of pitches and durations in the dataset.
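
For example, the distributions can be tallied directly from the parsed sequences (a quick sketch, not the notebook's exact EDA code; "some_piece.mid" is a placeholder path):

```python
from collections import Counter

pitches, durations = parse_midi("some_piece.mid")  # helper sketched above

pitch_counts = Counter(pitches)        # counts per pitch class 0-11
duration_counts = Counter(durations)   # counts per duration class

print(pitch_counts.most_common())
print(duration_counts.most_common())
```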

The models' training data consisted of inputs of 32-note sequences and outputs of the next note (to be predicted by the model). This is done in modeling.ipynb with the help of functions from utils.py. In total, 7,928,979 sequences of 32 notes each were generated.
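
Building these pairs is essentially a sliding window over each piece. A sketch, assuming the pitches and durations have already been integer-encoded (e.g. pitch classes 0-11 and duration classes 0-4); the actual code is in utils.py / modeling.ipynb:

```python
import numpy as np

SEQ_LEN = 32

def make_sequences(pitches, durations, seq_len=SEQ_LEN):
    """Inputs are 32-note windows; targets are the note that follows each window."""
    X_pitch, X_dur, y_pitch, y_dur = [], [], [], []
    for i in range(len(pitches) - seq_len):
        X_pitch.append(pitches[i:i + seq_len])
        X_dur.append(durations[i:i + seq_len])
        y_pitch.append(pitches[i + seq_len])
        y_dur.append(durations[i + seq_len])
    return (np.array(X_pitch), np.array(X_dur),
            np.array(y_pitch), np.array(y_dur))
```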

Two similar models were built. Each takes two inputs, a sequence of pitches and a sequence of durations, and yields two outputs: predictions of the next pitch and the next duration. An embedding is trained on each of the pitches and the durations, the embeddings are fed into an LSTM layer, and finally dense layers make softmax predictions of the next pitch and duration. Since the generated training dataset was so large, the models are currently only trained on a small portion of the data due to computational requirements. This is also done in modeling.ipynb.
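
A sketch of this two-input / two-output architecture in Keras; the embedding sizes and LSTM width here are placeholders, and the actual models are defined in modeling.ipynb:

```python
from tensorflow.keras import layers, Model

pitch_in = layers.Input(shape=(32,), name="pitch_in")    # 32 pitch-class indices
dur_in = layers.Input(shape=(32,), name="duration_in")   # 32 duration-class indices

pitch_emb = layers.Embedding(input_dim=12, output_dim=16)(pitch_in)
dur_emb = layers.Embedding(input_dim=5, output_dim=4)(dur_in)

x = layers.Concatenate()([pitch_emb, dur_emb])  # join the two embeddings per timestep
x = layers.LSTM(256)(x)

pitch_out = layers.Dense(12, activation="softmax", name="pitch")(x)
dur_out = layers.Dense(5, activation="softmax", name="duration")(x)

model = Model(inputs=[pitch_in, dur_in], outputs=[pitch_out, dur_out])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```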

Music is generated by feeding the model a seed sequence and then recursively appending the predicted notes. Once a sequence is generated, it is converted back to a MIDI file.
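
A sketch of the generation loop and the MIDI export, assuming the integer encodings above; the octave placement and the duration-index mapping are assumptions made for illustration, and the real code is in modeling.ipynb / utils.py:

```python
import numpy as np
from music21 import stream, note, duration as m21duration

DUR_TYPES = ["16th", "eighth", "quarter", "half", "whole"]  # assumed index order

def generate(model, seed_pitches, seed_durations, n_notes=100):
    """Recursively predict notes from a seed of at least 32 notes."""
    pitches, durations = list(seed_pitches), list(seed_durations)
    for _ in range(n_notes):
        p_in = np.array([pitches[-32:]])
        d_in = np.array([durations[-32:]])
        p_probs, d_probs = model.predict([p_in, d_in], verbose=0)
        pitches.append(int(np.argmax(p_probs[0])))
        durations.append(int(np.argmax(d_probs[0])))
    return pitches, durations

def to_midi(pitches, durations, path="generated.mid"):
    """Convert pitch-class / duration-class indices back to a MIDI file."""
    s = stream.Stream()
    for pc, d in zip(pitches, durations):
        n = note.Note()
        n.pitch.midi = 60 + pc            # place pitch classes around middle C (assumption)
        n.duration = m21duration.Duration(DUR_TYPES[d])
        s.append(n)
    s.write("midi", fp=path)
```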

Results...

The models achieved a pitch accuracy of ~43% and a duration accuracy of 76%, significantly better than a stratified dummy classifier would do (see the EDA). A sample of generated music is saved as MIDI files in the generated_music directory, so feel free to have a listen! Subjectively, the pieces definitely sound much better than a random sequence of notes, and some interesting patterns arise, but you can still kind of tell they weren't written by a human musician.

Things to try...

This was a relatively quick project, so there are lots of things left to try, even within the simplified structure it followed.

  • Add chords to the dataset
    • One of the main drawbacks of the music generated by the models is that it sounds quite simple, with just one note being played at any given time. Support for chords, even just simple major/minor chords to start with, would probably make the music sound much richer.
  • Add support for more complex pitch encoding
    • Add octave information. Perhaps encode pitch as a mapping to the 88 keys of a piano (or the full pitch range of some other instrument). Another option is to use the raw MIDI note numbers.
  • Train on the full dataset
    • Computational limitations, even with a GPU courtesy of Google Colab, made this difficult.
  • Experiment with more model architectures and add some regularization
  • EDA on the frequency of particular n-grams of notes
  • Train models on different genres of music