Music Transformer: Generating Music with Long-Term Structure

Paper

Link: https://openreview.net/forum?id=rJe4ShAcF7
Year: 2018

Summary

  • relative attention is very well-suited for generative modeling of symbolic music
  • the memory-efficient implementation opens up applying relative attention to much longer sequences, such as long texts or even audio waveforms

Contributions and Distinctions from Previous Works

  • previous Transformers with relative attention did not scale to sequences as long as musical pieces; this work makes such lengths practical with a memory-efficient relative attention

Methods

  • takes a language-modeling approach to training generative models for symbolic music: music is represented as a sequence of discrete tokens, with the vocabulary determined by the dataset; datasets in different genres call for different ways of serializing polyphonic music into a single stream and of discretizing time (an illustrative event encoding is sketched after this list)
  • perform "skewing" for a memory efficient implementation of relative position based attention

Results

  • the improved relative self-attention mechanism dramatically reduces memory requirements, from O(L^2D) to O(LD); for example, per-layer memory consumption drops from 8.5 GB to 4.2 MB (per head, from 1.1 GB to 0.52 MB) for a sequence of length L = 2048 and hidden-state size D = 512 (a quick arithmetic check follows this list)
  • its samples were perceived as more coherent than those from the baseline Transformer model
  • the model generalizes beyond the lengths it was trained on, continuing to generate in a consistent fashion
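
A quick arithmetic check of the quoted memory figures, under the assumptions of float32 storage (4 bytes per value) and 8 attention heads (64 dims per head); both assumptions are mine, chosen to reproduce the numbers in the bullet above.

```python
L, D, HEADS, BYTES = 2048, 512, 8, 4

naive_per_layer  = L * L * D * BYTES          # O(L^2 D): the (L, L, D) relative-embedding tensor
skewed_per_layer = L * D * BYTES              # O(L D): just the (L, D) embedding matrix E_r
naive_per_head   = L * L * (D // HEADS) * BYTES
skewed_per_head  = L * (D // HEADS) * BYTES

print(naive_per_layer / 1e9, "GB ->", skewed_per_layer / 1e6, "MB")   # ~8.59 GB -> ~4.19 MB
print(naive_per_head / 1e9, "GB ->", skewed_per_head / 1e6, "MB")     # ~1.07 GB -> ~0.52 MB
```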