Music Transformer: Generating Music with Long-Term Structure

Paper

Link: https://openreview.net/forum?id=rJe4ShAcF7
Year: 2018

Summary

  • relative attention is very well-suited for generative modeling of symbolic music
  • the memory-efficient implementation opens up applying relative attention to much longer sequences, such as long texts or even audio waveforms

Contributions and Distinctions from Previous Works

  • previous Transformers with relative attention did not scale to sequences as long as musical pieces; this work makes such lengths practical with a memory-efficient relative attention

Methods

  • takes a language-modeling approach to training generative models for symbolic music: music is represented as a sequence of discrete tokens, with the vocabulary determined by the dataset; datasets in different genres call for different ways of serializing polyphonic music into a single stream and of discretizing time (an illustrative event encoding is sketched after this list)
  • perform "skewing" for a memory efficient implementation of relative position based attention

Results

  • the improved relative self-attention mechanism dramatically reduces memory requirements, from O(L^2D) to O(LD); for example, per-layer memory consumption drops from 8.5 GB to 4.2 MB (per head, from 1.1 GB to 0.52 MB) for a sequence of length L = 2048 and hidden-state size D = 512 (a quick arithmetic check follows this list)
  • its samples were perceived as more coherent than those from the baseline Transformer model
  • the model generalizes beyond the lengths it was trained on, continuing to generate in a consistent fashion
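
A quick arithmetic check of the quoted memory figures, under the assumptions of float32 storage (4 bytes per value) and 8 attention heads (64 dims per head); both assumptions are mine, chosen to reproduce the numbers in the bullet above.

```python
L, D, HEADS, BYTES = 2048, 512, 8, 4

naive_per_layer  = L * L * D * BYTES          # O(L^2 D): the (L, L, D) relative-embedding tensor
skewed_per_layer = L * D * BYTES              # O(L D): just the (L, D) embedding matrix E_r
naive_per_head   = L * L * (D // HEADS) * BYTES
skewed_per_head  = L * (D // HEADS) * BYTES

print(naive_per_layer / 1e9, "GB ->", skewed_per_layer / 1e6, "MB")   # ~8.59 GB -> ~4.19 MB
print(naive_per_head / 1e9, "GB ->", skewed_per_head / 1e6, "MB")     # ~1.07 GB -> ~0.52 MB
```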