SEQ2SEQ Piano Transcription

  • Tokenization is based on the Magenta Team's work
  • Can we perform piano transcription using the Listen, Attend and Spell (LAS) model?

Data

  • MAESTRO dataset V1.0
  • 512 mel bins with 10 ms resolution
  • MIDI tokenization with absolute time shifts within each segment
  • Vocabulary contains velocity, note, and timestamp tokens
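
The tokenization described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this repo: the 4-second segment length, the 10 ms time step (chosen here to match the spectrogram resolution), the vocabulary layout, and the names tokenize_segment, TIME_OFFSET, VELOCITY_OFFSET and NOTE_OFFSET are all assumptions. A velocity of 0 is assumed to mark a note-off.

```python
from typing import List, Tuple

# All constants below are assumptions made for illustration.
TIME_STEP = 0.01          # seconds per time token (assumed 10 ms)
SEGMENT_SECONDS = 4.0     # hypothetical segment length
NUM_TIME = int(round(SEGMENT_SECONDS / TIME_STEP)) + 1  # time bins 0..400
NUM_VELOCITY = 128        # MIDI velocities 0..127 (0 = note-off, assumed)
NUM_PITCH = 128           # MIDI pitches 0..127

# Hypothetical vocabulary layout: [time | velocity | note]
TIME_OFFSET = 0
VELOCITY_OFFSET = TIME_OFFSET + NUM_TIME
NOTE_OFFSET = VELOCITY_OFFSET + NUM_VELOCITY
VOCAB_SIZE = NOTE_OFFSET + NUM_PITCH

Event = Tuple[float, int, int]  # (time in seconds, pitch, velocity)


def tokenize_segment(events: List[Event], segment_start: float) -> List[int]:
    """Convert note events into tokens with times absolute to the segment start."""
    tokens: List[int] = []
    last_time_bin = None
    for time_sec, pitch, velocity in sorted(events):
        rel = time_sec - segment_start
        if not 0.0 <= rel <= SEGMENT_SECONDS:
            continue  # event belongs to another segment
        time_bin = int(round(rel / TIME_STEP))
        # Emit a time token only when the time bin changes,
        # so simultaneous events share one time token.
        if time_bin != last_time_bin:
            tokens.append(TIME_OFFSET + time_bin)
            last_time_bin = time_bin
        tokens.append(VELOCITY_OFFSET + velocity)
        tokens.append(NOTE_OFFSET + pitch)
    return tokens
```

For example, under these assumptions tokenize_segment([(10.0, 60, 80), (10.5, 60, 0)], 10.0) returns [0, 481, 589, 50, 401, 589]: a note-on for pitch 60 at the segment start and its note-off 0.5 s later. Because each time token is measured from the segment start rather than from the previous event, quantization error does not accumulate within a segment.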

Model Structure

Encoder

  • 4 layers of Bi-LSTM

Decoder

  • Additive attention with 1 unidirectional LSTM layer, followed by a linear layer
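
As a rough illustration of the additive attention step, the sketch below scores each encoder output against the current decoder state with score_i = v · tanh(W_q q + W_k k_i), normalizes the scores with a softmax, and returns the weighted sum of encoder outputs. It is written in plain Python for readability and is not the code in this repo; the function names and the parameters w_q, w_k and v are hypothetical, and a real model would use batched tensor operations.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector x (cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def additive_attention(
    query: Vector,        # current decoder state
    keys: List[Vector],   # encoder outputs, one per time frame
    w_q: Matrix,          # hypothetical projection for the query
    w_k: Matrix,          # hypothetical projection for the keys
    v: Vector,            # hypothetical scoring vector
) -> Tuple[Vector, Vector]:
    """Return (context vector, attention weights)."""
    q_proj = matvec(w_q, query)
    scores = []
    for key in keys:
        k_proj = matvec(w_k, key)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, k_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Numerically stable softmax over the time frames.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context is the attention-weighted sum of the encoder outputs.
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return context, weights
```

In a decoder of this kind, the context vector is typically combined with the LSTM state before the linear layer that produces the token logits; how this repo combines them is not specified here.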

Result

Training curve

[Figure: training curve]

F1 score result

[Figure: F1 score result]

TO-DO

  • Requires code refactoring
  • This repo is not the final version