This is an unofficial, work-in-progress implementation of the paper "Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks".
Instead of training on the MSMD dataset as in the original paper, this implementation trains on the ASAP dataset with synthetic structural augmentations.
- Calculate cross-similarity
- Add structural augmentations
- Prepare dataset class
- Add model implementations
- Write training pipeline
- Write inference pipeline
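
As a rough illustration of the first two items, here is a minimal sketch of a cosine cross-similarity matrix between two feature sequences, together with one possible structural augmentation (repeating a section). It is not the code from this repository or from the paper: the function names `cross_similarity` and `repeat_segment`, the 12-dimensional chroma-like features, and the choice of cosine similarity are all assumptions made for the example.

```python
import numpy as np


def cross_similarity(audio_feats, score_feats, eps=1e-8):
    """Cosine cross-similarity between two feature sequences (assumed metric).

    audio_feats: array of shape (n_audio_frames, n_dims)
    score_feats: array of shape (n_score_frames, n_dims)
    Returns an array of shape (n_audio_frames, n_score_frames).
    """
    # Normalise every frame to unit length; eps guards against all-zero frames.
    a = audio_feats / (np.linalg.norm(audio_feats, axis=1, keepdims=True) + eps)
    s = score_feats / (np.linalg.norm(score_feats, axis=1, keepdims=True) + eps)
    # Dot products of unit vectors are cosine similarities.
    return a @ s.T


def repeat_segment(feats, start, end):
    """Hypothetical structural augmentation: frames [start, end) occur twice."""
    return np.concatenate([feats[:end], feats[start:end], feats[end:]], axis=0)


rng = np.random.default_rng(0)
score = rng.random((40, 12))           # random stand-in for chroma-like frames
audio = repeat_segment(score, 10, 20)  # "performance" that repeats one section
sim = cross_similarity(audio, score)
print(sim.shape)  # (50, 40)
```

In this toy setup the repeated section shows up as a second diagonal stripe in `sim` (audio frames 20 to 29 match score frames 10 to 19 again), which is the kind of structural deviation the augmentations are meant to simulate.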