Code for the ISMIR 2022 paper: Beat Transformer: Demixed Beat and Downbeat Tracking with Dilated Self-Attention

Beat Transformer

This repository accompanies the paper Beat Transformer: Demixed Beat and Downbeat Tracking with Dilated Self-Attention, published in the Proceedings of the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022), Bengaluru, India.

You are welcome to test our model on your own music in our Google Colab notebook.

Code and File Directory

This repository is organized as follows:

root
│
└───checkpoint                          PyTorch model checkpoints
    │   ···
│   
└───code
    └───ablation_models                 ablation models
        │   ···                            
    │   DilatedTransformer.py           Beat Transformer model
    │   DilatedTransformerLayer.py      Dilated Self-Attention
    │   spectrogram_dataset.py          data loader
    │   train.py                        training script
    │   ...                             code for other utilities
│   
└───data
    └───audio_lists                     Order info of pieces in each dataset
        │   ···                     
    │   demix_spectrogram_data.npz      demixed spectrogram data (33GB, to be downloaded)
    │   full_beat_annotation.npz        beat/downbeat annotation
│   
└───preprocessing                       code for data pre-processing
    │   ···
│   
└───save                                training log and more
    │   ···

How to run

  • To quickly reproduce the accuracy reported in our paper, simply run ./code/eight_fold_test.py.
  • To quickly test our model on your own music, use our Google Colab notebook.
  • If you wish to train our model from scratch, first download our processed dataset (33GB in total, including demixed spectrogram data of Ballroom, Hainsworth, Carnatic, Harmonix, SMC, and GTZAN).
  • Executing ./code/train.sh trains our model with 8-fold cross-validation. If you wish to train a single fold, run ./code/train.py after specifying DEBUG_MODE, FOLD, and GPU. When DEBUG_MODE=1, the script loads only a small portion of the data and uses a smaller batch size, so you can run through the pipeline quickly.
  • We also release our ablation model architectures in ./code/ablation_models. Our data processing script is ./preprocessing/demixing.py, which calls Spleeter to demix each piece and saves the demixed spectrogram.
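
As a rough illustration of what training "in 8-fold cross validation" with a selectable fold means, here is a minimal sketch in plain Python. It is not the code in ./code/train.py: the function names `make_folds` and `split_for_fold`, the random shuffling, and the seed are all assumptions made for this example, and the repository's own split may differ (for instance, it may also set aside a validation fold).

```python
import random


def make_folds(num_pieces, num_folds=8, seed=0):
    """Shuffle piece indices and deal them into num_folds disjoint folds.

    Hypothetical helper: the real split used by train.py may be defined differently.
    """
    indices = list(range(num_pieces))
    random.Random(seed).shuffle(indices)
    # Taking every num_folds-th index gives folds whose sizes differ by at most one.
    return [indices[k::num_folds] for k in range(num_folds)]


def split_for_fold(folds, fold):
    """Hold out one fold for testing and train on all remaining folds."""
    test = folds[fold]
    train = [i for k, members in enumerate(folds) if k != fold for i in members]
    return train, test


# Example: 100 pieces, train on folds 1..7 and test on fold 0.
folds = make_folds(num_pieces=100)
train_idx, test_idx = split_for_fold(folds, fold=0)
```

Looping `fold` over 0 to 7 gives the full 8-fold procedure; fixing it to one value corresponds to training a single fold.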

Audio Data

We use a total of 7 datasets for model training and testing. If you wish to acquire the audio data, the notes below explain where to find each dataset:

  • Ballroom Dataset (audio) is available here. It contains 13 duplicate pieces, which I discarded in my experiments. For more information, see here.

  • Hainsworth Dataset (audio) is no longer accessible via the original link. Since Hainsworth is a well-known public dataset, I guess it's okay to share my copy. You can download Hainsworth here.

  • GTZAN Dataset (audio) is available on Kaggle. You need a registered Kaggle account to download it.

  • SMC Dataset (audio) is available here.

  • Carnatic Dataset (audio) is on Zenodo. You can download it on request.

  • Harmonix Dataset (mel-spectrogram) is available here. I used the Griffin-Lim algorithm in Librosa to convert the mel-spectrograms to audio; note that this conversion is lossy. My conversion code is here.

  • RWC POP (audio) does NOT appear to be royalty-free, so I'm afraid I cannot share the audio. For more information about this dataset, see its official webpage.

For the beat/downbeat annotation of Ballroom, GTZAN, SMC, and Hainsworth, I used the annotation released by Sebastian Böck here.

Contact

Jingwei Zhao (PhD student in Data Science at NUS)

jzhao@u.nus.edu

Nov. 24, 2022