This repository contains all experimental code for reproducing the results reported in our article:
Learning Audio-Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification (PDF).
Dorfer M., Hajič J. jr., Arzt A., Frostel H., and Widmer G.
Transactions of the International Society for Music Information Retrieval, 2018
The paper above is an invited extension of the work presented in:
Learning audio-sheet music correspondences for score identification and offline alignment.
Dorfer M., Arzt A., and Widmer G.
In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2017.
The retrieval methodology employed in both works is based on the CCA Projection Layer described in:
End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss.
Dorfer M., Schlüter J., Vall A., Korzeniowski F., and Widmer G.
International Journal of Multimedia Information Retrieval, 2018
An implementation of the CCA layer is contained in this repository and is also available here.
For a list of required Python packages, see requirements.txt, or install them all at once using pip:
pip install -r requirements.txt
We also provide an anaconda environment file which can be installed as follows:
conda env create -f environment.yaml
To install the audio_sheet_retrieval package in develop mode (this is what we recommend) run
python setup.py develop --user
in the root folder of the package.
You will also need the MSMD dataset Python package installed on your system to be able to load the data (see below for how to get it).
Almost all of our experiments are based on the proposed Multimodal Sheet Music Data Set (MSMD). For a detailed description of the MSMD data and how to get and load it, please visit our data set repository. The only experiments not covered in this repository are those carried out on commercially licensed sheet music. However, all our models are trained exclusively on MSMD.
Before running the code, make sure that all paths are configured correctly. In particular, you have to specify two paths in the file audio_sheet_retrieval/config/settings.py:
# path where model folder gets created and parameters and results get dumped
EXP_ROOT = "/home/matthias/experiments/audio_sheet_retrieval/"
# path where to find the data (In our case the root directory of the MSMD dataset)
DATA_ROOT_MSMD = '/media/matthias/Data/msmd/'
Once this is done we can start training and evaluating our retrieval models.
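A quick sanity check of the two configured paths can save a failed run later. The following is a minimal sketch, not part of the repository: `check_paths` is a hypothetical helper, and it assumes that the experiment root may be created on demand while the MSMD data root must already exist.

```python
import os


def check_paths(exp_root, data_root):
    """Hypothetical helper: prepare the experiment root and verify the data root."""
    # The experiment root only collects model folders and results,
    # so it is assumed to be safe to create it if it is missing.
    os.makedirs(exp_root, exist_ok=True)
    # The data root has to point at an existing copy of the MSMD data.
    if not os.path.isdir(data_root):
        raise FileNotFoundError("MSMD data root not found: " + data_root)
    return True


# Example call with the values from settings.py (left as a comment so the
# sketch runs without the package installed):
# from audio_sheet_retrieval.config.settings import EXP_ROOT, DATA_ROOT_MSMD
# check_paths(EXP_ROOT, DATA_ROOT_MSMD)
```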
If you just want to apply our models to your own sheet music or audio, check out our tutorials. So far, we provide the following tutorials as IPython notebooks:
- Embedding Tutorial
- Embedding Tutorial Audio-to-Audio
The Python script run_train.py allows you to train each retrieval model individually. Alternatively, if you would like to train all models of one split at once, you can use the additional shell script train_models.sh:
./train_models.sh cuda0 models/mutopia_ccal_cont.py <path-to-sheet-manger>/sheet_manager/sheet_manager/splits/all_split.yaml
# $1 ... the device to train on
# $2 ... the model to train
# $3 ... the train split (data) to use for training
If you do not want to reproduce all results reported in the paper and only want to train the best-performing model, you can do this with the following command:
python run_train.py --model models/mutopia_ccal_cont.py --data mutopia --train_split <path-to-sheet-manger>/sheet_manager/sheet_manager/splits/all_split.yaml --config exp_configs/mutopia_full_aug.yaml
This command trains a model on the all-split (containing pieces of all different composers)
in the full data augmentation setting (sheet music and audio augmentation).
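If you prefer to launch training from Python, for example to loop over several model files, the command above can be assembled as an argument list. This is a minimal sketch: `build_train_command` is a hypothetical helper that only uses the flags shown above, and the split path keeps the placeholder from the command line, which you have to replace with your own path.

```python
import shlex


def build_train_command(model, data, train_split, config):
    """Hypothetical helper: assemble the run_train.py call as an argument list."""
    return [
        "python", "run_train.py",
        "--model", model,
        "--data", data,
        "--train_split", train_split,
        "--config", config,
    ]


cmd = build_train_command(
    model="models/mutopia_ccal_cont.py",
    data="mutopia",
    # placeholder path, exactly as in the command above
    train_split="<path-to-sheet-manger>/sheet_manager/sheet_manager/splits/all_split.yaml",
    config="exp_configs/mutopia_full_aug.yaml",
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))

# To actually start training, run this from the repository root:
# import subprocess
# subprocess.run(cmd, check=True)
```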
Once this is done, one final step remains.
As the CCA Projection Layer is based on the internal statistics of the training batches,
we fine-tune it with a very large batch (here 25000 samples) to push the model
to its best performance:
python refine_cca.py --n_train 25000 --model models/mutopia_ccal_cont.py --data mutopia --train_split <path-to-sheet-manger>/sheet_manager/sheet_manager/splits/all_split.yaml --config exp_configs/mutopia_full_aug.yaml
After this step, you should end up with fairly well-performing multimodal audio-sheet music encoders.
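The intuition behind the large refinement batch is that statistics such as covariances are estimated more reliably from more samples. The toy example below is only an illustration of that effect and is not what refine_cca.py does: it estimates a scalar correlation from small and large batches of synthetic data and compares the average estimation error. The batch sizes (50 and 5000), the true correlation of 0.8, and all function names are assumptions chosen to keep the example fast.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def sample_batch(rng, n, rho):
    """Draw n pairs of standard normal variables with true correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + (1.0 - rho ** 2) ** 0.5 * noise)
    return xs, ys


def mean_abs_error(rng, n, rho, trials=20):
    """Average absolute error of the correlation estimate over several batches."""
    errors = []
    for _ in range(trials):
        xs, ys = sample_batch(rng, n, rho)
        errors.append(abs(pearson(xs, ys) - rho))
    return sum(errors) / trials


rng = random.Random(0)
small = mean_abs_error(rng, 50, 0.8)
large = mean_abs_error(rng, 5000, 0.8)
print("mean abs error, batch of 50:  ", small)
print("mean abs error, batch of 5000:", large)
```

The estimate from the larger batches is much closer to the true value on average, which is the motivation for re-estimating the projection with a large batch after training.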
The code structure of the evaluation part of the repository is in line with the training functionality. To evaluate all models of a certain split at once simply call:
./eval_models.sh cuda0 models/mutopia_ccal_cont_rsz.py <path-to-sheet-manger>/sheet_manager/sheet_manager/splits/all_split.yaml
# $1 ... the device to evaluate on
# $2 ... the model to evaluate
# $3 ... the train split (data) to use for evaluation
All results will be printed to your command line output
and in addition dumped to the model folder in your <EXP_ROOT> directory (if the flag --dump_results is active).
Again, you can also evaluate the models individually:
python run_eval.py --dump_results --model models/mutopia_ccal_cont.py --data mutopia --train_split <path-to-sheet-manger>/sheet_manager/sheet_manager/splits/all_split.yaml --config exp_configs/mutopia_full_aug.yaml --estimate_UV --n_test 2000
By adding the flag
# (audio-query - to - sheet music)
--V2_to_V1
you can change the retrieval direction to audio query to sheet music. If the flag is not present, we retrieve audio (spectrogram excerpts) from sheet music image queries by default.
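To make the two retrieval directions concrete, here is a small self-contained sketch of ranking-based evaluation on toy embeddings. It is an illustration under assumptions, not the code in run_eval.py: it assumes that candidates are ranked by cosine distance in the shared embedding space and that results are summarized by recall at k and mean reciprocal rank. All function names and the toy vectors are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def ranks_of_matches(queries, candidates):
    """Rank (1 = best) of candidates[i] when queries[i] is used as the query."""
    ranks = []
    for i, query in enumerate(queries):
        dists = [cosine_distance(query, c) for c in candidates]
        # Count the candidates that are strictly closer than the true match.
        ranks.append(1 + sum(1 for d in dists if d < dists[i]))
    return ranks


def recall_at_k(ranks, k):
    """Fraction of queries whose true match is among the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy embeddings: sheet[i] and audio[i] belong to the same excerpt.
sheet = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[1.0, 0.1], [0.6, 0.8], [0.2, 1.0]]

# Default direction: sheet music query, audio candidates.
ranks_sheet_to_audio = ranks_of_matches(sheet, audio)
# Direction selected by --V2_to_V1: audio query, sheet music candidates.
ranks_audio_to_sheet = ranks_of_matches(audio, sheet)

print(ranks_sheet_to_audio, recall_at_k(ranks_sheet_to_audio, 1))
print(ranks_audio_to_sheet, mean_reciprocal_rank(ranks_audio_to_sheet))
```

Switching the direction only swaps the roles of queries and candidates; the metrics themselves are computed in the same way.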
To reproduce our experiments on score and performance identification, you can run the following shell script:
./eval_piece_retrieval.sh cuda0 models/mutopia_ccal_cont_rsz.py <path-to-sheet-manger>/sheet_manager/sheet_manager/splits/all_split.yaml
# $1 ... the device to evaluate on
# $2 ... the model to evaluate
# $3 ... the train split (data) to use for evaluation
As above, you can also run these experiments individually by calling:
python audio_sheet_server.py --model models/mutopia_ccal_cont.py --full_eval --init_sheet_db --estimate_UV --dump_results --train_split <path-to-sheet-manger>/sheet_manager/sheet_manager/splits/all_split.yaml --config exp_configs/mutopia_full_aug.yaml
for score identification from audio recordings and
python sheet_audio_server.py --model models/mutopia_ccal_cont.py --full_eval --init_audio_db --estimate_UV --dump_results --train_split <path-to-sheet-manger>/sheet_manager/sheet_manager/splits/all_split.yaml --config exp_configs/mutopia_full_aug.yaml
for finding performances given a certain score as a query. The flags
--init_sheet_db
--init_audio_db
indicate whether to create a database of sheet snippets or audio excerpts, or to load a precomputed database from disk.
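The build-or-load behaviour of these flags follows a common caching pattern, sketched below. This is an assumption about the general idea and not the actual code of the two server scripts: `get_database`, its arguments, and the use of pickle as storage format are all hypothetical.

```python
import pickle


def get_database(db_path, init_db, build_fn):
    """Hypothetical helper: build and store a database, or load a stored one.

    db_path  ... file used to store the precomputed database
    init_db  ... True corresponds to passing --init_sheet_db or --init_audio_db
    build_fn ... function without arguments that computes the database
    """
    if init_db:
        # Recompute the database and store it for later runs.
        database = build_fn()
        with open(db_path, "wb") as fp:
            pickle.dump(database, fp)
        return database
    # Flag not set: reuse the precomputed database from disk.
    with open(db_path, "rb") as fp:
        return pickle.load(fp)
```

With this pattern, the first run passes the flag to compute the database once, and later runs can omit it to skip the expensive embedding step.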