This is a TensorFlow code repository accompanying the following paper:
```bibtex
@inproceedings{KrauseWM23_CrossVersionRepresentationLearning_ISMIR,
  author    = {Michael Krause and Christof Wei{\ss} and Meinard M{\"u}ller},
  title     = {A Cross-Version Approach to Audio Representation Learning for Orchestral Music},
  booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
  pages     = {XXX--XXX},
  address   = {Milano, Italy},
  year      = {2023}
}
```
This repository contains code and trained models for the paper's experiments. The annotations used in the paper are available on the project website. For details and references, please see the paper.
Set up and activate the conda environment:

```bash
cd cross_version_learning
conda env create -f environment.yml
conda activate cross_version_learning
```
Extract the dataset in a `data` subdirectory of this repository. You will need to obtain the audio files and correctly name them according to the names of the annotation files. See the dataset website for details. Furthermore, extract the trained models from the project website in the `outputs/models` subdirectory.
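As a quick sanity check before running the scripts, you can verify that both expected subdirectories exist. This snippet is not part of the repository; it only checks the directory names mentioned above:

```python
# Sanity check: verify that the expected subdirectories exist.
# Adjust the names if your local setup differs.
from pathlib import Path

for folder in ["data", "outputs/models"]:
    status = "found" if Path(folder).is_dir() else "MISSING"
    print(f"{folder}: {status}")
```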
Run the scripts using, e.g., the following commands:

```bash
export CUDA_VISIBLE_DEVICES=0
python 02_extract_embeddings.py CV
```

Here, the additional parameter `CV` is passed in order to extract embeddings for the `CV` model.
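If you want to confirm that TensorFlow actually sees the GPU selected via `CUDA_VISIBLE_DEVICES`, a quick check (not part of the repository scripts) is:

```python
import tensorflow as tf

# Lists the GPUs visible to TensorFlow after CUDA_VISIBLE_DEVICES is set.
# An empty list means the scripts will fall back to the CPU.
print(tf.config.list_physical_devices("GPU"))
```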
The individual scripts perform the following steps:
- `01_train_model.py`: Train a representation model from scratch. This will overwrite the stored model checkpoints. Pass either `CV` or `SV` as an argument to this script to train the corresponding representation learning method.
- `02_extract_embeddings.py`: Extract embeddings from recordings using an already trained model. Pass either `CV`, `SV`, or `Sup` to extract embeddings for the corresponding model.
- `03_ssm_evaluation_quantitative.py`: Reproduce Figures 4 and 5 from the paper. Here, learned representations are evaluated by computing self-similarity matrices (SSMs) based on them and then comparing their structural boundaries with those from reference matrices. Chroma and MFCC features are used as baselines. The resulting plots are found in `outputs/ssm_boundaries`. (A rough sketch of the underlying SSM computation is given after this list.)
- `04_probing_evaluation.py`: Perform the probing evaluation as in Section 5.4/Tables 2 and 3 in the paper. Pass two parameters: first, one of `CV`, `SV`, `Sup`, `Chroma`, or `MFCC` (the type of feature to be evaluated); second, either `Inst` or `PitchClass` (the target task for probing). For example, calling `python 04_probing_evaluation.py CV Inst` will evaluate the proposed CV representations for instrument classification. The evaluation results are then found in `outputs/probing`.
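As a rough illustration of the SSM-based evaluation idea (not the exact code in `03_ssm_evaluation_quantitative.py`), the following sketch computes a cosine self-similarity matrix from a feature sequence. The embedding array and its shape `(num_frames, dim)` are assumptions made for this example:

```python
import numpy as np

def cosine_ssm(features: np.ndarray) -> np.ndarray:
    """Compute a cosine self-similarity matrix for a (num_frames, dim) feature sequence."""
    # Normalize each frame to unit length (a small epsilon avoids division by zero).
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normalized = features / np.maximum(norms, 1e-8)
    # The SSM is the matrix of pairwise dot products between normalized frames.
    return normalized @ normalized.T

# Example with random data standing in for extracted embeddings.
embeddings = np.random.randn(200, 128)  # placeholder: 200 frames, 128-dimensional features
ssm = cosine_ssm(embeddings)
print(ssm.shape)  # (200, 200)
```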
Note that results may differ slightly from those reported in the paper due to random effects during training, newer versions of the packages used here (e.g., librosa), and slight changes to the code.