IMPORTANT: This repository is still under construction! It will be officially "usable" soon. Until then, you can freely check out the codebase.
This repository is companion material for the paper "The power of deep without going deep? A study of HDPGMM music representation learning". In particular, it contains the experimental routines used for generating the data, as well as analysis assistant scripts (in R; see the analysis directory). The main algorithm implementation can be found in a separate repository.
Below we describe the steps that allow reproducing the experimental routine, except for a few steps (i.e., fetching the audio samples of the Million Song Dataset).
Here we discuss the procedure for setting up the Python (virtual) environment. Specifically, we use conda and environment.yaml. If the conda command is already available, the virtual environment can be set up with the following command.
conda env create -f environment.yaml
If conda or miniconda is not yet available, the entire setup process (including setting up the virtual environment) can be done locally with the following command.
sh install_env.sh
This will install a separate miniconda instance inside the project directory, so it does not technically interfere with a system-wide miniconda installation, if one exists. However, activating the conda environment embedded in this project requires a custom command.
export project_path=/path/to/project/hdpgmm-music-experiments/
source ${project_path}/miniconda3/bin/activate hdpgmm_music_experiment
(These commands are executed at the end of the installation script.)
There seems to be a bug in Python on macOS when importing the soundfile package: it looks up libsndfile.dylib in the wrong places (at least on my tested machine). It can be temporarily resolved by exporting the right location of libsndfile.dylib:
brew install libsndfile
export DYLD_LIBRARY_PATH="/opt/homebrew/lib:$DYLD_LIBRARY_PATH"; python scripts/some_script.py
The location may not be under /opt/homebrew/lib. It can be easily identified and assigned with the following commands:
libsndfile_loc=$(brew list libsndfile | grep libsndfile.dylib | xargs dirname)
export DYLD_LIBRARY_PATH="${libsndfile_loc}:$DYLD_LIBRARY_PATH"
python scripts/some_script_loads_soundfile.py
There are probably other workarounds, such as installing the library through conda (i.e., libsndfile), but we have not tested them yet.
We provide the relevant data files here. With a total size of ~140GB, the package includes: 1) datasets preprocessed with the selected audio features, 2) pre-trained models (VQCodebook, HDPGMM, KIM, CLMR), and finally 3) representations extracted from these models.
For a detailed description of the data files, please check the data repository linked above.
Downloading the downstream datasets is relatively easy, as they are either well maintained by their original repositories (i.e., MTAT, Echonest-MSD) or available from alternative download sources (i.e., GTZAN). Once they are downloaded, follow the preprocessing steps below to conduct the study.
This subsection discusses the pre-processing routines for the datasets.
TBD
python -m src.scripts.preprocessing gtzan \
-p /output/path/ \
--verbose \
/root/path/where/gtzan/is/extracted/
gtzan_path needs to be set to the top-level directory containing the audio files (i.e., the genres directory). If no specific parameters are set, it extracts the "feature" with the default setup, where n_fft=2048, hop_s=512, and mel_len=128.
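As a reference point, these defaults correspond to a typical log-mel spectrogram setup. A minimal sketch of the resulting feature shape, assuming librosa-style centered framing (an assumption; the repository's extraction code may pad differently):

```python
# Sketch: expected mel-spectrogram shape for the default parameters.
# Assumes centered STFT framing (librosa-style), which pads the signal
# so that every hop produces a frame; the actual extraction code in
# this repository may differ.

def mel_feature_shape(n_samples: int,
                      n_fft: int = 2048,
                      hop_s: int = 512,
                      mel_len: int = 128) -> tuple:
    """Return (n_mel_bins, n_frames) for a signal of `n_samples` samples."""
    n_frames = 1 + n_samples // hop_s  # centered framing
    return (mel_len, n_frames)

# e.g., a 30-second clip at 22,050 Hz
print(mel_feature_shape(30 * 22050))  # -> (128, 1292)
```

This is only a back-of-the-envelope check for the expected output dimensions, not the repository's implementation.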
python -m src.scripts.preprocessing mtat \
-p /output/path/ \
--verbose \
/root/path/where/mtat/is/extracted/
Similarly to the gtzan script, it assumes a particular file tree, as follows:
-- mtat_path/
|-- audio/
| |-- 0/
| |-- 1/
| |.../
| |-- f/
|-- annotations_final.csv
(If the data is downloaded as described above, it is automatically organized in this way.) By default, it extracts features with the same setup as mentioned above.
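Before running the command, the expected layout can be verified programmatically. A small sketch (the helper below is our own illustration, not part of the repository):

```python
from pathlib import Path


def check_mtat_layout(mtat_path: str) -> list:
    """Return the expected entries that are missing under `mtat_path`.

    Expects an `audio/` directory with sub-directories 0-9 and a-f,
    plus `annotations_final.csv` at the top level, as in the tree above.
    """
    root = Path(mtat_path)
    expected = [root / 'annotations_final.csv']
    expected += [root / 'audio' / f'{i:x}' for i in range(16)]
    return [str(p) for p in expected if not p.exists()]
```

An empty return value means the tree matches the layout the preprocessing script assumes.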
python -m src.scripts.preprocessing echonest \
-p /output/path/ \
--verbose \
/MSD/mp3/root/path/ \
/MSD/path/info.pkl \
/MSD/song/2/track/map.pkl \
/where/echonest/stored/train.txt
In addition to the regular arguments, it requires additional arguments specifying the paths related to the audio files of the MSD dataset. This downstream dataset uses a subset of the MSD, which makes the process more efficient. On top of that, the user needs to provide a couple of pre-processed metadata files (i.e., msd_path_info, msd_song2track), which are specifically bound to our own dataset collection. Thus, this command might not work with other data collection schemes, similarly to the MSD training pre-processing step. Finally, providing echonest_triplet and out_path will suffice for the script to run the preprocessing. By default, it extracts the audio features with the same hyper-parameters as above.
Training the representations differs per model. For instance, the routine for fitting vqcodebook models can be invoked with the following command.
python -m src.scripts.fit_vqcodebook \
-k 256 \
--verbose \
/dataset/path/dataset.h5 \
/out/path/ \
"out_fn_prefix"
On the other hand, hdpgmm models can be trained with the following command.
python -m src.scripts.fit_hdpgmm \
--device "cpu" \
--no-augmentation \
--verbose \
/path/to/config.json
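The actual schema of config.json is defined in the repository itself. Purely as an illustration of the idea (every key below is a hypothetical placeholder, not the real schema), such a file consolidates the dataset location and training hyper-parameters in one place:

```json
{
  "dataset": {"path": "/dataset/path/dataset.h5"},
  "model": {"max_components": 256},
  "train": {"n_epochs": 100, "batch_size": 128}
}
```

Consult the repository for the exact fields expected by fit_hdpgmm.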
For the two deep-learning-based representations, we mostly depend on the helper scripts/CLI interfaces provided by their original repositories. For instance, the Kim et al. (2020) model can be trained with the following command.
mtltrain /path/to/config.json
The example configuration file can be found here. This routine also requires specific data pre-processing, which can be done with this script.
Finally, the model by Spijkervet and Burgoyne (2021) can be trained by running the following script after cloning our own fork (with modifications) of the original CLMR.
git clone https://github.com/eldrin/CLMR.git
cd CLMR
git checkout hdpgmm_study_mod
python main.py --dataset audio --dataset_dir ./directory_containing_audio_files
For the "extraction" of the representation, we provide a CLI interface.
python -m src.scripts.feature_extraction
usage: extractfeat [-h] [--model-path MODEL_PATH] [--split-path SPLIT_PATH] [--device DEVICE]
[-m BATCH_SIZE] [-j N_JOBS] [--verbose | --no-verbose]
{vqcodebook,hdpgmm,G1,precomputed} dataset_path {mtat,gtzan,echonest} out_path
extractfeat: error: the following arguments are required: model_class, dataset_path, dataset, out_path
Again, the deep-learning-based representations introduced above cannot be extracted directly with this script. For instance, representations from the Kim model can be extracted by calling an extraction helper tool installed from the dependencies of this repository.
mtlextract --device "cpu" model_checkpoints.txt target_audios.txt /out/root/
The 'model checkpoints' text file lists the model checkpoint files trained with the mtltrain command, and 'target_audios.txt' contains the filename of each audio excerpt to be processed, one per line.
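Such a list file can be generated with a few lines of Python. A minimal sketch, assuming the excerpts are .wav files under a single root (the helper name and the glob pattern are our own; adapt as needed):

```python
from pathlib import Path


def write_target_list(audio_root: str, out_txt: str,
                      pattern: str = '*.wav') -> int:
    """Write one audio filename per line; return how many files were listed."""
    files = sorted(str(p) for p in Path(audio_root).rglob(pattern))
    text = '\n'.join(files)
    Path(out_txt).write_text(text + '\n' if text else '')
    return len(files)
```

The sorted order is only for reproducibility; the extraction tool processes the lines independently.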
Finally, the CLMR representation can be extracted with the following script:
# we assume you are at root directory of cloned `CLMR` as above.
python dev/feature_extraction_dev.py /path/to/config.yaml /path/to/checkpoint.ckpt
We provide example config files for each downstream dataset we experimented with under configs/clmr_feature_ext_configs/. After converting the audio files into WAV format and placing them in a directory (which can have sub-directories for better organization), calling the script with a configuration file will extract the representations from the pre-trained checkpoint file.
Once the representations are extracted for the downstream task, it is possible to proceed to conduct the task-specific machine learning test. This can be done by calling another helper module within the package.
python -m src.scripts.test_downstream
usage: test_downstream [-h] {echonest,gtzan,mtat} ...
test_downstream: error: the following arguments are required: command
For example, by selecting a specific downstream task:
python -m src.scripts.test_downstream gtzan \
-p /test/output/path/ \
--random-seed 2022 \
--verbose \
/path/to/precomputed_features_per_task.npz \
'precomputed' \
/path/to/dataset.h5 \
/path/to/split_file.txt
It supports extracting features directly from a subset of the representation learners, such as hdpgmm or vqcodebook, if a pre-trained model file is available. However, this is provided only for convenience and completeness; it is not the most computationally efficient way. Thus, we recommend almost always going with the precomputed option, using representations precomputed as in the block above. This is also the general way to host representations computed from any other models, for instance, the deep representation learners we test in this repo (e.g., KIM and CLMR).
Jaehun Kim (jaehun.j.kim@gmail.com)
Please leave issues (and perhaps send pull requests)! We do not yet have a format for issues (perhaps TBD), so feel free to use any form for now. Currently, I am occupied by several projects and some other personal circumstances, so I cannot guarantee to jump right on the issues, but I will try to address them as much as possible.
TBD
@inproceedings{jaehun_kim_2022_7316610,
author = {Jaehun Kim and
Cynthia C. S. Liem},
title = {{The power of deep without going deep? A study of
HDPGMM music representation learning}},
booktitle = {{Proceedings of the 23rd International Society for
Music Information Retrieval Conference}},
year = 2022,
pages = {116-124},
publisher = {ISMIR},
address = {Bengaluru, India},
month = dec,
venue = {Bengaluru, India},
doi = {10.5281/zenodo.7316610},
url = {https://doi.org/10.5281/zenodo.7316610}
}
A part of this work was carried out on the Dutch national e-infrastructure with the support of the SURF Cooperative.