This tutorial contains a sequence of Jupyter notebook files that help you generate latent space representations from input audio files, evaluate them, and build an interactive visualization.
Keep the directory structure as it is and put your data in the 'audio' and 'data' folders. Do not change the folder structure or the location of the notebooks or function files!
├── notebooks <- contains analysis scripts
│ ├── 01_generate_spectrograms.ipynb
│ ├── ...
│ └── ...
├── audio <- ! put your input sound files in this folder or unzip the provided example audio!
│ ├── call_1.wav
│ ├── call_2.wav
│ └── ...
├── functions <- contains functions that will be called in analysis scripts
│ ├── audio_functions.py
│ ├── ...
│ └── ...
├── data <- ! put a .csv metadata file of your input in this folder or use the provided example csv!
│ └── info_file.csv
├── parameters
│ └── spec_params.py <- this file contains parameters for spectrogramming (fft_win, fft_hop, ...); see the sketch below the tree
├── environments
│ └── umap_tut_env.yaml <- conda environment file (linux)
├── ...
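The parameters/spec_params.py file mentioned above ships with the repository; the sketch below only illustrates the kind of settings it holds. All values, and any names other than fft_win and fft_hop, are assumptions for illustration and not the shipped defaults:

# Illustrative sketch of parameters/spec_params.py -- only fft_win and fft_hop
# are named in this README; everything else here is an assumption.
fft_win = 0.03     # STFT window length (assumed to be in seconds)
fft_hop = 0.00375  # hop between successive windows (assumed to be in seconds)
n_mels = 40        # assumed: number of mel frequency bands
window = 'hann'    # assumed: window function used for the STFT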
Python >= 3.8 is recommended. I would recommend installing the packages manually, but a conda environment file is also included in /environments (created on Linux! Dependencies may differ for other OS!). If you prefer the environment file, it can be set up with 'conda env create -f environments/umap_tut_env.yaml'.
For manual install, these are the core packages:
umap-learn
librosa
ipywidgets
pandas=1.2.4
seaborn
pysoundfile=0.10.3
voila
hdbscan
plotly
graphviz
networkx
pygraphviz
Make sure to enable jupyter widgets with:
jupyter nbextension enable --py widgetsnbextension
NOTE: graphviz, networkx and pygraphviz are only required for one plot, so if you cannot install them, you can still run 99% of the code.
If you haven't worked with Python and/or conda (a package manager) before, an easy way to get started is to first install Anaconda or Miniconda (a stripped-down version of Anaconda containing only the core components):
After successful installation, create and activate your environment with conda:
conda create --name my_env
conda activate my_env
Then, install the required core packages:
conda install -c conda-forge umap-learn
conda install -c conda-forge librosa
conda install ipywidgets
conda install pandas=1.2.4
conda install seaborn
conda install -c conda-forge pysoundfile=0.10.3
conda install -c conda-forge voila
conda install -c anaconda graphviz
conda install -c conda-forge hdbscan
conda install -c plotly plotly
conda install networkx
conda install -c conda-forge pygraphviz
Finally, enable ipywidgets in Jupyter Notebook:
jupyter nbextension enable --py widgetsnbextension
Clone this repository or download it as a zip file and unpack it. Make sure to keep the subdirectory structure described in section "Structure" and prepare your input files as described in section "Input requirements".
Start jupyter notebook with
jupyter notebook
and select the first jupyter notebook file to start your analysis (see section "Where to start").
All audio input files need to be in a subfolder /audio. This folder should not contain any other files.
To use the provided example data of meerkat calls, please unzip the file 'audio_please_unzip.zip' and verify that all audio files have been unpacked into an /audio folder according to the structure described in section "Structure".
To use your own data, create a subfolder "/audio" and put your sound files there (make sure that the /audio folder contains only your input files, nothing else). Each sound file should contain a single vocalization or syllable. (You may have to detect and extract such vocal elements first, if working with acoustic recordings.)
Ideally, start and end of the sound file correspond exactly to start and end of the vocalization. If there are delays in the onset of the vocalizations, these should be the same for all sound files. Otherwise, vocalizations may appear dissimilar or distant in latent space simply because their onset times are different. If it is not possible to mark the start times correctly, use the timeshift option to generate UMAP embeddings, but note that it comes at the cost of increased computation time.
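If your recordings have inconsistent onsets, one option is to trim leading and trailing silence before placing the files in /audio. The snippet below is only a sketch using librosa and soundfile (both in the package list above); the file paths and the top_db=30 threshold are assumptions, and this step is not required by the notebooks:

import librosa          # audio loading and silence trimming
import soundfile as sf  # writing the trimmed audio back to disk

# Hypothetical input/output paths -- adjust to your own files.
y, sr = librosa.load('raw/call_1.wav', sr=None)    # keep the original sample rate
y_trimmed, _ = librosa.effects.trim(y, top_db=30)  # assumed threshold: 30 dB below peak
sf.write('audio/call_1.wav', y_trimmed, sr)        # write the aligned call into /audio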
Use the provided info_file.csv file for the example audio data or, if you are using your own data, add a ";"-separated info_file.csv file with headers to the subfolder "/data", containing the filenames of the input audio, some labels and any other additional metadata (if available). If some or all labels are unknown, there should still be a label column, and unknown labels should be marked with "unknown".
Structure of info_file.csv must be:
| filename   | label   | ... | ... |
|------------|---------|-----|-----|
| call_1.wav | alarm   | ... | ... |
| call_2.wav | contact | ... | ... |
| ...        | ...     | ... | ... |
If you don't provide an info_file.csv, a default one will be generated, containing ALL files found in /audio, with all vocalizations labelled as "unknown".
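If you want to create such a file yourself rather than rely on the default, a minimal pandas sketch could look like this (the "unknown" placeholder label follows the convention described above; any columns beyond 'filename' and 'label' are up to you):

from pathlib import Path
import pandas as pd

# Collect all .wav files from the /audio subfolder.
audio_files = sorted(p.name for p in Path('audio').glob('*.wav'))

# Build a minimal metadata table: every call gets the placeholder label "unknown".
info = pd.DataFrame({'filename': audio_files,
                     'label': ['unknown'] * len(audio_files)})

# Write it ";"-separated into /data, as expected by the notebooks.
info.to_csv('data/info_file.csv', sep=';', index=False)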
- Start with 01_generate_spectrograms.ipynb to generate spectrograms from the input audio files.
- Generate latent space representations with 02a_generate_UMAP_basic.ipynb OR 02b_generate_UMAP_timeshift.ipynb.
- You can now
  - evaluate the latent space representation with 03_UMAP_eval.ipynb,
  - visualize the latent space representation by running 03_UMAP_viz_part_1_prep.ipynb and 03_UMAP_viz_part_2_tool.ipynb, or
  - apply clustering on the latent space representation with 03_UMAP_clustering.ipynb.
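For orientation, here is a rough, self-contained sketch of the kind of pipeline these notebooks implement (spectrograms -> UMAP embedding -> HDBSCAN clustering). It is not the notebooks' actual code: the spectrogram settings, zero-padding and clustering parameters are assumptions for illustration only.

from pathlib import Path
import numpy as np
import librosa
import umap
import hdbscan

# 1) Compute a dB-scaled mel spectrogram per call; parameter values are assumptions.
specs = []
for wav in sorted(Path('audio').glob('*.wav')):
    y, sr = librosa.load(wav, sr=None)
    s = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512, hop_length=128, n_mels=40)
    specs.append(librosa.power_to_db(s, ref=np.max))

# 2) Zero-pad all spectrograms to the same number of frames and flatten them,
#    so that each call becomes one fixed-length feature vector.
max_len = max(s.shape[1] for s in specs)
X = np.stack([np.pad(s, ((0, 0), (0, max_len - s.shape[1]))) for s in specs])
X = X.reshape(len(specs), -1)

# 3) Project into a 2-D latent space with UMAP.
embedding = umap.UMAP(n_components=2, random_state=42).fit_transform(X)

# 4) Cluster the embedding with HDBSCAN (label -1 marks noise points).
labels = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(embedding)
print(embedding.shape, np.unique(labels))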
All code is under the MIT license. Exclusive copyright applies to the audio data file (audio_please_unzip.zip), meaning that you cannot reproduce, distribute or create derivative works from it. You may use this data to test the provided code, but not for any other purposes. If you are interested in using the example data beyond the sole purpose of testing the provided code, please get in touch with Prof. Marta Manser. See the license for details.