soundata

Python library for downloading, loading & working with sound datasets. Check the API documentation and the contributing instructions.
For Music Information Retrieval (MIR) datasets please check mirdata.

This library provides tools for working with common sound datasets, including tools for:

Downloading datasets to a common location and format
Validating that the files for a dataset are all present
Loading annotation files to a common format
Parsing clip-level metadata for detailed evaluations
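
To make the validation step in the list above concrete, here is a minimal sketch of checking files on disk against an index of expected checksums. It is an illustration under stated assumptions, not soundata's actual implementation: the helper names (`sha256_of`, `validate_files`, `demo`), the index layout (`{relative_path: checksum}`), and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_files(data_home: Path, index: dict) -> tuple:
    """Compare files under data_home against a {relative_path: checksum} index.

    Returns (missing, invalid): paths absent on disk, and paths whose
    checksum differs from the one recorded in the index.
    """
    missing, invalid = [], []
    for rel_path, expected in index.items():
        full_path = data_home / rel_path
        if not full_path.is_file():
            missing.append(rel_path)
        elif sha256_of(full_path) != expected:
            invalid.append(rel_path)
    return missing, invalid


def demo() -> tuple:
    """Build a tiny throwaway 'dataset' and validate it against a hypothetical index."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.wav").write_bytes(b"first clip")
        (root / "b.wav").write_bytes(b"second clip, modified")  # content differs from index
        index = {
            "a.wav": hashlib.sha256(b"first clip").hexdigest(),
            "b.wav": hashlib.sha256(b"second clip").hexdigest(),
            "c.wav": hashlib.sha256(b"third clip").hexdigest(),  # never written to disk
        }
        return validate_files(root, index)
```

Calling `demo()` reports one missing file and one file whose checksum does not match, which is the kind of information a validation step is meant to surface before any experiment is run.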

Here's soundata's list of currently supported datasets.

Installation

To install, simply run:

pip install soundata

Quick example

import soundata

dataset = soundata.initialize('urbansound8k')
dataset.download()  # download the dataset
dataset.validate()  # validate that all the expected files are there

example_clip = dataset.choice_clip()  # choose a random example clip
print(example_clip)  # see the available data

See the documentation for more examples and the API reference.
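
As a small extension of the quick example, the sketch below lists a few clips from an initialized dataset. It assumes the dataset object exposes `clip_ids` and `clip(clip_id)`, and that each clip has an `audio_path` attribute; check the API reference for the loader you use, since available attributes can differ between datasets. The helper names `preview_clips` and `main`, and the `data_home` path, are hypothetical.

```python
def preview_clips(dataset, limit=3):
    """Return (clip_id, audio_path) pairs for the first `limit` clips."""
    previews = []
    for clip_id in dataset.clip_ids[:limit]:
        clip = dataset.clip(clip_id)  # load a single clip object by its ID
        previews.append((clip_id, clip.audio_path))
    return previews


def main():
    # Imported here so preview_clips can be reused without soundata installed.
    import soundata

    # data_home is optional; the path below is only a placeholder location.
    dataset = soundata.initialize("urbansound8k", data_home="/tmp/sound_datasets")
    dataset.download()  # download the dataset
    dataset.validate()  # validate that all the expected files are there
    for clip_id, audio_path in preview_clips(dataset):
        print(clip_id, audio_path)
```

Note that calling `main()` downloads the full dataset, so it is best run once and on a machine with enough disk space.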

Contributing a new dataset loader

We welcome and encourage contributions to this library, especially new dataset loaders. Please see contributing for guidelines. Feel free to open an issue if you have any doubts or run into problems when working on the library.

Releases

The Soundata Zenodo repository is the preferred source for downloading the software releases.

Citing

If you use Soundata in your pipeline, please cite the version you used, with the DOI of the corresponding release on Zenodo. For Soundata v1.0.1:

If you refer to soundata's design principles, motivation, etc., please cite the JOSS article:

@article{Fuentes2024,
	title        = {{Soundata: Reproducible use of audio datasets}},
	author       = {Fuentes, Magdalena and Plaja-Roglans, Genís and Cortès-Sebastià, Guillem and Khandelwal, Tanmay and Miron, Marius and Serra, Xavier and Bello, Juan Pablo and Salamon, Justin},
	year         = 2024,
	month        = jun,
	journal      = {Journal of Open Source Software},
	volume       = 9,
	number       = 98,
	pages        = 6634,
	doi          = {10.21105/joss.06634},
	url          = {https://joss.theoj.org/papers/10.21105/joss.06634}
}