Data-driven Harmonic Filters for Audio Representation Learning

For more readable code, please check this repository.

Reference

Data-driven Harmonic Filters for Audio Representation Learning, ICASSP 2020 [pdf]

-- Minz Won, Sanghyuk Chun, Oriol Nieto, and Xavier Serra

TL;DR

We introduce stacked band-pass filters. The filters are stacked along the channel dimension, and their center frequencies are in a harmonic relationship: if the k-th filter in the first channel has a center frequency of 440 Hz, the k-th filter in the second channel is automatically 880 Hz, and the k-th filter in the third channel is 1320 Hz.
Center frequencies and bandwidths are learnable.
We then simply apply a 3x3 CNN.
The model showed state-of-the-art (SOTA) performance in music tagging, keyword spotting, and acoustic event detection tasks.
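
The harmonic relationship described above can be illustrated with a minimal sketch. It is not the repository's implementation: the function names `harmonic_center_frequencies` and `triangular_response` are hypothetical, the triangular filter shape is an assumption made for illustration, and the frequencies are fixed numbers here rather than learnable parameters.

```python
def harmonic_center_frequencies(fundamentals_hz, n_harmonics):
    """Return one list of center frequencies per channel.

    Channel n (1-indexed) holds n * f0 for every first-channel
    center frequency f0, so the k-th filters across channels
    form a harmonic series.
    """
    return [[n * f0 for f0 in fundamentals_hz]
            for n in range(1, n_harmonics + 1)]


def triangular_response(freq_hz, center_hz, bandwidth_hz):
    """Gain of an assumed triangular band-pass filter at freq_hz.

    The gain is 1 at the center frequency and falls linearly to 0
    at center_hz +/- bandwidth_hz / 2.
    """
    return max(0.0, 1.0 - 2.0 * abs(freq_hz - center_hz) / bandwidth_hz)


# Two first-channel filters (220 Hz and 440 Hz), stacked over 3 channels.
centers = harmonic_center_frequencies([220.0, 440.0], n_harmonics=3)
for channel, freqs in enumerate(centers, start=1):
    print(f"channel {channel}: {freqs}")
```

In a trainable version, the first-channel center frequencies and the bandwidths would be model parameters updated by gradient descent, while the higher channels would stay tied to the first through the integer multipliers.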

Citation

@inproceedings{won2020data,
  title={Data-driven harmonic filters for audio representation learning},
  author={Won, Minz and Chun, Sanghyuk and Nieto, Oriol and Serra, Xavier},
  booktitle={Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={536--540},
  year={2020},
  organization={IEEE}
}
