
1-max pooling CNN for robust audio event recognition


This repository contains the experimental setup for the published paper: Huy Phan, Lars Hertel, Marco Maass, and Alfred Mertins. Audio Event Recognition with 1-Max Pooling Convolutional Neural Networks. In Proceedings of the 17th Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3653-3657, 2016.

The setup is similar to that in

I. McLoughlin, H.-M. Zhang, Z.-P. Xie, Y. Song, W. Xiao, Robust Sound Event Classification using Deep Neural Networks, IEEE Trans. Audio, Speech and Language Processing, Jan 2015

(which can be found here http://www.lintech.org/machine_hearing/index.html)

The implementation of 1-max pooling CNN is based on the implementation of Denny Britz:


Although an implementation with variable-length inputs is possible, we zero-padded each input to the maximum length to simplify the implementation. This zero padding does not significantly affect the 1-max pooling.
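To illustrate why zero padding has little effect on 1-max pooling, here is a minimal pure-Python sketch. It is not the repository's code: the function names `one_max_pool` and `zero_pad`, the ReLU activation, the zero bias, and the toy 1-D signal are all assumptions made for illustration. With a zero bias, windows that fall entirely inside the padding give an activation of 0 after the ReLU, so they cannot exceed the maximum over the real data. Windows that straddle the boundary between data and padding can in general still change the result slightly.

```python
def one_max_pool(signal, kernel, bias=0.0):
    """Valid 1-D cross-correlation, ReLU, then max over all time steps.

    Hypothetical helper for illustration; not taken from the repository.
    """
    width = len(kernel)
    activations = []
    for start in range(len(signal) - width + 1):
        window = signal[start:start + width]
        response = sum(x * w for x, w in zip(window, kernel)) + bias
        activations.append(max(0.0, response))  # ReLU (assumed activation)
    return max(activations)  # 1-max pooling: keep only the largest value


def zero_pad(signal, target_length):
    """Append zeros so that the signal reaches target_length."""
    return signal + [0.0] * (target_length - len(signal))


signal = [1.0, 3.0, 2.0]   # toy input of length 3
kernel = [1.0, 1.0]        # toy filter of width 2

pooled_original = one_max_pool(signal, kernel)
pooled_padded = one_max_pool(zero_pad(signal, 6), kernel)

print(pooled_original, pooled_padded)  # prints "5.0 5.0"
```

In this toy case the pooled feature is identical with and without padding, because the strongest response comes from a window that lies entirely within the real data.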

The experiments can be run as follows:

On MATLAB: (for SIF feature extraction)


On TensorFlow: (for CNN training and evaluation)

bash run.sh