1MaxCNN

1-max pooling CNN for robust audio event recognition

This is the experimental setup for the published paper: Huy Phan, Lars Hertel, Marco Maass, and Alfred Mertins. Audio Event Recognition with 1-Max Pooling Convolutional Neural Networks. In Proceedings of the 17th Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3653-3657, 2016.

The setup is similar to that in

I. McLoughlin, H.-M. Zhang, Z.-P. Xie, Y. Song, W. Xiao, Robust Sound Event Classification using Deep Neural Networks, IEEE Trans. Audio, Speech and Language Processing, Jan 2015

(which can be found here http://www.lintech.org/machine_hearing/index.html)

The implementation of the 1-max pooling CNN is based on the implementation by Denny Britz:

http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/

Although an implementation with variable-length inputs is possible, we zero-padded each input to the maximum length to simplify the implementation. This zero padding does not significantly affect the 1-max pooling.
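The effect of zero padding on 1-max pooling can be illustrated with a minimal, pure-Python sketch. It is not the code used in this repository: the helper names `pad_to_length` and `one_max_pool`, the toy frames, and the single filter with zero bias are all assumptions chosen for illustration. Windows that fall entirely on padding produce an activation of zero after ReLU, so they cannot exceed the maximum found in the real data. Windows that straddle the boundary between data and padding can still change the result, which is why the effect is small rather than strictly zero.

```python
def pad_to_length(frames, max_len):
    """Append all-zero frames until the sequence has max_len frames (hypothetical helper)."""
    width = len(frames[0])
    return frames + [[0.0] * width for _ in range(max_len - len(frames))]


def one_max_pool(frames, kernel, bias=0.0):
    """Valid convolution over time, ReLU, then the maximum over all positions."""
    h = len(kernel)
    activations = []
    for t in range(len(frames) - h + 1):
        s = bias
        for i in range(h):
            for x, w in zip(frames[t + i], kernel[i]):
                s += x * w
        activations.append(max(0.0, s))  # ReLU
    return max(activations)  # 1-max pooling: keep only the largest activation


# Toy input: 3 time frames with 2 features each (made-up values).
frames = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]
# One filter spanning 2 frames (made-up weights, zero bias).
kernel = [[1.0, 0.0], [0.0, 1.0]]

padded = pad_to_length(frames, 6)
pooled_raw = one_max_pool(frames, kernel)
pooled_padded = one_max_pool(padded, kernel)
print(pooled_raw, pooled_padded)
```

In this toy case both calls return the same pooled value, because the largest activation comes from a window that lies fully inside the original frames.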

The experiments can be run as follows:

In MATLAB (for SIF feature extraction):

create_database;
extract_sif;
extract_sif_plus;
export_data_tensorflow;
export_data_tensorflow_plus;
export_data_multicondition_tensorflow;
export_data_multicondition_tensorflow_plus;

In TensorFlow (for CNN training and evaluation):

bash run.sh