Keras implementation of Deep Clustering paper

This is a Keras implementation of the Deep Clustering algorithm described in https://arxiv.org/abs/1508.04306. It is not yet finished. Most of this code was implemented by Valter Akira Miasato Filho.

Requirements

  1. System library:
  • libsndfile1 (installed via apt-get on Ubuntu 16.04)
  2. Python packages (I used Anaconda and Python 3.5):
  • Theano (pip install git+git://github.com/Theano/Theano.git)
  • keras (pip install keras)
  • pysoundfile (pip install pysoundfile)
  • numpy (conda install numpy)
  • scikit-learn (conda install scikit-learn)
  • matplotlib (conda install matplotlib) (only used for visualization)

Training the network

First, create two text files, train_list and valid_list, which list your training and validation data respectively. Each line of these files must follow this pattern:

path/to/audioFile1 spk1
path/to/audioFile2 spk2
path/to/audioFile3 spk1

Here spk1 and spk2 are labels that identify the speaker who uttered the recorded sentence.
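To sanity-check a list file before training, the pattern above can be parsed with a few lines of Python. This is only a sketch, not the parser used by main.py: the function name read_list is hypothetical, and splitting on the last run of whitespace (so that paths containing spaces survive) is an assumption about how the format should be interpreted.

```python
def read_list(list_path):
    """Parse a list file into (audio_path, speaker) tuples.

    Hypothetical helper for checking train_list / valid_list by hand;
    it is not part of this project's code.
    """
    entries = []
    with open(list_path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # Assumption: the speaker label is the last whitespace-separated field.
            parts = line.rsplit(None, 1)
            if len(parts) != 2:
                raise ValueError(
                    "{}:{}: expected 'path speaker', got {!r}".format(
                        list_path, number, line))
            entries.append((parts[0], parts[1]))
    return entries
```

Calling read_list("train_list") and counting the distinct speaker labels is a quick way to confirm that the file holds what you expect.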

The current implementation should work with any sample rate, but experiments were conducted only with 8 kHz audio. It has been tested with FLAC and WAV files, and it should work with any format supported by pysoundfile/libsndfile.
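If your recordings are stored with one sub-directory per speaker, the two list files can be generated rather than written by hand. The sketch below assumes that layout (root/speaker/file), which this project does not require; the function names, the 10% validation fraction and the random utterance-level split are all illustrative choices, and only .wav and .flac files are collected because those are the formats reported as tested.

```python
import os
import random

AUDIO_EXTENSIONS = (".wav", ".flac")  # the formats reported as tested


def collect_entries(root):
    """Return (path, speaker) pairs, assuming a root/<speaker>/<audio file> layout."""
    entries = []
    for speaker in sorted(os.listdir(root)):
        speaker_dir = os.path.join(root, speaker)
        if not os.path.isdir(speaker_dir):
            continue
        for name in sorted(os.listdir(speaker_dir)):
            if name.lower().endswith(AUDIO_EXTENSIONS):
                entries.append((os.path.join(speaker_dir, name), speaker))
    return entries


def write_lists(entries, train_path="train_list", valid_path="valid_list",
                valid_fraction=0.1, seed=0):
    """Shuffle entries and write them as 'path speaker' lines to the two files."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_valid = int(len(shuffled) * valid_fraction)
    splits = ((valid_path, shuffled[:n_valid]), (train_path, shuffled[n_valid:]))
    for path, rows in splits:
        with open(path, "w", encoding="utf-8") as handle:
            for audio_path, speaker in rows:
                handle.write("{} {}\n".format(audio_path, speaker))
    return len(shuffled) - n_valid, n_valid  # (training count, validation count)
```

For example, write_lists(collect_entries("data")) writes both files into the current directory, where "data" stands for your own corpus root. Speaker directory names must not contain whitespace, since the label is a single field in the list format.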

After creating train_list and valid_list, you may start training the network with the command:

python main.py

Please check the main script if you wish to use other features of this project, such as output visualization and prediction.

As of February 2017, this project is on hold, but we are still open to feedback and questions.

References