Speech Mode Classifier

SMC was created for my BA project. It is a web application that uses neural networks in the back-end; the front-end part is on Ubipariper's GitHub page.

This repository contains:

Scripts used to extract speech features from the dataset.
Script performing Voice Activity Detection (VAD) on the audio files, using a derivative of the MACD (Moving Average Convergence Divergence) method that is commonly used in stock trading.
Scripts for merging multiple files containing the data.
Scripts used for creating and training the NNs.

OK, but what is this code doing exactly?

For the feature extraction part, run the preprocessor.py script and give it the following information about the samples you want to extract:

the path to the directory containing the audio files you want to extract features from
the class of the speech inside that directory; there are two main classes:
- whisper
- normal speech
the path of the output file
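The real command-line interface of preprocessor.py is not documented here, so the following is only a hypothetical illustration of how the three inputs above could be collected with Python's standard `argparse` module. The flag names `--input-dir`, `--speech-class` and `--output`, and the class strings `whisper` and `normal`, are assumptions, not the script's actual arguments.

```python
import argparse

# Hypothetical class names: the README only says the two classes are
# whisper and normal speech, not how preprocessor.py spells them.
SPEECH_CLASSES = ("whisper", "normal")


def build_parser():
    """Collect the three inputs the preprocessing step needs.

    The flag names are illustrative, not preprocessor.py's real interface.
    """
    parser = argparse.ArgumentParser(
        description="Extract speech features from one directory of audio files."
    )
    parser.add_argument("--input-dir", required=True,
                        help="directory containing the audio files")
    parser.add_argument("--speech-class", required=True, choices=SPEECH_CLASSES,
                        help="speech mode of every file in the directory")
    parser.add_argument("--output", required=True,
                        help="path of the output file")
    return parser
```

Calling `build_parser().parse_args()` then exposes the values as `args.input_dir`, `args.speech_class` and `args.output`. Restricting the class with `choices` makes argparse reject a mistyped class name before any audio is processed.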

This part is where it gets interesting:

First, all the files go through the VAD module, which detects where in each audio file there actually is speech (in any mode) and where there is silence. This saved me quite a bit of time, since I did not have to manually mark every moment in every file where something is being said. The vad.py script can also be run on its own, and it has a debug option that displays a half-decent graph visualizing what is being classified.
After that, the featureExtraction.py script comes into play, using code from the pyAudioAnalysis library to actually extract the features. The feature list is described in detail in that library's docs, so if you are interested, I encourage you to check it out.
The last step is to label the samples with their class and save them to the output file.
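The README does not spell out which MACD variant vad.py uses, so this is only a minimal sketch of the general idea, assuming the textbook MACD line (a fast exponential moving average minus a slow one) applied to per-frame energy: a frame counts as speech while its short-term energy average sits above the long-term one. The 25 ms frames, the spans of 3 and 40 frames, and the two features computed for the kept frames (zero-crossing rate and energy, both part of pyAudioAnalysis's short-term feature set) are illustrative assumptions; the real pipeline relies on pyAudioAnalysis, which computes many more features, such as MFCCs and chroma.

```python
import numpy as np


def frame_signal(signal, frame_len):
    """Split a 1-D signal into non-overlapping frames, dropping any incomplete tail."""
    n_frames = len(signal) // frame_len
    return np.asarray(signal[: n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)


def ema(values, span):
    """Exponential moving average with the usual MACD smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1.0)
    out = np.empty(len(values), dtype=float)
    if len(values) == 0:
        return out
    out[0] = values[0]
    for i in range(1, len(values)):
        out[i] = alpha * values[i] + (1.0 - alpha) * out[i - 1]
    return out


def macd_vad(energy, fast_span=3, slow_span=40, min_gap=0.0):
    """Flag a frame as speech while the fast energy EMA exceeds the slow one by more than min_gap."""
    macd_line = ema(energy, fast_span) - ema(energy, slow_span)
    return macd_line > min_gap


def extract_speech_features(signal, sr, frame_ms=25, min_gap=0.0):
    """Return ([zcr, energy] rows for the speech frames only, per-frame VAD mask)."""
    frames = frame_signal(signal, int(sr * frame_ms / 1000))
    energy = np.mean(frames ** 2, axis=1)
    # zero-crossing rate: fraction of adjacent sample pairs whose sign differs
    sign = np.signbit(frames)
    zcr = np.mean(sign[:, 1:] != sign[:, :-1], axis=1)
    mask = macd_vad(energy, min_gap=min_gap)
    features = np.column_stack([zcr, energy])[mask]
    return features, mask
```

Frames marked as silence are simply dropped before the feature rows are built. With a plain crossover rule, a long, steady stretch of speech lets the slow average catch up with the fast one, so a practical detector needs extra handling (a `min_gap` margin, smoothing of the mask, or similar).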

If you want to train the networks, run the preprocessing on at least two directories, so that you have samples of both normal and whispered speech.
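The marking-and-saving step, run once per class directory, could be sketched as follows. It assumes the feature rows already come from the speech frames kept by the VAD step, and it uses h5py to write the .h5 file. The dataset names `features` and `labels` and the label encoding (0 for normal speech, 1 for whisper) are hypothetical; the repository's own file layout is not described here.

```python
import h5py
import numpy as np

# Hypothetical label encoding and dataset names, chosen for illustration only.
LABELS = {"normal": 0, "whisper": 1}


def save_labelled(features, speech_class, path):
    """Label every feature row with the directory's class and write both arrays to HDF5."""
    features = np.asarray(features, dtype=np.float32)
    labels = np.full(len(features), LABELS[speech_class], dtype=np.int8)
    with h5py.File(path, "w") as f:
        f.create_dataset("features", data=features)
        f.create_dataset("labels", data=labels)
    return labels
```

Because every file in a directory shares one class, a single label is broadcast over all of its rows; two runs (one per class) give two files ready to be merged.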

After that, run the datasetMerge.py script to join the .h5 files generated in the previous step.
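datasetMerge.py itself is not shown in this README; a minimal h5py sketch of the same idea concatenates the matching datasets of every input file along the first axis. It assumes each file holds two datasets named `features` and `labels` (hypothetical names, not necessarily what datasetMerge.py reads).

```python
import h5py
import numpy as np


def merge_h5(input_paths, output_path, keys=("features", "labels")):
    """Concatenate the same datasets from several HDF5 files along the first axis."""
    merged = {key: [] for key in keys}
    for path in input_paths:
        with h5py.File(path, "r") as f:
            for key in keys:
                merged[key].append(f[key][()])  # read the whole dataset into memory
    with h5py.File(output_path, "w") as out:
        for key in keys:
            out.create_dataset(key, data=np.concatenate(merged[key], axis=0))
```

Loading every file into memory before concatenating keeps the sketch short; for very large datasets, resizable HDF5 datasets written incrementally would avoid that.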

With all this done, all that is left is to actually use this data for training; examples of multiple neural networks are in the models directory.
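The models directory is not described in this README, so the framework and architectures it uses are unknown. As a framework-free illustration only, the sketch below trains a tiny feed-forward network (one tanh hidden layer, sigmoid output, binary cross-entropy, full-batch gradient descent) in NumPy on a feature matrix `X` and 0/1 labels `y`. The layer size, learning rate and epoch count are arbitrary assumptions, and this is not one of the repository's models.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, hidden=8, lr=0.5, epochs=500, seed=0):
    """Fit a one-hidden-layer binary classifier and return its parameters."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # z-score each feature; the statistics are kept for prediction time
    mean, std = X.mean(axis=0), X.std(axis=0) + 1e-8
    Xn = (X - mean) / std
    n, d = Xn.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        # forward pass
        h = np.tanh(Xn @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # backward pass: gradient of the mean binary cross-entropy
        d_logit = (p - y) / n
        d_h = (d_logit @ W2.T) * (1.0 - h ** 2)
        W2 -= lr * (h.T @ d_logit)
        b2 -= lr * d_logit.sum(axis=0)
        W1 -= lr * (Xn.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return mean, std, W1, b1, W2, b2


def predict(params, X):
    """Return hard 0/1 class predictions."""
    mean, std, W1, b1, W2, b2 = params
    Xn = (np.asarray(X, dtype=float) - mean) / std
    p = sigmoid(np.tanh(Xn @ W1 + b1) @ W2 + b2)
    return (p.ravel() > 0.5).astype(int)
```

Standardizing the inputs first matters because audio features such as energy and zero-crossing rate can differ in scale by orders of magnitude.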