Numeral Prediction from short audio segment
In this project I have built a model that predicts the numerals of a short audio segment. The audio dataset was prepared by the batch of MSc Big Data Science at QMUL.
The trainingMLEND.csv consits of 20k rows and 4 columns. Each row corresponds to one of the items in our dataset, and each item is described by four attributes.
File ID (audio file)
Numeral
Participand ID
Intonation
Here I have extracted many features from the audio signal namely
Power
Pitch mean
Pitch standard deviation
Fraction of voiced region using librosa library.
I build 6 models which are
SVM
RandomForest
KNN
Naive Bayes
Logistic Regression
MuliLayer Perceptron Classifier
Nueral Network using Keras Sequential API
I build these models and trained them on hyperparameters, tuning them with the help of GridSearch CV. I build these models for both normalised predictors and without normalised also and with extra features such as mfcc and experimented them.
I build these models for predicting all the numerals which were Ones Sequence (0-9), Teens Sequence(10-19), Large Sequence ( twenty, thirty, ...., hundred, thousand, million, billion).
But the accuracy for predicitng all the numerals were very low so I tried to find a different approach and experimented with predicting only the ones sequence and finally, I achieved the highest Accuracy of 35% for the validation data using neural networks using Keras Sequential API.
It can be seen that there is a minimal change in accuracy over the 100 epochs for both training and validation.