Speaker Identification
Dataset
The dataset for this project is taken from kaggle: https://www.kaggle.com/datasets/vjcalling/speaker-recognition-audio-dataset
Audio data for 50 speakers, with more than 1 hour of audio per speaker. The data was converted to WAV format (16 kHz, mono) and split into 1-minute chunks. The dataset is suitable for speaker-recognition problems and was scraped from YouTube and LibriVox.
Code
siarec
This folder contains all the training code.
A Siamese network is used to solve this problem.
Currently, spectrogram images from only 3 speakers are used to train the Siamese network with contrastive loss.
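Contrastive loss pulls embeddings of same-speaker pairs together and pushes different-speaker pairs apart, up to a margin. A minimal NumPy sketch of the idea (the actual training code lives in siarec; the function name, label convention, and margin value here are illustrative assumptions, not the repo's exact implementation):

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, labels, margin=1.0):
    """Contrastive loss over a batch of embedding pairs.

    labels: 0 for same-speaker pairs, 1 for different-speaker pairs
    (a common convention; the repo's convention may differ).
    """
    dist = np.linalg.norm(emb_a - emb_b, axis=1)          # Euclidean distance per pair
    same = (1 - labels) * dist ** 2                       # pull similar pairs together
    diff = labels * np.maximum(margin - dist, 0.0) ** 2   # push dissimilar pairs apart
    return 0.5 * np.mean(same + diff)
```

Identical same-speaker embeddings incur zero loss, while different-speaker embeddings closer than the margin are penalized.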
For testing, the script loads the trained model weights and predicts the speaker label for a single input.
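At inference time, a Siamese model typically embeds the input and assigns the label of the closest known speaker. A hedged sketch of that step (the function name and the one-reference-embedding-per-speaker setup are assumptions for illustration, not the repo's exact API):

```python
import numpy as np

def predict_speaker(query_emb, reference_embs, speaker_labels):
    # Compare a query embedding against one reference embedding per speaker
    # and return the label of the nearest reference (Euclidean distance).
    dists = np.linalg.norm(reference_embs - query_emb, axis=1)
    return speaker_labels[int(np.argmin(dists))]
```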
preprocess.ipynb
is a notebook that converts audio (.wav) files to spectrogram images and saves them.
Only 3 of the 50 speakers are used to train the model.
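A minimal sketch of the conversion step, assuming SciPy for WAV loading and spectrogram computation (the notebook may instead use librosa or matplotlib; the function name and parameters here are assumptions):

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

def wav_to_log_spectrogram(wav_path, nperseg=512):
    # Load a 16 kHz mono WAV chunk and compute a log-magnitude spectrogram;
    # the notebook would then save this 2-D array as an image file.
    rate, samples = wavfile.read(wav_path)
    _, _, sxx = spectrogram(samples.astype(np.float32), fs=rate, nperseg=nperseg)
    return np.log(sxx + 1e-10)  # log scale reveals quieter harmonics
```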
Example code run
python main.py --epoch=3 --batch_size=16 --learning_rate=0.01 --model='path-to-trained-model'
--model
When this argument is set, the script runs evaluation on the test set using the given trained model. For training, omit the --model argument.