- Feature extraction
- Data preparation
- Defining the model
- Training & evaluating the model
- Testing the model
MFCCs are extracted from raw speech signals using `extract_mfcc.py`. The library used for feature extraction is `python_speech_features`.
The path to the training data is given in the main function (data_path).
The extracted features (MFCC/spectrogram) are loaded using the `unnorm_load_mfcc.py` script:
- The parameter `frm` defines the size of the input data.
- The parameter `step` defines the step size of the segmentation window: if the step size is smaller than the frame size, the segments overlap; otherwise there is no overlap between data samples.
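The `frm`/`step` segmentation described above can be sketched as follows (the function name and array shapes are illustrative assumptions, not taken from `unnorm_load_mfcc.py`):

```python
import numpy as np

def segment_features(feats, frm, step):
    """Cut a (num_frames, num_ceps) feature matrix into windows of
    length `frm`, advancing by `step` frames each time. When
    step < frm the windows overlap; when step >= frm they do not."""
    segments = []
    for start in range(0, feats.shape[0] - frm + 1, step):
        segments.append(feats[start:start + frm])
    return np.stack(segments)

feats = np.random.randn(300, 13)  # e.g. ~3 s of MFCCs at 10 ms per frame
overlap = segment_features(feats, frm=100, step=50)       # overlapping windows
no_overlap = segment_features(feats, frm=100, step=100)   # disjoint windows
print(overlap.shape)     # (5, 100, 13)
print(no_overlap.shape)  # (3, 100, 13)
```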
- `ftdnn_libri.py`: creates the FTDNN, trains and evaluates the model
- `unnorm_load_mfcc.py`: loads and prepares the data
- `models.py`: contains the layers of the FTDNN
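The core idea behind the TDNN layers in `models.py` is temporal context splicing: each layer gathers frames at fixed time offsets and applies one linear transform to the concatenation. A minimal numpy sketch of this idea (the offsets and dimensions are illustrative assumptions, not the ones used in this repository):

```python
import numpy as np

def tdnn_layer(x, weights, offsets):
    """x: (T, D) frame sequence; weights: (len(offsets) * D, H).
    For each valid time step, concatenate the frames at the given
    offsets and apply a shared linear transform, followed by ReLU."""
    T, D = x.shape
    lo, hi = min(offsets), max(offsets)
    outputs = []
    for t in range(-lo, T - hi):
        context = np.concatenate([x[t + o] for o in offsets])
        outputs.append(context @ weights)
    return np.maximum(np.stack(outputs), 0.0)

x = np.random.randn(100, 13)            # 100 MFCC frames
w = np.random.randn(3 * 13, 64)         # context of 3 frames -> 64 units
y = tdnn_layer(x, w, offsets=(-2, 0, 2))
print(y.shape)  # (96, 64): 2 frames lost at each edge of the context
```

The factorized variant (FTDNN) additionally splits each weight matrix into two low-rank factors with a semi-orthogonal constraint, which this sketch omits.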
The best models during training are saved in the "checkpoints" directory.
The FTDNN achieves an accuracy of 87.2% on 1-second test segments and 96.6% on 3-second utterances.
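The gap between the 1-second and 3-second accuracies comes from pooling evidence over more frames. A common way to score a longer utterance is to average the per-segment model outputs before taking the argmax; this is a standard aggregation rule sketched here as an assumption, not necessarily the exact one used in this repository:

```python
import numpy as np

def utterance_prediction(segment_logits):
    """segment_logits: (num_segments, num_speakers) model outputs for
    the 1-second segments of one utterance. Averaging them before the
    argmax pools evidence across the whole utterance."""
    return int(np.argmax(segment_logits.mean(axis=0)))

# Three 1-second segments of a 3-second utterance; speaker 2 wins on
# average even though the last segment on its own is misleading.
logits = np.array([[0.1, 0.2, 0.9],
                   [0.3, 0.4, 0.8],
                   [0.5, 0.6, 0.4]])
print(utterance_prediction(logits))  # 2
```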
```
Speaker-identification-FTDNN
├── inputs                  # datasets & features
│   ├── audio_files
│   ├── MFCC
├── data loader
│   ├── unnorm_load_mfcc.py
├── models
│   ├── models.py
│   ├── ftdnn_libri.py
├── outputs
│   ├── checkpoints
├── README.md
├── requirements.txt
```