Real-time Voice Phishing (Lie) Classifier using Echo State Networks
All code is written in Python (3.7 or later).
To install the libraries used in this project, run the following command (the leading ! is only needed when running it inside a Jupyter or Colab notebook cell):
!pip install -r requirement.txt
1. Labeling
For speaker diarization, we used a pretrained model provided by the Pyannote library. The resulting speaker segments were labeled as follows:
- Voices of the scam callers (voice phishing scammers) were labeled 1.
- Voices of the call recipients were labeled 0.
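Once diarization has produced speaker turns (with Pyannote, these can be read from the pipeline's output annotation via itertracks(yield_label=True)), mapping them to the 0/1 labels above is a simple lookup. The sketch below is hypothetical: it assumes the turns are given as (start, end, speaker) tuples and that the scammer's diarization label has already been identified, e.g. manually. The helper label_turns and the SPEAKER_00 / SPEAKER_01 values are illustrative, not taken from this project.

```python
from typing import List, Tuple

SCAMMER, RECIPIENT = 1, 0  # label convention used in this project


def label_turns(turns: List[Tuple[float, float, str]],
                scammer_speaker: str) -> List[Tuple[float, float, int]]:
    """Map diarized (start, end, speaker) turns to (start, end, label) tuples.

    Hypothetical helper: assumes the scammer's diarization label is known.
    Any speaker other than the scammer is treated as the recipient.
    """
    labeled = []
    for start, end, speaker in turns:
        label = SCAMMER if speaker == scammer_speaker else RECIPIENT
        labeled.append((start, end, label))
    return labeled


# Illustrative diarization output for a two-speaker call (made-up times in seconds).
turns = [(0.0, 2.5, "SPEAKER_00"), (2.5, 4.0, "SPEAKER_01"), (4.0, 7.2, "SPEAKER_00")]
print(label_turns(turns, scammer_speaker="SPEAKER_00"))
# [(0.0, 2.5, 1), (2.5, 4.0, 0), (4.0, 7.2, 1)]
```

In a two-party call this binary mapping is sufficient; calls with more than two detected speakers would need an extra rule.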
2. Augmentation
We applied data augmentation to increase the amount of data.
Three methods were used: time stretching, pitch shifting, and noise addition.
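Time stretching and pitch shifting are commonly done with librosa (librosa.effects.time_stretch and librosa.effects.pitch_shift); the exact tools and parameter values used here are not stated, so the sketch below covers only the noise-addition step, in plain NumPy. It assumes white Gaussian noise scaled to a target signal-to-noise ratio; the function name add_noise and the 20 dB setting are illustrative choices.

```python
import numpy as np


def add_noise(y: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise so the result has approximately the given SNR (dB)."""
    signal_power = np.mean(y ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise


sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 220.0 * t)  # 1 s synthetic test tone
rng = np.random.default_rng(0)
noisy = add_noise(clean, snr_db=20.0, rng=rng)

# Measure the SNR actually achieved; it should be close to the 20 dB target.
measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(f"measured SNR: {measured:.1f} dB")
```

Drawing several SNR values per clip (for example from a small range) is a common way to turn one recording into multiple augmented copies.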
3. Generation
Because data were still scarce after augmentation, we used generative AI to produce audio data whose vocal (biological) features resemble those of the original data.
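Mel-spectrogram-based generators such as MVGAN work in the mel-spectrogram domain: the generator produces mel spectrograms, and a HiFi-GAN vocoder turns them back into waveforms. As an illustration only, the NumPy sketch below computes a log-magnitude mel spectrogram. The parameters (22,050 Hz audio, n_fft=1024, hop=256, 80 mel bands) are assumptions modeled on common HiFi-GAN settings, not values confirmed for MVGAN, and the simpler HTK mel scale is used instead of librosa's default Slaney scale.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (librosa's default is the slightly different Slaney scale)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins onto n_mels mel bands."""
    if fmax is None:
        fmax = sr / 2
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(y, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Return a (n_mels, n_frames) log-magnitude mel spectrogram."""
    y = np.pad(y, n_fft // 2, mode="reflect")  # centre each frame on its timestamp
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, n_fft // 2 + 1)
    mel = magnitude @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(np.maximum(mel, 1e-5)).T                 # clamp avoids log(0)


sr = 22050
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(spec.shape)  # (80, 87)
```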
We conducted a data generation experiment using the two models below:
MVGAN: an audio-to-audio GAN that combines a mel-spectrogram generator with a HiFi-GAN vocoder [2] (made by Do-Hyeon Lim)
- MFCC (20 coefficients in total)
- Pitch
- F0 (fundamental frequency)
- Spectral flux
- Spectral frequency
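Two of the listed features, F0 and spectral flux, are simple enough to sketch directly in NumPy. The block below is illustrative only: it uses a plain autocorrelation pitch estimator and defines spectral flux as the L2 distance between consecutive magnitude spectra (other definitions, such as summing only positive differences, are also common). The frame sizes and the 50 to 500 Hz pitch search range are assumptions. MFCCs are usually computed with a library instead, e.g. librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).

```python
import numpy as np


def frame_signal(y, frame_len, hop):
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(y) - frame_len) // hop
    return np.stack([y[i * hop:i * hop + frame_len] for i in range(n_frames)])


def autocorr_f0(frame, sr, fmin=50.0, fmax=500.0):
    """Estimate F0 of one frame as sr / (lag of the highest autocorrelation peak)."""
    frame = frame - frame.mean()
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return sr / lag


def spectral_flux(y, frame_len=512, hop=256):
    """L2 distance between magnitude spectra of consecutive Hann-windowed frames."""
    frames = frame_signal(y, frame_len, hop) * np.hanning(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.sqrt(np.sum(np.diff(mag, axis=0) ** 2, axis=1))  # length n_frames - 1


sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 200.0 * t)
print(f"estimated F0: {autocorr_f0(tone[:1024], sr):.1f} Hz")  # close to 200 Hz

# Flux peaks where the spectrum changes abruptly, e.g. at a silence-to-tone onset.
signal = np.concatenate([np.zeros(8000), tone[:8000]])
flux = spectral_flux(signal)
print("onset frame:", int(np.argmax(flux)) + 1)
```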
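Classifier: Echo State Network (ESN) [1]
- A type of recurrent neural network (RNN) based on reservoir computing, designed to process sequential data efficiently: the recurrent reservoir weights are fixed after random initialization, and only the output (readout) layer is trained.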
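The reservoir-computing idea can be sketched in a few lines of NumPy. This is an independent, minimal illustration, not the pytorch-esn API [1] or this project's actual model: it assumes each speaker segment has already been converted into a (T, n_features) sequence of frame-level features, summarizes the segment by its mean reservoir state, and trains only a linear ridge-regression readout. All hyperparameters (reservoir size, spectral radius 0.9, leak rate 0.3, ridge 1e-4) and the toy data are hypothetical.

```python
import numpy as np


class EchoStateNetwork:
    """Leaky-integrator ESN: fixed random reservoir + ridge-regression readout."""

    def __init__(self, n_inputs, n_reservoir=100, spectral_radius=0.9,
                 input_scaling=0.5, leak_rate=0.3, ridge=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        # Input and reservoir weights are random and never trained.
        self.W_in = rng.uniform(-input_scaling, input_scaling, (n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        # Rescale so the largest |eigenvalue| equals spectral_radius
        # (a common heuristic for obtaining the echo state property).
        self.W = W * (spectral_radius / np.max(np.abs(np.linalg.eigvals(W))))
        self.leak_rate = leak_rate
        self.ridge = ridge
        self.W_out = None

    def _features(self, seq):
        """Run one (T, n_inputs) sequence through the reservoir; return mean state + bias."""
        x = np.zeros(self.W.shape[0])
        states = []
        for u in seq:
            pre = np.tanh(self.W_in @ u + self.W @ x)
            x = (1 - self.leak_rate) * x + self.leak_rate * pre
            states.append(x)
        return np.append(np.mean(states, axis=0), 1.0)

    def fit(self, seqs, labels):
        X = np.stack([self._features(s) for s in seqs])
        y = 2.0 * np.asarray(labels, dtype=float) - 1.0  # {0, 1} -> {-1, +1}
        # Only the readout is trained: W_out = (X^T X + ridge * I)^-1 X^T y
        A = X.T @ X + self.ridge * np.eye(X.shape[1])
        self.W_out = np.linalg.solve(A, X.T @ y)
        return self

    def predict(self, seqs):
        X = np.stack([self._features(s) for s in seqs])
        return (X @ self.W_out > 0).astype(int)  # 1 = scammer, 0 = recipient


rng = np.random.default_rng(42)


def toy(label, n):
    """Hypothetical stand-in for per-frame feature sequences of one class."""
    return [rng.normal(0.5 if label else -0.5, 0.3, size=(30, 1)) for _ in range(n)]


train_X, train_y = toy(0, 100) + toy(1, 100), [0] * 100 + [1] * 100
esn = EchoStateNetwork(n_inputs=1, n_reservoir=50).fit(train_X, train_y)
test_X, test_y = toy(0, 50) + toy(1, 50), np.array([0] * 50 + [1] * 50)
print("toy accuracy:", np.mean(esn.predict(test_X) == test_y))
```

Because only W_out is learned, training reduces to a single linear solve, which is what makes ESNs cheap to train. Averaging the states over time is one of several readout choices; using the final state is another.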
TBA (the model is still being optimized); results will be reported using:
- Accuracy
- F1 score
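For reference, the two metrics can be computed as below. Treating the scam caller (label 1) as the positive class is an assumption, since F1 depends on which class is considered positive. Equivalent functions exist in scikit-learn (sklearn.metrics.accuracy_score and sklearn.metrics.f1_score); this plain-Python version just makes the definitions explicit.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Binary accuracy and F1, treating `positive` (1 = scammer) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


acc, f1 = accuracy_and_f1([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(f"accuracy={acc:.3f}, F1={f1:.3f}")  # accuracy=0.667, F1=0.667
```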
[1] https://doi.org/10.48550/arXiv.1712.04323 (GitHub: https://github.com/stefanonardo/pytorch-esn)
[2] https://doi.org/10.48550/arXiv.2010.05646 (GitHub: https://github.com/jik876/hifi-gan)