Lieon-ai

Real-time Voice Phishing (Lie) Classifier using Echo State Networks

Requirements

All code was written for Python >= 3.7.

To install the libraries used in this project, run the following command (the leading `!` is notebook syntax; omit it in a regular shell):

!pip install -r requirement.txt

Data

1. Labeling

For speaker diarization, we used a pretrained model provided by the Pyannote library.

The voices of the scam callers (voice phishing scammers) were labeled as 1, and the voices of the recipients were labeled as 0.
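The labeling rule above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the segment tuple layout, the `label_segments` helper, and the speaker IDs are all hypothetical, and the sketch assumes it is already known which diarized speaker is the scam caller (the source does not say how that is decided).

```python
from typing import List, Tuple

# One diarized segment: (start_sec, end_sec, speaker_id).
# This layout is an assumption made for illustration.
Segment = Tuple[float, float, str]


def label_segments(
    segments: List[Segment], scammer_id: str
) -> List[Tuple[float, float, int]]:
    """Map each diarized segment to 1 (scam caller) or 0 (recipient)."""
    return [
        (start, end, 1 if speaker == scammer_id else 0)
        for start, end, speaker in segments
    ]


# Hypothetical diarization output for a two-speaker call.
segments = [
    (0.0, 3.2, "SPEAKER_00"),
    (3.2, 5.0, "SPEAKER_01"),
    (5.0, 9.1, "SPEAKER_00"),
]
labeled = label_segments(segments, scammer_id="SPEAKER_00")
print(labeled)
```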

2. Augmentation

We applied data augmentation to expand the amount of data.
Time stretching, pitch shifting, and noise addition were used for augmentation.
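As a rough illustration of waveform augmentation, the NumPy sketch below adds white noise at a chosen signal-to-noise ratio and applies a naive speed change by resampling. Both helpers and all parameter values are assumptions for illustration, not the project's code. Note that the naive speed change alters duration and pitch together; true time stretching and pitch shifting decouple the two (for example `librosa.effects.time_stretch` and `librosa.effects.pitch_shift`), and the source does not state which implementation was used.

```python
import numpy as np


def add_noise(y, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the given SNR (dB)."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise


def change_speed(y, rate):
    """Naive speed change via linear-interpolation resampling.

    rate > 1 shortens the clip. This changes duration AND pitch together,
    so it is only a stand-in for real time stretching or pitch shifting.
    """
    n_out = int(round(len(y) / rate))
    old_idx = np.arange(len(y))
    new_idx = np.linspace(0, len(y) - 1, n_out)
    return np.interp(new_idx, old_idx, y)


# Synthetic one-second 220 Hz tone standing in for a voice clip.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 220.0 * t)

noisy = add_noise(y, snr_db=20.0, rng=rng)
faster = change_speed(y, rate=1.25)
print(noisy.shape, faster.shape)
```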

~~3. Generation~~

To address the shortage of data that remained after augmentation, we used generative AI to produce audio data with biological features similar to the original data. We ran a data generation experiment using the two models below:

~~AAGAN : Audio-to-Audio Generative Adversarial Networks (made by Do-Hyeon Lim)~~

~~MVGAN : Audio-to-Audio GAN using Mel-spectrogram Generator and HiFiGAN Vocoder (made by Do-Hyeon Lim)~~

Feature

MFCC (20 feature vectors in total)
Pitch
F0(Fundamental Frequency)
Spectral Flux
Spectral Frequency
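To make one of the listed features concrete, here is a minimal NumPy sketch of spectral flux, using one common definition: the L2 norm of the difference between magnitude spectra of consecutive frames. The frame length, hop size, window, and the `spectral_flux` helper are assumptions for illustration; the project may compute this feature differently.

```python
import numpy as np


def spectral_flux(y, frame_len=512, hop=256):
    """Spectral flux per frame transition (L2 norm of magnitude change)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [y[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))  # magnitude spectrum per frame
    diff = np.diff(mag, axis=0)                # change between adjacent frames
    return np.sqrt(np.sum(diff ** 2, axis=1))


# Synthetic signal: 0.5 s at 220 Hz followed by 0.5 s at 880 Hz.
sr = 16000
t = np.arange(sr // 2) / sr
y = np.concatenate([np.sin(2 * np.pi * 220.0 * t), np.sin(2 * np.pi * 880.0 * t)])

flux = spectral_flux(y)
peak = int(np.argmax(flux))
print(flux.shape, peak)
```

The flux stays low while the spectrum is steady and peaks around the frame where the tone changes, which is why it is often used as an indicator of abrupt spectral change.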

Model (ongoing)

Classifier : Echo State Network

An Echo State Network is a kind of recurrent neural network (RNN), based on reservoir computing, that is designed to handle sequential data efficiently.
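The idea behind reservoir computing can be sketched in a few lines of NumPy: the recurrent weights are random and fixed, and only a linear readout is trained (here with ridge regression on the final reservoir state). This is a toy sketch under assumed hyperparameters, not the project's model; `TinyESN`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


class TinyESN:
    """Toy leaky echo state network for binary sequence classification."""

    def __init__(self, n_in, n_res=50, spectral_radius=0.9, leak=0.3,
                 ridge=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random input weights (first column acts on a bias input).
        self.W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in + 1))
        # Fixed random reservoir, rescaled to the target spectral radius.
        W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.leak = leak
        self.ridge = ridge
        self.W_out = None

    def _final_state(self, seq):
        x = np.zeros(self.W.shape[0])
        for u in seq:
            pre = self.W_in @ np.concatenate(([1.0], u)) + self.W @ x
            x = (1 - self.leak) * x + self.leak * np.tanh(pre)
        return x

    def _features(self, seqs):
        states = np.stack([self._final_state(s) for s in seqs])
        return np.hstack([np.ones((len(seqs), 1)), states])

    def fit(self, seqs, labels):
        # Only the linear readout is trained, via ridge regression.
        X = self._features(seqs)
        y = np.asarray(labels, dtype=float)
        A = X.T @ X + self.ridge * np.eye(X.shape[1])
        self.W_out = np.linalg.solve(A, X.T @ y)
        return self

    def predict(self, seqs):
        return (self._features(seqs) @ self.W_out >= 0.5).astype(int)


# Synthetic 1-D sequences: class 1 fluctuates around +0.5, class 0 around -0.5.
rng = np.random.default_rng(1)


def make_seq(label):
    mean = 0.5 if label == 1 else -0.5
    return mean + 0.1 * rng.standard_normal((30, 1))


labels = [i % 2 for i in range(40)]
seqs = [make_seq(l) for l in labels]

model = TinyESN(n_in=1).fit(seqs, labels)
preds = model.predict(seqs)
train_acc = float(np.mean(preds == np.array(labels)))
print(train_acc)
```

In a real setting the inputs would be per-frame feature vectors rather than toy scalars, and accuracy would be measured on held-out calls rather than on the training set.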

Evaluation (ongoing)

TBA (optimizing)

Accuracy
F1 Score
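For reference, the two metrics can be computed from binary predictions as in the sketch below. The label lists are made-up values for illustration only and are not results from this project; the positive class is assumed to be 1 (scam caller).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(acc, f1)
```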

Reference

[1] https://doi.org/10.48550/arXiv.1712.04323 (GitHub: https://github.com/stefanonardo/pytorch-esn)
[2] https://doi.org/10.48550/arXiv.2010.05646 (GitHub: https://github.com/jik876/hifi-gan)