With the advancement of technology, the number of cybercrimes is increasing day by day. As recently reported by The Wall Street Journal, fraudsters successfully used AI-based software to mimic the voice of a CEO and have money transferred into their accounts. Many software tools are now capable of synthesizing a human voice, which can be used to breach the security of a system, obtain sensitive user data over the phone, or fabricate speech by influential people such as politicians.
DeepFake audio falls into three major categories: synthetic voice, converted speech, and replayed speech. Such audio is very difficult to detect and separate from genuine recordings. In this research, we propose a Region-based Convolutional Neural Network (R-CNN) model to identify DeepFake audio. The R-CNN extracts features from proposed regions of an input file, which are then used for detection or classification.
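The sketch below is a simplified illustration of the classification idea, not the repository's exact R-CNN pipeline: it converts an audio clip to a log-mel spectrogram and runs a small CNN over it to produce a real/fake decision. The use of librosa and PyTorch, the placeholder file path, the label convention, and all layer sizes are assumptions made for illustration only.

```python
# Minimal sketch: spectrogram features + small CNN classifier for real vs. fake audio.
# Library choices and hyperparameters are illustrative, not the project's actual setup.
import librosa
import numpy as np
import torch
import torch.nn as nn

def audio_to_logmel(path, sr=16000, n_mels=64, duration=4.0):
    """Load a clip, pad/trim to a fixed length, and return a log-mel spectrogram tensor."""
    y, _ = librosa.load(path, sr=sr, duration=duration)
    y = librosa.util.fix_length(y, size=int(sr * duration))
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel, ref=np.max)
    return torch.tensor(logmel, dtype=torch.float32).unsqueeze(0)  # shape: (1, n_mels, frames)

class AudioCNN(nn.Module):
    """Small CNN over the spectrogram; outputs logits for {real, fake}."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Example usage (file path is a placeholder):
# spec = audio_to_logmel("sample.wav").unsqueeze(0)  # add a batch dimension
# logits = AudioCNN()(spec)
# pred = logits.argmax(dim=1)  # 0 = real, 1 = fake (label convention is an assumption)
```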
Video Presentation: YouTube link to the project presentation: https://youtu.be/OK32PdY5P4k
ML2_ASV: Code for the implementation of the project.
Deepfake Audio Detection: The research paper covering the full study conducted for this project.