Deepfake Text Detection

Overview

This GitHub repository contains the code and resources for a project that detects deepfake text, that is, it distinguishes text written by humans from text generated by artificial intelligence (AI) in both Arabic and English. The project applies Natural Language Processing (NLP) techniques across the whole pipeline, from data collection and the development of deep learning models to deployment with Streamlit.

Datasets

Arabic Dataset

It contains human-written text together with equivalent AI-generated text from sources such as Bard and GPT-3.5. The data is diverse, covering topics such as politics, medicine, technology, sports, and others.

English Dataset

Like the Arabic dataset, it captures a range of genres and writing styles, since it includes text from different sources such as news articles and social media posts. The AI-generated text was produced by GPT-3.5, GPT-3, and GPT-2.

Mixed Data

The mixed dataset is a combination of both Arabic and English text, reflecting the real-world scenario where a model might encounter multilingual content.
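The README does not show how the mixed dataset is built. Below is a minimal sketch, assuming each language's data is available as (text, label) pairs with a hypothetical label convention of 0 = human-written and 1 = AI-generated. The function name build_mixed_dataset is illustrative and not taken from the repository.

```python
import random
from typing import List, Tuple

# Hypothetical label convention (assumption, not from the repository).
HUMAN, AI = 0, 1

Example = Tuple[str, int, str]  # (text, label, language code)


def build_mixed_dataset(
    arabic: List[Tuple[str, int]],
    english: List[Tuple[str, int]],
    seed: int = 42,
) -> List[Example]:
    """Merge Arabic and English (text, label) pairs into one shuffled list.

    Each example keeps a language tag so that results can still be
    reported per language after training on the mixed data.
    """
    mixed: List[Example] = [(text, label, "ar") for text, label in arabic]
    mixed += [(text, label, "en") for text, label in english]
    # Shuffle with a fixed seed so the ordering (and any later split) is reproducible.
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    arabic = [("نص كتبه إنسان", HUMAN), ("نص مولد بالذكاء الاصطناعي", AI)]
    english = [
        ("A sentence written by a person.", HUMAN),
        ("A sentence produced by a language model.", AI),
    ]
    data = build_mixed_dataset(arabic, english)
    print(len(data))  # 4
```

Keeping the language tag costs nothing at training time and makes it possible to check whether a model trained on mixed data performs equally well on each language.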

Models' Performance Comparison

[Figure: models' performance comparison]
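This section does not state in text which metrics the comparison uses. As a hedged illustration, assuming standard binary-classification metrics with AI-generated text (label 1) as the positive class, the following sketch computes accuracy, precision, recall, and F1 for several models on a shared test set. The helper binary_metrics and the model names are hypothetical, and the labels are toy values, not results from this project.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard every division so empty inputs or degenerate predictions return 0.0.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only (0 = human-written, 1 = AI-generated).
    y_true = [0, 0, 1, 1]
    predictions = {"model_a": [0, 1, 1, 1], "model_b": [0, 0, 1, 0]}
    for name, y_pred in predictions.items():
        print(name, binary_metrics(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters here: two models can have the same accuracy while one misses AI-generated text (low recall) and the other wrongly flags human text (low precision).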

Usage

To run this application, you'll need:

  • Python
  • PyTorch
  • A GPU with CUDA support

and the following packages, which can be installed with pip:

pip install streamlit
pip install altair
pip install joblib

To run the Streamlit application, use the following command:

streamlit run main.py

Ensure that you have the required dependencies installed before running the code.
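One way to confirm this is a small helper script, for example a hypothetical check_env.py that is not part of the repository. The sketch below assumes the packages listed under Usage (note that PyTorch is imported as torch) and only uses torch if it is installed, so it runs even on a machine without PyTorch.

```python
import importlib.util
import sys

# Import names of the packages listed in the Usage section.
REQUIRED = ["streamlit", "altair", "joblib", "torch"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_available() -> bool:
    """True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check works without PyTorch

    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        sys.exit(1)
    print("All packages found. CUDA available:", cuda_available())
```

Running it with python check_env.py before streamlit run main.py surfaces a missing package or an unavailable GPU up front rather than as an error inside the app.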

Video demo

vlc-record-2024-01-30-18h35m31s-ai.detict.demo.mp4-.mp4