Deepfake Text Detection

Overview

This GitHub repository contains the code and resources for a project that detects deepfake text, that is, distinguishes text written by humans from text generated by artificial intelligence (AI), in both Arabic and English. The project applies Natural Language Processing (NLP) techniques across the full pipeline: data collection, development of deep learning models, and deployment with Streamlit.

Datasets

Arabic Dataset

It contains human-written text and the equivalent AI-generated text from sources such as Bard and GPT-3.5. The data is diverse, covering topics such as politics, medicine, technology, sports, and others.

English Dataset

Similar to the Arabic dataset, it captures the nuances of various genres and writing styles, as it includes text from different sources such as news articles and social media posts. The AI data was generated by GPT-3.5, GPT-3, and GPT-2.

Mixed Data

The mixed dataset is a combination of both Arabic and English text, reflecting the real-world scenario where a model might encounter multilingual content.
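As a rough illustration of how such a combination could be assembled, the sketch below merges two labelled collections and shuffles them. It is not taken from the repository: the function name `build_mixed_dataset`, the row layout (`text`, `label`), the `lang` tag, and the sample rows are all assumptions made for the example.

```python
import random


def build_mixed_dataset(arabic_rows, english_rows, seed=0):
    """Combine two labelled datasets into one shuffled list.

    Each row is assumed to be a dict with "text" and "label" keys
    (a hypothetical layout, not the repository's actual schema).
    A "lang" tag is added so the origin of each row stays visible.
    """
    mixed = [dict(row, lang="ar") for row in arabic_rows]
    mixed += [dict(row, lang="en") for row in english_rows]
    # A seeded shuffle keeps the ordering reproducible between runs.
    random.Random(seed).shuffle(mixed)
    return mixed


# Hypothetical sample rows; the label is either "human" or "ai".
arabic = [{"text": "مرحبا بالعالم", "label": "human"}]
english = [{"text": "Hello world", "label": "ai"}]
mixed = build_mixed_dataset(arabic, english)
```

Shuffling the combined rows means a model trained on them does not see all of one language before the other.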

Models' Performance Comparison

Usage

To run this application, you'll need:

Python
PyTorch
GPU with CUDA support

pip install streamlit
pip install altair
pip install joblib

To launch the Streamlit application, run the following command:

streamlit run main.py

Ensure that you have the required dependencies installed before running the code.
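One way to verify the dependencies before launching is a small check script like the sketch below. It is not part of the repository: the helper names `check_environment` and `cuda_available` are hypothetical, and the package list is inferred from the requirements and pip commands above (`torch` is the import name of PyTorch).

```python
import importlib.util

# Import names inferred from the requirements listed in this README.
REQUIRED = ("torch", "streamlit", "altair", "joblib")


def check_environment(packages=REQUIRED):
    """Return a dict mapping each package name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def cuda_available():
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check works without PyTorch

    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
    print("CUDA available:", cuda_available())
```

Saving this as a separate file (for example a hypothetical `check_env.py`) and running it with Python prints which packages are missing, so they can be installed before starting the app.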

Video demo

vlc-record-2024-01-30-18h35m31s-ai.detict.demo.mp4-.mp4

NasserMohamedEid/Text-AI-Detection