This GitHub repository contains the code and resources for a project aimed at detecting deepfake text, specifically focusing on distinguishing between text written by humans and text generated by artificial intelligence (AI) in both Arabic and English. The project applies Natural Language Processing (NLP) techniques across the full pipeline, from data collection and the development of deep learning models to deployment with Streamlit.
The Arabic dataset contains human-written text and equivalent AI-generated text from sources such as Bard and GPT-3.5. The data is diverse, covering topics such as politics, medicine, technology, sports, and others.
Similar to the Arabic dataset, the English dataset captures the nuances of various genres and writing styles, as it includes text from different sources such as news articles and social media posts. Its AI-generated text was produced by GPT-3.5, GPT-3, and GPT-2.
The mixed dataset is a combination of both Arabic and English text, reflecting the real-world scenario where a model might encounter multilingual content.
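As an illustration of how such a mixed dataset could be assembled, the sketch below merges Arabic and English samples into one shuffled list while keeping a language tag on each sample. It is a minimal sketch, not the repository's actual preprocessing: the `Sample` class, the `build_mixed_dataset` function, and the label convention (0 = human, 1 = AI) are hypothetical choices made for this example.

```python
import random
from dataclasses import dataclass

# Hypothetical label convention for this sketch: 0 = human-written, 1 = AI-generated.
HUMAN, AI = 0, 1


@dataclass(frozen=True)
class Sample:
    text: str
    label: int     # HUMAN or AI
    language: str  # "ar" or "en"


def build_mixed_dataset(arabic, english, seed=42):
    """Merge Arabic and English samples into one shuffled list.

    Each input is a list of (text, label) pairs. The language tag is kept
    so that per-language metrics can still be computed on the mixed data.
    """
    mixed = [Sample(text, label, "ar") for text, label in arabic]
    mixed += [Sample(text, label, "en") for text, label in english]
    # Seeded shuffle so the split is reproducible across runs.
    random.Random(seed).shuffle(mixed)
    return mixed


# Usage with toy placeholder data (not taken from the real datasets).
arabic = [("نص كتبه إنسان", HUMAN), ("نص مولد آليًا", AI)]
english = [("A human-written sentence.", HUMAN), ("An AI-generated sentence.", AI)]
data = build_mixed_dataset(arabic, english)
print(len(data), sorted({s.language for s in data}))
```

Keeping the language tag, rather than discarding it after merging, lets you check whether a model trained on the mixed data performs evenly on both languages.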
To run this application, you'll need:
- Python
- PyTorch
- GPU with CUDA support
pip install streamlit
pip install altair
pip install joblib
To run the Streamlit application, use the following command:
streamlit run main.py
Ensure that you have the required dependencies installed before running the code.
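One way to confirm the requirements before launching the app is a small environment check like the one below. This is a hypothetical helper, not part of the repository: the `REQUIRED_MODULES` list is an assumption based on the requirements listed above (`torch` is PyTorch's import name), and the file name `check_env.py` is only a suggestion.

```python
import importlib.util

# Assumed from the requirements above; the repository may need more packages.
# "torch" is the import name of PyTorch.
REQUIRED_MODULES = ["streamlit", "altair", "joblib", "torch"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_available():
    """Report whether PyTorch is installed and can see a CUDA-capable GPU."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check runs even without PyTorch
    return torch.cuda.is_available()


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
    print("CUDA available:", cuda_available())
```

Running `python check_env.py` before `streamlit run main.py` reports any missing packages and whether a CUDA GPU is visible to PyTorch. `find_spec` only locates each package without importing it, so the check stays fast.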