TextTuring is a tool for distinguishing human-generated text from language model-generated text.

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

TextTuring: Distinguishing Human from Machine

Overview

TextTuring distinguishes human-written text from machine-generated text using natural language processing (NLP) techniques and machine learning models. As the volume of AI-generated content grows, it offers a way to identify and verify human-authored text.

Inspiration

TextTuring is inspired by the way chess cheaters who use AI engines during games are caught: a player whose moves consistently match the engine's top lines stands out. TextTuring applies the same idea to text, flagging writing that closely resembles what a language model would generate.
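The analogy can be made concrete with a small sketch. The function below is hypothetical and not taken from the TextTuring code base: it computes the fraction of positions where the observed item (a chess move, or a token) appears among a reference engine's top candidates. Whether TextTuring computes exactly this statistic is not stated in this README.

```python
def top_k_match_rate(actual, top_candidates):
    """Return the fraction of positions whose actual item is among the candidates.

    actual:         items that were really played or written, one per position
    top_candidates: for each position, the engine's (or model's) top suggestions
    """
    if len(actual) != len(top_candidates):
        raise ValueError("actual and top_candidates must have the same length")
    if not actual:
        return 0.0
    hits = sum(1 for item, cands in zip(actual, top_candidates) if item in cands)
    return hits / len(actual)


# Illustrative data only: three of the four moves match an engine suggestion.
moves = ["e4", "Nf3", "Bb5", "a3"]
engine_lines = [["e4", "d4"], ["Nf3"], ["Bc4", "Bb5"], ["O-O"]]
rate = top_k_match_rate(moves, engine_lines)
```

A consistently high match rate is suspicious in chess. For text, the same function could be fed tokens and a language model's top predictions at each position.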

Project Features

  • Data Collection: TextTuring provides a comprehensive dataset that includes a wide range of text samples. This dataset comprises both human-written and AI-generated content, ensuring diversity and accuracy in the model's training and evaluation.

  • Feature Engineering: The project incorporates feature engineering techniques to analyze and extract meaningful characteristics from text data. These features include n-gram analysis and scores computed by a weak language model (LM).

  • Threshold Calculation: TextTuring dynamically calculates threshold values based on the provided data. This enables precise differentiation between human and machine-generated text.

  • Model Evaluation: The project employs various machine learning techniques to assess text samples against the calculated threshold. This evaluation process results in clear predictions, helping users determine the authenticity of the text.

  • Scalability: TextTuring is designed with scalability in mind, allowing it to efficiently process vast volumes of text data.
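The feature engineering, threshold calculation, and evaluation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the class and function names are hypothetical, an add-one smoothed bigram model stands in for the "weak language model", the threshold is assumed to be the midpoint between the two class means, and machine-generated text is assumed to score as more predictable than human text.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs (n-grams with n = 2)."""
    return list(zip(tokens, tokens[1:]))


class WeakBigramModel:
    """Hypothetical stand-in for a weak language model (add-one smoothing)."""

    def __init__(self, corpus):
        self.unigram_counts = Counter()
        self.bigram_counts = Counter()
        for text in corpus:
            tokens = text.lower().split()
            self.unigram_counts.update(tokens)
            self.bigram_counts.update(bigrams(tokens))
        # One extra slot so unseen tokens still get non-zero probability.
        self.vocab_size = len(self.unigram_counts) + 1

    def score(self, text):
        """Average log-probability per bigram; higher means more predictable."""
        pairs = bigrams(text.lower().split())
        if not pairs:
            return 0.0
        total = 0.0
        for prev, cur in pairs:
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.unigram_counts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total / len(pairs)


def midpoint_threshold(human_scores, machine_scores):
    """Assumed rule: the threshold lies halfway between the two class means."""
    human_mean = sum(human_scores) / len(human_scores)
    machine_mean = sum(machine_scores) / len(machine_scores)
    return (human_mean + machine_mean) / 2


def predict(score, threshold):
    """Label text as machine-generated when it is more predictable than the threshold."""
    return "machine" if score > threshold else "human"


# Toy data for illustration only.
model = WeakBigramModel(["the cat sat on the mat", "the cat sat on the rug"])
seen_score = model.score("the cat sat on the mat")
unseen_score = model.score("purple elephants juggle quietly")
threshold = midpoint_threshold([-2.0, -1.8], [-1.0, -1.2])
```

With real data, the scores passed to `midpoint_threshold` would come from labeled human and machine samples, and a more careful rule (for example, one chosen to maximize accuracy on a validation set) may be preferable to the midpoint.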

How to Use

Installation for Development

  1. Clone the repository

    git clone https://github.com/jaywyawhare/TextTuring

  2. Install the required packages

    pip install -r requirements.txt

  3. Generate the dataset

    python3 main.py --generate

  4. Decide the threshold

    python3 main.py --threshold

  5. Go through the Jupyter notebook

  6. Deploy the web app

    streamlit run app.py

Using the Web App

Contributors

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • I extend my gratitude to the open-source NLP and machine learning communities for their invaluable contributions to the field.

No need to check whether my READMEs were written by me, because they aren't! 😉