GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content

Primary language: Jupyter Notebook

This repository is no longer actively maintained. Please refer to our latest follow-up work:

Overview

📄 Link to Paper (arXiv) | 💾 Link to Dataset | 📦 Link to Checkpoint

This repository is the codebase for the paper GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content.

  1. We collect and publish OpenGPTText - a high-quality dataset with approximately 30,000 text samples rephrased by gpt-3.5-turbo (ChatGPT).
  2. We construct two detectors with different architectures - the RoBERTa-Sentinel and T5-Sentinel.
  3. T5-Sentinel achieves SOTA performance (98% accuracy) on the OpenGPTText dataset.
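
To make the accuracy figure above concrete, here is a minimal sketch of how accuracy could be computed for any binary human-vs-ChatGPT detector. Everything in it is an assumption for illustration: the label convention, the `accuracy` helper, the `toy_detector` rule, and the four sample texts are hypothetical and are not taken from this repository, its checkpoints, or OpenGPTText.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical label convention for this sketch (not the repository's own).
HUMAN, CHATGPT = 0, 1


def accuracy(detector: Callable[[str], int],
             samples: Iterable[Tuple[str, int]]) -> float:
    """Return the fraction of (text, label) pairs the detector labels correctly."""
    total = 0
    correct = 0
    for text, label in samples:
        total += 1
        if detector(text) == label:
            correct += 1
    if total == 0:
        raise ValueError("cannot compute accuracy on an empty sample set")
    return correct / total


def toy_detector(text: str) -> int:
    """Placeholder rule standing in for a trained model; purely illustrative."""
    return CHATGPT if "as an ai language model" in text.lower() else HUMAN


# Made-up examples, not drawn from OpenGPTText.
toy_samples = [
    ("As an AI language model, I cannot browse the web.", CHATGPT),
    ("lol that game last night was wild", HUMAN),
    ("The committee approved the budget on Tuesday.", HUMAN),
    ("In summary, the findings highlight several key factors.", CHATGPT),
]

if __name__ == "__main__":
    print(f"toy accuracy: {accuracy(toy_detector, toy_samples):.2f}")
```

A real evaluation would swap `toy_detector` for a function that wraps a trained model and would iterate over a held-out split of the dataset; the helper takes any callable so that the metric stays independent of the model architecture.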
