Minhash-AI-Authorship-Detection

Overview

Distinguishing text produced by Large Language Models (LLMs) from human-written text is challenging because LLMs can generalize information and present it in new contexts. Our research identifies baseline similarities among LLM-generated texts using various MinHash techniques and compares them to the similarities among human-written texts. We aim to establish a baseline similarity score for AI-generated content, which could help classify future inputs as either AI- or human-generated.

Objectives

  1. Identify baseline similarities for LLM-generated texts.
  2. Explore and evaluate different MinHash techniques:
    • Basic MinHashing
    • K-shingling MinHashing
    • SimHashing
    • Cosine similarities
  3. Establish a baseline similarity score for AI-generated content.
  4. Classify future inputs (AI or human-generated) based on the baseline similarity score.

Methodology

Dataset

We use a dataset containing titles and abstracts of research papers from Arxiv. Each title appears twice: once with the real abstract and once with an abstract generated by GPT-3. The dataset is divided into:

  • Human-generated abstracts for testing.
  • Machine-generated abstracts split into 80% for training and 20% for testing.
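The split above can be sketched as follows. This is a minimal sketch, assuming each record carries a `source` field marking it as human- or machine-generated; the field name and record layout are illustrative assumptions, not the dataset's actual schema.

```python
import random

def split_dataset(records, train_frac=0.8, seed=42):
    """Keep all human abstracts for testing; split machine abstracts 80/20.

    `records` is assumed to be a list of dicts with a "source" key
    ("human" or "machine") -- a hypothetical schema for illustration.
    """
    human = [r for r in records if r["source"] == "human"]
    machine = [r for r in records if r["source"] == "machine"]
    rng = random.Random(seed)          # fixed seed for a reproducible split
    rng.shuffle(machine)
    cut = int(len(machine) * train_frac)
    return {
        "train": machine[:cut],        # 80% machine-generated, for the baseline
        "test_machine": machine[cut:], # 20% machine-generated, for testing
        "test_human": human,           # all human-generated, for testing
    }
```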

Techniques

  1. Basic MinHashing: Estimate the Jaccard similarity between documents from their MinHash signatures.
  2. K-shingling MinHashing: Approximate Jaccard similarity efficiently by hashing overlapping (or non-overlapping) sequences of k tokens (shingles).
  3. SimHashing: Create a fixed-size fingerprint for each document so that similar documents yield nearby fingerprints in the hashed space.
  4. Cosine Similarities: Compare documents using cosine similarity instead of Jaccard similarity.

Hyperparameters

  • Hash Functions: 128 hash functions to generate MinHash signatures.
  • Shingles and N-grams: Adjust the number of shingles for K-shingling MinHashing and explore different N-grams.
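The SimHash technique listed under Techniques can be sketched as below: each token votes on every fingerprint bit, and the Hamming distance between fingerprints measures dissimilarity. MD5 token hashing and 64-bit fingerprints are illustrative assumptions, not the project's confirmed settings.

```python
import hashlib

def simhash(tokens, bits=64):
    """Weighted bitwise vote over token hashes yields a fixed-size fingerprint."""
    vote = [0] * bits
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        for b in range(bits):
            vote[b] += 1 if (h >> b) & 1 else -1  # each token votes per bit
    # set the bits with a positive vote total
    return sum(1 << b for b in range(bits) if vote[b] > 0)

def hamming_distance(a, b):
    """Number of differing fingerprint bits; smaller means more similar."""
    return bin(a ^ b).count("1")
```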

Experiments

  • Establish a baseline similarity score from pairwise comparisons of machine-generated abstracts in the training set.
  • Compare the Jaccard similarities (MinHash) and Hamming distances (SimHash) of human-generated abstracts against those of machine-generated abstracts.
  • Adjust classification thresholds to optimize accuracy.
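One plausible decision rule for the steps above: score a candidate by its mean similarity to the machine-generated training set, then grid-search the threshold that best separates the two classes. The rule and the helper names are a hedged sketch of the experiment, not the project's exact implementation.

```python
from statistics import mean

def classify(candidate_sig, train_sigs, similarity, threshold):
    """Label as machine-generated when the candidate's mean similarity
    to the machine-generated training set exceeds the tuned threshold."""
    score = mean(similarity(candidate_sig, s) for s in train_sigs)
    return "machine" if score > threshold else "human"

def best_threshold(machine_scores, human_scores, candidates):
    """Grid-search the candidate threshold that maximizes accuracy on
    held-out machine (positive) and human (negative) similarity scores."""
    def accuracy(t):
        correct = sum(s > t for s in machine_scores) + sum(s <= t for s in human_scores)
        return correct / (len(machine_scores) + len(human_scores))
    return max(candidates, key=accuracy)
```

`similarity` can be any of the pairwise measures above (estimated Jaccard, negated Hamming distance, or cosine similarity), so the same rule covers all four techniques.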

Results

  • Optimal Parameters: 4-shingle MinHash using BertTokenizer achieved the highest classification accuracy on the test set.
  • Graphical Analysis: Generated graphs for each parameter to determine optimal settings for K-shingle and N-grams.
  • Case Study: Successfully classified human- and AI-generated essays using a simple model trained on a prompt-engineered essay dataset.

Conclusion

Our study demonstrates the effectiveness of various MinHashing techniques in distinguishing AI-generated content from human-generated content. By identifying key parameters and evaluating different methods, we achieved consistent and accurate detection of AI-generated abstracts. Our findings support the viability of similarity classification through MinHash in detecting AI-generated content.

Future Work

We are exploring the viability of developing a system to classify essays using a training set generated through prompt engineering. This system could serve as a commercial application of our methodology.

References

  • Fröhling, Leon, and Arkaitz Zubiaga. “Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover.” PeerJ Computer Science, vol. 7, e443, 6 Apr. 2021, doi:10.7717/peerj-cs.443.
  • Kirchenbauer, John, et al. “A Watermark for Large Language Models.” arXiv preprint, 2023, http://arxiv.org/abs/2301.10226.