Minhash-AI-Authorship-Detection

Overview

Distinguishing text produced by Large Language Models (LLMs) from human-written text is challenging because LLMs can generalize information and present it in new contexts. Our research identifies baseline similarities among LLM-generated texts using various MinHash techniques and compares them to the similarities among human-written texts. We aim to establish a baseline similarity score for AI-generated content, which could help classify future inputs as either AI- or human-generated.

Objectives

  1. Identify baseline similarities for LLM-generated texts.
  2. Explore and evaluate different MinHash techniques:
    • Basic MinHashing
    • K-shingling MinHashing
    • SimHashing
    • Cosine similarities
  3. Establish a baseline similarity score for AI-generated content.
  4. Classify future inputs (AI or human-generated) based on the baseline similarity score.

Methodology

Dataset

We use a dataset containing titles and abstracts of research papers from Arxiv. Each title appears twice: once with the real abstract and once with an abstract generated by GPT-3. The dataset is divided into:

  • Human-generated abstracts for testing.
  • Machine-generated abstracts split into 80% for training and 20% for testing.
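The split above can be sketched as follows. This is a minimal sketch, assuming each record carries a `source` field marking it as human- or machine-generated; the field name and record layout are illustrative assumptions, not the dataset's actual schema.

```python
import random

def split_dataset(records, train_frac=0.8, seed=42):
    """Keep all human abstracts for testing; split machine abstracts 80/20.

    `records` is assumed to be a list of dicts with a "source" key
    ("human" or "machine") -- a hypothetical schema for illustration.
    """
    human = [r for r in records if r["source"] == "human"]
    machine = [r for r in records if r["source"] == "machine"]
    rng = random.Random(seed)          # fixed seed for a reproducible split
    rng.shuffle(machine)
    cut = int(len(machine) * train_frac)
    return {
        "train": machine[:cut],        # 80% machine-generated, for the baseline
        "test_machine": machine[cut:], # 20% machine-generated, for testing
        "test_human": human,           # all human-generated, for testing
    }
```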

Techniques

  1. Basic MinHashing: Estimate the Jaccard similarity between documents from their MinHash signatures.
  2. K-shingling MinHashing: Approximate Jaccard similarity efficiently by hashing overlapping (or non-overlapping) sequences of k tokens (shingles).
  3. SimHashing: Create a fixed-size fingerprint for each document so that similar documents yield nearby fingerprints in the hashed space.
  4. Cosine Similarities: Compare documents using cosine similarity instead of Jaccard similarity.

Hyperparameters

  • Hash Functions: 128 hash functions to generate MinHash signatures.
  • Shingles and N-grams: Adjust the number of shingles for K-shingling MinHashing and explore different N-grams.
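The SimHash technique listed under Techniques can be sketched as below: each token votes on every fingerprint bit, and the Hamming distance between fingerprints measures dissimilarity. MD5 token hashing and 64-bit fingerprints are illustrative assumptions, not the project's confirmed settings.

```python
import hashlib

def simhash(tokens, bits=64):
    """Weighted bitwise vote over token hashes yields a fixed-size fingerprint."""
    vote = [0] * bits
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        for b in range(bits):
            vote[b] += 1 if (h >> b) & 1 else -1  # each token votes per bit
    # set the bits with a positive vote total
    return sum(1 << b for b in range(bits) if vote[b] > 0)

def hamming_distance(a, b):
    """Number of differing fingerprint bits; smaller means more similar."""
    return bin(a ^ b).count("1")
```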

Experiments

  • Establish a baseline similarity score from pairwise comparisons of machine-generated abstracts in the training set.
  • Compare the Jaccard similarities (MinHash) and Hamming distances (SimHash) of human-generated abstracts against those of machine-generated abstracts.
  • Adjust classification thresholds to optimize accuracy.
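One plausible decision rule for the steps above: score a candidate by its mean similarity to the machine-generated training set, then grid-search the threshold that best separates the two classes. The rule and the helper names are a hedged sketch of the experiment, not the project's exact implementation.

```python
from statistics import mean

def classify(candidate_sig, train_sigs, similarity, threshold):
    """Label as machine-generated when the candidate's mean similarity
    to the machine-generated training set exceeds the tuned threshold."""
    score = mean(similarity(candidate_sig, s) for s in train_sigs)
    return "machine" if score > threshold else "human"

def best_threshold(machine_scores, human_scores, candidates):
    """Grid-search the candidate threshold that maximizes accuracy on
    held-out machine (positive) and human (negative) similarity scores."""
    def accuracy(t):
        correct = sum(s > t for s in machine_scores) + sum(s <= t for s in human_scores)
        return correct / (len(machine_scores) + len(human_scores))
    return max(candidates, key=accuracy)
```

`similarity` can be any of the pairwise measures above (estimated Jaccard, negated Hamming distance, or cosine similarity), so the same rule covers all four techniques.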

Results

  • Optimal Parameters: 4-shingle MinHash using BertTokenizer achieved the highest classification accuracy on the test set.
  • Graphical Analysis: Generated graphs for each parameter to determine optimal settings for K-shingle and N-grams.
  • Case Study: Successfully classified human- and AI-generated essays using a simple model trained on a prompt-engineered essay dataset.

Conclusion

Our study demonstrates the effectiveness of various MinHashing techniques in distinguishing AI-generated content from human-generated content. By identifying key parameters and evaluating different methods, we achieved consistent and accurate detection of AI-generated abstracts. Our findings support the viability of similarity classification through MinHash in detecting AI-generated content.

Future Work

We are exploring the viability of developing a system to classify essays using a training set generated through prompt engineering. This system could serve as a commercial application of our methodology.

References

  • Fröhling, Leon, and Arkaitz Zubiaga. “Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover.” PeerJ Computer Science, vol. 7, e443, 6 Apr. 2021, doi:10.7717/peerj-cs.443.
  • Kirchenbauer, John, et al. “A Watermark for Large Language Models.” arXiv preprint, 2023, http://arxiv.org/abs/2301.10226.