Attacking Neural Text Detectors

The original dataset in "./data/" is 100% synthetic, generated by GPT-2. We test whether neural text detectors can be fooled into classifying these texts as human-written. Run to start experiments. The following global constants configure a run:
EXPERIMENT_NAME is the name of the folder that holds the results files.
ADVERSARIAL_TYPE is the type of change made to each text (see Adversarial Types).
TEXT_TO_CHANGE is the number of texts to make adversarial.
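
As a minimal sketch, the three constants could be set like this. The values shown, and the VALID_TYPES helper, are illustrative assumptions and not the repository's defaults.

```python
# Illustrative values only (assumptions, not the repository's defaults).
EXPERIMENT_NAME = "sample-experiment"  # folder that will hold the results files
ADVERSARIAL_TYPE = "replace-char"      # one of the adversarial types listed in this README
TEXT_TO_CHANGE = 100                   # number of texts to make adversarial

# Hypothetical helper: the set of type names documented in this README.
VALID_TYPES = {
    "do-nothing",
    "replace-char",
    "random-order-replace-char",
    "misspelling",
}

# Fail early on a typo in the type name or a nonsensical text count.
if ADVERSARIAL_TYPE not in VALID_TYPES:
    raise ValueError(f"Unknown ADVERSARIAL_TYPE: {ADVERSARIAL_TYPE!r}")
if TEXT_TO_CHANGE < 0:
    raise ValueError("TEXT_TO_CHANGE must be non-negative")
```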

Adversarial Types:
-'do-nothing': The text is left unchanged (baseline).
-'replace-char': Replaces characters with the homoglyphs defined below.
-'random-order-replace-char': Same as 'replace-char', except the input text lines are shuffled.
-'misspelling': Replaces certain words with misspellings from misspellings.json.
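
The four types above can be sketched roughly as follows. This is not the repository's implementation: the homoglyph table, the misspelling table, and the function names are hypothetical stand-ins chosen for illustration, and shuffling is assumed to reorder the list of input texts.

```python
import random

# Hypothetical homoglyph table (an assumption): Latin letters mapped to
# visually similar Cyrillic code points.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small letter a
    "e": "\u0435",  # Cyrillic small letter ie
    "c": "\u0441",  # Cyrillic small letter es
}

# Hypothetical stand-in for the contents of misspellings.json.
MISSPELLINGS = {"receive": "recieve", "separate": "seperate"}


def replace_chars(text, table=HOMOGLYPHS):
    """Swap every character found in `table` for its homoglyph."""
    return "".join(table.get(ch, ch) for ch in text)


def misspell(text, table=MISSPELLINGS):
    """Replace space-separated words that appear in `table`."""
    return " ".join(table.get(word, word) for word in text.split(" "))


def attack(lines, adversarial_type, seed=0):
    """Apply one adversarial type to a list of texts and return a new list."""
    lines = list(lines)  # copy so the caller's list is not modified
    if adversarial_type == "do-nothing":
        return lines
    if adversarial_type == "replace-char":
        return [replace_chars(line) for line in lines]
    if adversarial_type == "random-order-replace-char":
        # Seeded shuffle so a run can be reproduced.
        random.Random(seed).shuffle(lines)
        return [replace_chars(line) for line in lines]
    if adversarial_type == "misspelling":
        return [misspell(line) for line in lines]
    raise ValueError(f"Unknown adversarial type: {adversarial_type!r}")
```

A homoglyph swap keeps the text visually the same to a human reader while changing the underlying code points, so the detector sees different tokens; for example, `replace_chars("ace")` returns three Cyrillic characters that render like "ace".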


Code for "Attacking Neural Text Detectors" (

Run python to download the GPT-2 top-k 40 neural text test set created by OpenAI. For more documentation regarding this and similar datasets, visit

The OpenAI RoBERTa neural text detector can be downloaded by running wget

Install requirements via pip install -r requirements.txt.

Run python to run a sample experiment.