ConDA-gen-text-detection

Code for the paper "ConDA: Contrastive Domain Adaptation for AI-generated Text Detection", accepted at IJCNLP-AACL 2023 (paper link).

🌟 Great News! [Nov 4, 2023] 🌟 Our paper won the Outstanding Paper Award at IJCNLP-AACL 2023 held in Bali, Indonesia.

ConDA Framework Diagram

Setup

Set up a separate environment and install requirements via pip install -r requirements.txt

Make directories for the models, the output logs, and the Hugging Face model files:

mkdir models huggingface_repos output_logs

Download roberta-base from here and/or roberta-large from here and place these repositories in huggingface_repos.
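Before launching training, it can help to confirm that the layout described above is in place. The snippet below is a minimal sketch, not part of the repository: `check_layout` is a hypothetical helper, and it assumes that each downloaded model sits in a folder named `roberta-base` or `roberta-large` containing a `config.json`, as a standard Hugging Face model repository does.

```python
from pathlib import Path

# Directory names come from the mkdir step above. The model folder names and
# the config.json check are assumptions about a standard Hugging Face download.
REQUIRED_DIRS = ("models", "huggingface_repos", "output_logs")
MODEL_NAMES = ("roberta-base", "roberta-large")


def check_layout(root="."):
    """Return a list of problems; an empty list means the layout looks fine."""
    root = Path(root)
    problems = [
        f"missing directory: {name}"
        for name in REQUIRED_DIRS
        if not (root / name).is_dir()
    ]
    # At least one of the two RoBERTa checkpoints should be present.
    found = [
        name
        for name in MODEL_NAMES
        if (root / "huggingface_repos" / name / "config.json").is_file()
    ]
    if not found:
        problems.append("no roberta-base or roberta-large found in huggingface_repos")
    return problems
```

Calling `check_layout()` from the repository root and printing the returned list shows which steps, if any, still need to be done.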

contrast_training_with_da.py is the ConDA training script, and multi_domain_runner.py is the runner script that launches it. Update the arguments in multi_domain_runner.py to train models as needed.

Use evaluation.py to evaluate trained models. Change the arguments inside the script as needed.

TuringBench

Link to the dataset website: link
Link to the TuringBench paper: link

Files should be split into 3 jsonl splits: train, valid, test. Each line in the jsonl is a data instance with text and label fields.
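The expected file format can be illustrated with a short sketch. The helpers `write_split` and `load_split` are hypothetical names and are not taken from the repository. The only facts assumed from the description above are that each line is a JSON object with `text` and `label` fields. The concrete label values and the file names used by the training scripts are not specified here, so check them against the scripts before use.

```python
import json


def write_split(path, examples):
    """Write (text, label) pairs to a jsonl file, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for text, label in examples:
            f.write(json.dumps({"text": text, "label": label}) + "\n")


def load_split(path):
    """Read a jsonl split and return a list of (text, label) pairs."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if "text" not in record or "label" not in record:
                raise ValueError(f"{path}:{line_no} is missing 'text' or 'label'")
            examples.append((record["text"], record["label"]))
    return examples
```

Running `load_split` on each of the three splits before training is a cheap way to catch malformed lines early.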

Links to best performing models for each target generator

Links to the pre-trained ConDA model with the best performing source generator for each target generator:

Target        Best performing source   Dropbox Link
CTRL          GROVER_mega              link
FAIR_wmt19    GPT2_xl                  link
GPT2_xl       FAIR_wmt19               link
GPT3          GROVER_mega              link
GROVER_mega   CTRL                     link
XLM           GROVER_mega              link
ChatGPT       FAIR_wmt19               link

Citation

If you use (part of) this code, please cite our paper as:

@InProceedings{bhattacharjee-EtAl:2023:ijcnlp,
  author    = {Bhattacharjee, Amrita  and  Kumarage, Tharindu  and  Moraffah, Raha  and  Liu, Huan},
  title     = {ConDA: Contrastive Domain Adaptation for AI-generated Text Detection},
  booktitle      = {Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics},
  month          = {November},
  year           = {2023},
  address        = {Nusa Dua, Bali},
  publisher      = {Association for Computational Linguistics},
  pages     = {598--610},
  url       = {https://aclanthology.org/2023.ijcnlp-long.40}
}

Contact

For any questions, comments, or feedback, contact Amrita Bhattacharjee at abhatt43@asu.edu.