Literata Team: Zheng Maria Yu, Alessio Hu, Jakub Jastrzębski, Joanna Rancew
The AUTEXTIFICATION: Automatic Text Identification project covers two tasks: binary classification, which distinguishes machine-generated text from human-written text, and multi-class classification, which predicts the language model that generated a given text.
The notebook is split into two parts, one per subtask:
- Subtask 1: Human or Generated,
- Subtask 2: Which Generation model.
Each part begins with data preprocessing and visualizations. It then presents a series of models trained on the dataset, ranging from simple machine learning classifiers to various neural networks and transformers.
To navigate the notebook, use the table of contents in the left sidebar (in Colab) to jump between sections.
If you run the code yourself, remember to enable a GPU for the transformer models (in Colab: Runtime > Change runtime type)!
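Before training the transformers, it helps to confirm that a GPU is actually visible. The snippet below is a minimal sketch that *assumes* the transformer models run on PyTorch, which this introduction does not state. `pick_device` is a hypothetical helper name, not part of the notebook. It falls back to the CPU if PyTorch is not installed or no CUDA device is found.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu".

    Assumption: the transformer models use PyTorch as their backend.
    """
    # If PyTorch is not installed at all, there is no CUDA backend to use.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    # torch.cuda.is_available() is True only when a CUDA-capable GPU is visible.
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

If this prints `cpu` in Colab, switch the runtime to a GPU and restart it before running the transformer sections.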