Literata

NLP project at Politecnico di Milano about AUTEXTIFICATION: Automatic Text Identification

Primary language: Jupyter Notebook

Literata Team: Zheng Maria Yu, Alessio Hu, Jakub Jastrzębski, Joanna Rancew

The AUTEXTIFICATION: Automatic Text Identification project covers two tasks: binary classification, which distinguishes generated text from human-written text, and multi-class classification, which predicts which language model generated a given text.

The notebook is structured into two main subtasks:

  • Subtask 1: Human or Generated,
  • Subtask 2: Which Generation model.

Each subtask begins with data preprocessing and visualizations. Subsequently, a comprehensive collection of models trained on the dataset is presented, ranging from simple machine learning classifiers to various neural networks and transformers.
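To give a feel for the "simple machine learning classifier" end of that range, here is a minimal sketch of a bag-of-words Naive Bayes text classifier written with only the standard library. It is an illustration, not the notebook's actual code: the class name `NaiveBayesTextClassifier`, the whitespace tokenization, and the toy sentences and labels are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing.

    Hypothetical helper for illustration; not taken from the notebook.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Tokens never seen during training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(doc_count / n_docs)  # log prior
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data invented for this sketch (Subtask 1 style: human vs. generated).
texts = [
    "i honestly loved this little cafe",
    "as an ai language model i cannot",
    "we went hiking and it rained all day",
    "in conclusion the aforementioned factors are significant",
]
labels = ["human", "generated", "human", "generated"]

clf = NaiveBayesTextClassifier().fit(texts, labels)
print(clf.predict("the cafe was lovely and it rained"))
print(clf.predict("as an ai model"))
```

The same class handles the Subtask 2 setting unchanged: pass generator names as labels instead of "human" and "generated", and `predict` returns the most likely of however many classes were seen during training.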

To navigate the notebook, use the table of contents on the left (in Colab) for easy access to the different sections.

If you want to run the code yourself, remember to use a GPU for the transformer models!
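Before starting a long training run, it can help to confirm that a GPU is actually visible. The sketch below assumes the transformer models run on PyTorch, which this README does not state; `pick_device` is a hypothetical helper name. It falls back to the CPU when PyTorch is not installed or no CUDA device is found.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu".

    Hypothetical helper; assumes a PyTorch-based setup.
    """
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

In Colab, a result of "cpu" usually means the runtime type still needs to be switched to a GPU runtime in the runtime settings.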