Authors: Yigeng Zhang, Mahsa Shafaei, Fabio Gonzalez and Thamar Solorio
This repository hosts the dataset and source code for the paper From None to Severe: Predicting Severity in Movie Scripts.
Please find the dataset under the Data folder.
The zip file contains train/dev/test files for 5 aspects of age-restricted content.
The data is stored as a Pandas DataFrame with the following columns:
id | Aspect | None | Mild | Moderate | Severe | Total_votes | Aspect_rating | text
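As a quick sanity check on a loaded split, the column layout above can be verified with a few lines of Pandas. This is a minimal sketch, not part of the repository: the helper name `missing_columns` is hypothetical, and the toy row contains made-up placeholder values used only to exercise the check.

```python
import pandas as pd

# Column names as listed above.
EXPECTED_COLUMNS = [
    "id", "Aspect", "None", "Mild", "Moderate", "Severe",
    "Total_votes", "Aspect_rating", "text",
]


def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected column names that are absent from df."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Toy row with made-up values, only to exercise the check.
toy = pd.DataFrame([{
    "id": "example-0", "Aspect": "example-aspect",
    "None": 1, "Mild": 2, "Moderate": 3, "Severe": 4,
    "Total_votes": 10, "Aspect_rating": "Severe",
    "text": "placeholder script text",
}])

print(missing_columns(toy))                        # []
print(missing_columns(toy.drop(columns="text")))   # ['text']
```

An empty result means every documented column is present. How the files in the zip are serialized is not stated here, so the loading step is left out.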
Please find the code under the Code folder.
In this work, we use Python 3.7.9 with the following library versions:
NumPy 1.18.5
Pandas 1.1.3
PyTorch 1.6.0
PyTorch Lightning 1.0.2
Scikit-learn 0.23.2
Sentence-transformers 0.4.1.2
Use the following command to embed all text into utterance-level sentence embeddings.
python text_embedding.py --data_dir your-save-path
Sentence embeddings are computed and saved beforehand so that they can be reused across experiments. Computing them during training would drastically increase the running time.
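The compute-once, reuse-later idea can be sketched as follows. This is an illustration under stated assumptions and not the logic of text_embedding.py: `load_or_compute` and `toy_embed` are hypothetical names, the pickle cache format is an arbitrary choice, and `toy_embed` is a stand-in that returns a tiny fake vector instead of calling a real sentence encoder.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(cache_path, texts, embed_fn):
    """Return cached embeddings if cache_path exists; otherwise compute and save them."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    embeddings = [embed_fn(t) for t in texts]
    with cache_path.open("wb") as f:
        pickle.dump(embeddings, f)
    return embeddings


calls = []  # records how often the (expensive) encoder is invoked


def toy_embed(text):
    """Hypothetical stand-in for a sentence encoder: returns a tiny fake vector."""
    calls.append(text)
    return [float(len(text)), float(text.count(" "))]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "embeddings.pkl"
    first = load_or_compute(path, ["hello there", "hi"], toy_embed)
    second = load_or_compute(path, ["hello there", "hi"], toy_embed)

print(first == second)  # True
print(len(calls))       # 2: the encoder ran only during the first call
```

The second call reads the saved file and never touches the encoder, which is where the time saving comes from when the same embeddings feed many training runs.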
Use the following commands to run the training and test scripts with default settings.
python RNN-Trans_S-MT.py
python TextRCNN_S-MT.py
To check that the code runs correctly without a full training cycle, a fast dev run on a single batch is available using the following command:
python RNN-Trans_S-MT.py --dev_run
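One plausible way such a flag could be wired is sketched below. This is an assumption, not the repository's actual code: PyTorch Lightning's `Trainer` accepts a `fast_dev_run` argument, and the sketch assumes `--dev_run` is forwarded to it. The helper names `build_parser` and `trainer_kwargs` are hypothetical, and the scripts may handle the flag differently.

```python
import argparse


def build_parser():
    """Build a parser exposing only the --dev_run switch (sketch)."""
    parser = argparse.ArgumentParser(description="Training/test script (sketch).")
    parser.add_argument("--dev_run", action="store_true",
                        help="Run a single batch as a smoke test.")
    return parser


def trainer_kwargs(args):
    # Assumption: the flag is forwarded as pytorch_lightning.Trainer(fast_dev_run=...).
    return {"fast_dev_run": args.dev_run}


print(trainer_kwargs(build_parser().parse_args(["--dev_run"])))  # {'fast_dev_run': True}
print(trainer_kwargs(build_parser().parse_args([])))             # {'fast_dev_run': False}
```

Using `store_true` keeps the default behavior a full run, so the smoke test has to be requested explicitly.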
If you would like to use our work and code for research, please cite our paper with the following info:
@inproceedings{zhang-etal-2021-none-severe,
    title = "From None to Severe: {P}redicting Severity in Movie Scripts",
    author = "Zhang, Yigeng and
      Shafaei, Mahsa and
      Gonzalez, Fabio and
      Solorio, Thamar",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-emnlp.332",
    pages = "3951--3956",
}
Please contact yzhang168@uh.edu for questions.