DOC: Digital Outrage Classifier
Developed by members of the Crockett Lab at Yale University, in the Department of Psychology and the Department of Statistics and Data Science, DOC is a Python package that allows researchers to predict the probability that tweets contain moral outrage.
The development of the code and materials in this repository is described in detail in the paper "How social learning amplifies moral outrage expression in online social networks" (2021).
Repository Contributors
- William Brady | Postdoctoral Fellow | Yale University | william.brady@yale.edu | Website
- Killian McLoughlin | Ph.D. Student | Yale University | killian.mcloughlin@yale.edu | LinkedIn
- Tuan Nguyen Doan | Data Scientist | Quora | tuan.nguyen.doan@aya.yale.edu | LinkedIn
Installation
The first step is to clone the repo into a local directory on your computer. Using the terminal, navigate to the location where you want to store the package and run the following command:
git clone "https://github.com/CrockettLab/outrage_classifier"
Then navigate into the cloned directory and run the command below. The package is compatible with both Python 2 and Python 3.
python setup.py install
Importing
The package can be imported using the following code:
import outrageclf as oclf
from outrageclf.preprocessing import WordEmbed, get_lemmatize_hashtag
from outrageclf.classifier import _load_crockett_model
For those using macOS, a runtime error (described here) may prevent the package from being successfully imported. If you experience this issue, setting the environment variable KMP_DUPLICATE_LIB_OK to TRUE should solve the problem:
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
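A minimal sketch combining this workaround with the imports above; the assumption here (based on how this OpenMP workaround usually behaves) is that the variable should be set before outrageclf and its Keras/TensorFlow backend are loaded:
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'  # set before importing outrageclf

import outrageclf as oclf
from outrageclf.preprocessing import WordEmbed, get_lemmatize_hashtag
from outrageclf.classifier import _load_crockett_model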
Usage
The current version of outrageclf allows users to predict moral outrage using a pre-trained deep gated recurrent unit (GRU) model, as described in detail in this article.
To run the pre-trained model used in the article, you will need model files that are NOT hosted in this repository. If you would like access to these files, see 'Accessing Model Files' below. The omitted files are:
- A pre-trained embedding model, stored in .joblib format
- A pre-trained GRU model, stored in .h5 format
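In the examples below, the locations of these two files are referred to by the variables embedding_url and model_url. A minimal sketch, using hypothetical local paths that you should replace with the locations of the files you received:
embedding_url = "path/to/pretrained_embedding.joblib"  # hypothetical path to the embedding model
model_url = "path/to/pretrained_gru.h5"  # hypothetical path to the GRU model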
In order to predict the probability a tweet contains moral outrage we use the following pipeline:
Load pretrained models -> Preprocess text -> Embed text -> Make prediction
Below is a complete code example of the pipeline. Note that this example assumes the presence of either our pre-trained model files or similar files generated by the user:
tweets = [
    "This topic infuriates me because it violates my moral stance",
    "This is just a super-normal topic #normal"
]

# loading our pre-trained models
word_embed = WordEmbed()
word_embed._get_pretrained_tokenizer(embedding_url)
model = _load_crockett_model(model_url)

# the tweets are lemmatized and embedded into 50-d space
lemmatized_text = get_lemmatize_hashtag(tweets)
embedded_vector = word_embed._get_embedded_vector(lemmatized_text)
predict = model.predict(embedded_vector)
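As a usage sketch, assuming predict holds one predicted probability per tweet (a standard Keras output array), the results can be paired back with the original tweets:
# each entry is the predicted probability that the corresponding tweet contains moral outrage
for tweet, prob in zip(tweets, predict):
    print("{:.3f}  {}".format(float(prob), tweet))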
Alternatively, classifications can be generated using the package's model wrapper function in the classifier module, which bypasses the need to lemmatize and embed the text input yourself:
from outrageclf.classifier import pretrained_model_predict
pretrained_model_predict(tweets, embedding_url, model_url)
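A minimal usage sketch, under the same assumptions about embedding_url and model_url as above, and assuming the wrapper returns the predicted probabilities for each tweet:
predictions = pretrained_model_predict(tweets, embedding_url, model_url)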
Accessing Model Files
In order to access the pre-trained model files, please fill out this form. The form will ask for your email address and a brief description of your use case. We will then email you the model files. Note that the classifier is for use in academic research only. See the license for more information.
Example Notebook
example.ipynb demonstrates both of these use cases.
Citation
Brady, W.J., McLoughlin, K.L., Doan, T.N., & Crockett, M.J. (2021). How social learning amplifies moral outrage expression in online social networks. PsyArXiv. doi: 10.31234/osf.io/gf7t5
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic License.
Release History
- 0.1.0
- Initial release