DOC: Digital Outrage Classifier
Developed by members of the Crockett Lab at Yale University, in the Department of Psychology and the Department of Statistics and Data Science, DOC is a Python package that allows researchers to predict the probability that tweets contain moral outrage.
The development of the code and materials in this repository is described in detail in the paper "How social learning amplifies moral outrage expression in online social networks" (2021).
Repository Contributors
- William Brady | Postdoctoral Fellow | Yale University | william.brady@yale.edu | Website
- Killian McLoughlin | Ph.D. Student | Yale University | killian.mcloughlin@yale.edu | LinkedIn
- Tuan Nguyen Doan | Data Scientist | Quora | tuan.nguyen.doan@aya.yale.edu | LinkedIn
Installation
The first step is to clone the repo into a local directory on your computer. Using the terminal, navigate to the location where you want to store the package and run the following command:
git clone "https://github.com/CrockettLab/outrage_classifier"
Then navigate into the cloned directory and run the command below. The package is compatible with both Python 2 and Python 3.
python setup.py install
Importing
The package can be imported using the following code:
import outrageclf as oclf
from outrageclf.preprocessing import WordEmbed, get_lemmatize_hashtag
from outrageclf.classifier import _load_crockett_model
For those using macOS, a runtime error (described here) may prevent the package from being successfully imported. If you experience this issue, setting the environment variable KMP_DUPLICATE_LIB_OK to TRUE should solve the problem:
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
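A minimal sketch combining this workaround with the imports above; the assumption here (based on how this OpenMP workaround usually behaves) is that the variable should be set before outrageclf and its Keras/TensorFlow backend are loaded:
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'  # set before importing outrageclf

import outrageclf as oclf
from outrageclf.preprocessing import WordEmbed, get_lemmatize_hashtag
from outrageclf.classifier import _load_crockett_model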
Usage
The current version of outrageclf allows users to predict moral outrage using a pre-trained deep gated recurrent unit (GRU) model, as described in detail in this article.
To run the pre-trained model used in the article, you will need model files that are NOT hosted in this repository. If you would like access to these files, see 'Accessing Model Files' below. The omitted files are:
- A pre-trained embedding model, stored in .joblib format
- A pre-trained GRU model, stored in .h5 format
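In the examples below, the locations of these two files are referred to by the variables embedding_url and model_url. A minimal sketch, using hypothetical local paths that you should replace with the locations of the files you received:
embedding_url = "path/to/pretrained_embedding.joblib"  # hypothetical path to the embedding model
model_url = "path/to/pretrained_gru.h5"  # hypothetical path to the GRU model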
In order to predict the probability a tweet contains moral outrage we use the following pipeline:
Load pretrained models -> Preprocess text -> Embed text -> Make prediction
Below is a complete code example of the pipeline. Note that this example assumes the presence of either our pre-trained model files or similar files generated by the user:
tweets = [
    "This topic infuriates me because it violates my moral stance",
    "This is just a super-normal topic #normal"
]

# loading our pre-trained models
word_embed = WordEmbed()
word_embed._get_pretrained_tokenizer(embedding_url)
model = _load_crockett_model(model_url)

# the tweets are lemmatized and embedded into 50-d space
lemmatized_text = get_lemmatize_hashtag(tweets)
embedded_vector = word_embed._get_embedded_vector(lemmatized_text)
predict = model.predict(embedded_vector)
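As a usage sketch, assuming predict holds one predicted probability per tweet (a standard Keras output array), the results can be paired back with the original tweets:
# each entry is the predicted probability that the corresponding tweet contains moral outrage
for tweet, prob in zip(tweets, predict):
    print("{:.3f}  {}".format(float(prob), tweet))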
Alternatively, classifications can be generated using the package's model wrapper function in the classifier module, which bypasses the need to lemmatize and embed the text input yourself:
from outrageclf.classifier import pretrained_model_predict
pretrained_model_predict(tweets, embedding_url, model_url)
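A minimal usage sketch, under the same assumptions about embedding_url and model_url as above, and assuming the wrapper returns the predicted probabilities for each tweet:
predictions = pretrained_model_predict(tweets, embedding_url, model_url)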
Accessing Model Files
In order to access the pre-trained model files, please fill out this form. The form will ask for your email address and a brief description of your use case. We will then email you the model files. Note that the classifier is for use in academic research only. See the license for more information.
Example Notebook
example.ipynb demonstrates both of these use cases.
Citation
Brady, W.J., McLoughlin, K.L., Doan, T.N., & Crockett, M.J. (2021). How social learning amplifies moral outrage expression in online social networks. PsyArXiv. doi: 10.31234/osf.io/gf7t5
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic License.
Release History
- 0.1.0
- Initial release