Causal Effects of Brevity on Style and Success in Social Media

This repository provides information about the data collected and used in our CSCW 2019 paper.

The description of the dataset

For each of the original input tweets, the dataset contains a dictionary with the following information:

-original_tweet_text: The text of the original input tweets

-Q1: The first comprehension question
-Q1_possible_anser1: The first possible answer to the first question
-Q1_possible_anser2: The second possible answer to the first question
-Q1_possible_correct_answer: The correct answer to the first question

-Q2: The second comprehension question
-Q2_possible_anser1: The first possible answer to the second question
-Q2_possible_anser2: The second possible answer to the second question
-Q2_possible_correct_answer: The correct answer to the second question

-Q3: The third comprehension question
-Q3_possible_anser1: The first possible answer to the third question
-Q3_possible_anser2: The second possible answer to the third question
-Q3_possible_correct_answer: The correct answer to the third question

-treated_versions: a dictionary with information about each treated version
      -10-20%: Information about the 10-20% level of relative shortening
        -tweet: The text of the edited tweet
        -success_prob: Probability that the edited version is more successful 
                       compared to the original tweet, measured as the fraction
                       of the votes in favor of the edited version. This is probability
                       of success as defined in the subsection "Measuring the effect of
                       brevity on message success"
      ...

      -80-90%: Information about the 80-90% level of relative shortening
        -tweet
        -success_prob

      -baseline: Information about the baseline
        -tweet
        -success_prob

It is necessary to specify the encoding in order to load the file successfully. For example, to inspect the example from the publication in the Table 3:

import json
import pprint

with open('brevity_dataset.json', encoding="utf-8") as file:
    data = json.load(file)
  
pprint.pprint(data["4"])

Reference

Please cite the paper when using the data:

Causal Effects of Brevity on Style and Success in Social Media, Kristina Gligorić, Ashton Anderson and Robert West. ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW), Austin, Texas, November 2019. https://dlab.epfl.ch/people/west/pub/Gligoric-Anderson-West_CSCW-19.pdf