This is a dataset of top posts from reddit. It contains the top 1,000 all-time posts from the top 2,500 subreddits, so 2.5 million posts in total. The top subreddits were determined by subscriber count and are located in the manifest file within.
This data was pulled between August 15 and 20, 2013.
Each file is a CSV with the related subreddit as its filename. Each CSV file contains a header line.
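As a minimal sketch of reading one of these files, the snippet below uses Python's `csv.DictReader`, which takes its keys from the header line. The helper name `load_posts` is hypothetical, and no particular column names are assumed.

```python
import csv


def load_posts(csv_path):
    """Return the rows of one subreddit CSV as a list of dicts keyed by header name."""
    # newline="" lets the csv module handle quoted fields that contain line breaks.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `load_posts` on a subreddit file returns one dict per post, so the available columns can be inspected with `rows[0].keys()` before any analysis.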
A good example of the structure of this data is here.
How is corporate media reflected on social media? How are world events reported in mainstream media reflected in the sentiment of social media posts?
For NLP we will be using nltk. Install with pip: sudo pip install -U nltk
We will also be using scikit-learn. Install with pip: pip install -U scikit-learn
https://github.com/umbrae/reddit-top-2.5-million
####Newtown school shooting http://www.endmemo.com/events/evt/2012gunmanassaults.php
####NSA Prism program leaked http://www.endmemo.com/events/evt/2013usnsa.php
####Boston marathon bombed http://www.endmemo.com/events/evt/2013bostonmarathon.php
####Facebook IPO http://www.endmemo.com/events/evt/2012facebookipo.php
We have 3 news articles from NYT, Fox, and CNN for each event. These are stored as text files under /articles.
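A small sketch of loading these articles into memory might look like the following. It assumes the files sit directly in the articles directory and end in `.txt`; the helper name `load_articles` and that layout are assumptions, not something the repository guarantees.

```python
from pathlib import Path


def load_articles(directory="articles"):
    """Map each .txt file name (without its extension) to the file's text."""
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(Path(directory).glob("*.txt"))
    }
```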
####Useful links
- https://pypi.python.org/pypi/textblob
- https://textblob.readthedocs.org/en/dev/quickstart.html#sentiment-analysis
- http://textblob.readthedocs.org/en/latest/advanced_usage.html
- http://scikit-learn.org/stable/user_guide.html
- http://scikit-learn.org/stable/install.html
#####Hypothesis
- Foreseen events such as the Facebook IPO will show less disparity between sentiment before and after the event. We predict that unexpected events like the Newtown massacre will show a large disparity between sentiment before and after the event.
- High-profile traumatic news events might initially decrease subjectivity, but weeks later, when discussion shifts to commentary on topics such as gun laws, subjectivity will increase. Polarity will probably be negative following the events because of the impact people feel from them.
- News articles will fit into the densest cluster of sentiment. They will not represent extreme opinion but rather a more neutral ideology.
- Different news sources will maintain different average sentiments. This may be based on political leaning.
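One way the before/after disparity in the first hypothesis could be measured is the absolute difference between mean polarity before and after the event. This is a sketch under that assumption; the function name, the `(timestamp, polarity)` input shape, and the choice of the mean are all illustrative.

```python
from statistics import mean


def sentiment_disparity(scored_posts, event_time):
    """Return |mean polarity after - mean polarity before| for an event.

    scored_posts is an iterable of (timestamp, polarity) pairs.
    Returns None when either side of the event has no posts.
    """
    scored_posts = list(scored_posts)  # allow generators to be scanned twice
    before = [polarity for stamp, polarity in scored_posts if stamp < event_time]
    after = [polarity for stamp, polarity in scored_posts if stamp >= event_time]
    if not before or not after:
        return None
    return abs(mean(after) - mean(before))
```

Under this measure, the hypothesis predicts a smaller value for the Facebook IPO than for the Newtown shooting.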
#####METHODS
We will use positivity, negativity, objectivity, and polarity as predictors.
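The sketch below shows one possible way to derive these four predictors with TextBlob, whose default analyzer returns a polarity in [-1, 1] and a subjectivity in [0, 1]. The mapping used here (positivity and negativity as the positive and negative parts of polarity, objectivity as 1 minus subjectivity) is an assumption made for illustration and may differ from how the predictors were actually computed.

```python
def textblob_scorer(text):
    """Return (polarity, subjectivity) from TextBlob's default analyzer."""
    # Imported lazily so sentiment_features can be used with another scorer
    # when textblob is not installed (pip install -U textblob).
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def sentiment_features(text, scorer=textblob_scorer):
    """Build the four predictors from a (polarity, subjectivity) scorer.

    The feature definitions below are illustrative assumptions.
    """
    polarity, subjectivity = scorer(text)
    return {
        "positivity": max(polarity, 0.0),
        "negativity": max(-polarity, 0.0),
        "objectivity": 1.0 - subjectivity,
        "polarity": polarity,
    }
```

Passing the scorer as a parameter keeps the feature logic separate from the library, so a different analyzer can be swapped in without changing the rest of the pipeline.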
These are the clusters that we were able to render through our Sentiment Analysis.
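Clusters of this kind could be produced along the following lines. The sketch assumes k-means from scikit-learn over (polarity, subjectivity) pairs and treats the "densest" cluster as the one with the most members; the algorithm choice, the cluster count, and both helper names are assumptions.

```python
from collections import Counter


def cluster_points(points, n_clusters=3):
    """Assign each (polarity, subjectivity) pair to a k-means cluster label."""
    # Imported lazily so densest_cluster works without scikit-learn installed.
    from sklearn.cluster import KMeans

    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return [int(label) for label in model.fit_predict(points)]


def densest_cluster(labels):
    """Return the cluster label that has the most members."""
    return Counter(labels).most_common(1)[0][0]
```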
A week before Boston Marathon Bombing: