Comment Sentiment Analysis of the Top 25 Posts on a Subreddit (www.reddit.com) from the Last 24 Hours
Languages: Python, SQL (SQLite)
Purpose of the program: To quantify, evaluate, and visualize overall public sentiment toward various news articles, as expressed in the comments on their Reddit posts.
Three versions of the program are available in this repository:
- RedditbotSpidernews scrapes and analyzes the top posts from the last 24 hours on /r/news/.
- RedditbotSpiderpolitics scrapes and analyzes the top posts from the last 24 hours on /r/politics/.
- RedditbotSpiderworldnews scrapes and analyzes the top posts from the last 24 hours on /r/worldnews/.
Note: The program can theoretically be used on any subreddit by changing the address and, if needed, altering the XPaths within RedditbotSpider.py.
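As a minimal sketch of the "changing the address" step, the hypothetical helper `top_posts_url` below builds a subreddit's top-posts address from its name. It assumes Reddit's public top listing with the `t=` time-window parameter (`t=day` for the last 24 hours); the address actually hard-coded in RedditbotSpider.py may differ, and its XPaths are not reproduced here.

```python
# Hypothetical helper, not taken from RedditbotSpider.py.
# Assumption: Reddit's public "top" listing URL with a t= time-window parameter.
VALID_PERIODS = {"hour", "day", "week", "month", "year", "all"}


def top_posts_url(subreddit: str, period: str = "day") -> str:
    """Return the URL of a subreddit's top posts for the given time window."""
    # Accept "news", "r/news", or "/r/news/" and reduce them to "news".
    name = subreddit.strip().strip("/")
    if name.startswith("r/"):
        name = name[2:]
    if not name:
        raise ValueError("subreddit name must not be empty")
    if period not in VALID_PERIODS:
        raise ValueError("period must be one of " + ", ".join(sorted(VALID_PERIODS)))
    return "https://www.reddit.com/r/{}/top/?t={}".format(name, period)


if __name__ == "__main__":
    for sub in ("/r/news/", "politics", "worldnews"):
        print(top_posts_url(sub))
```

Keeping the subreddit name as a parameter means the three repository versions would differ only in one argument rather than in copied spider files, though any XPath changes would still have to be made by hand.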
What the program does:
- The web scraper connects to the subreddit and collects the top 25 post titles, as well as the comments within each post.
- Data is inserted into a SQLite database.
- Data is cleaned up: any rows lacking a comment are deleted.
- The comments for each title are combined and placed into a new database table.
- A unique ID (1-25) is added for each title and corresponding group of comments.
- A word-based sentiment lexicon is applied to each set of comments.
- Data visualization: an interactive HTML bar chart, a CSV file, and a completion window are generated.
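The database steps above (insert, clean up, combine, assign IDs) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not the program's own code: the function `build_comment_table`, the table names `raw_comments` and `title_comments`, and the use of an in-memory database are all hypothetical.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple


def build_comment_table(
    rows: Iterable[Tuple[str, Optional[str]]]
) -> List[Tuple[int, str, str]]:
    """Load scraped (title, comment) pairs, drop empty comments, group by title.

    Returns (id, title, combined_comments) tuples, with ids 1..N in scrape order.
    """
    # Assumption: an in-memory database; the real program may write to a file.
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        # Insert the raw scraped data, one row per comment (hypothetical schema).
        cur.execute("CREATE TABLE raw_comments (title TEXT NOT NULL, comment TEXT)")
        cur.executemany("INSERT INTO raw_comments (title, comment) VALUES (?, ?)", rows)

        # Clean up: delete rows whose comment is NULL or consists only of spaces.
        cur.execute("DELETE FROM raw_comments WHERE comment IS NULL OR TRIM(comment) = ''")

        # Combine each title's comments; keep titles in order of first appearance.
        grouped = cur.execute(
            "SELECT title, GROUP_CONCAT(comment, ' ') FROM raw_comments "
            "GROUP BY title ORDER BY MIN(rowid)"
        ).fetchall()

        # Place the groups in a new table with a unique id (1..N).
        cur.execute(
            "CREATE TABLE title_comments (id INTEGER PRIMARY KEY, title TEXT, comments TEXT)"
        )
        cur.executemany(
            "INSERT INTO title_comments (id, title, comments) VALUES (?, ?, ?)",
            [(i, t, c) for i, (t, c) in enumerate(grouped, start=1)],
        )
        conn.commit()
        return cur.execute(
            "SELECT id, title, comments FROM title_comments ORDER BY id"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    scraped = [
        ("Title A", "good news"),
        ("Title B", None),  # dropped: no comment
        ("Title A", "great win"),
        ("Title C", "stupid idea"),
    ]
    for row in build_comment_table(scraped):
        print(row)
```

SQLite does not guarantee the order of comments inside a `GROUP_CONCAT` result. That is harmless for a word-counting lexicon, but worth knowing if the combined text is ever displayed.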
Lexicon used to extract an overall sentiment level:
Positive +1 | Negative -1 |
---|---|
good | fuck |
great | corrupt |
happy | stupid |
win | irrelevant |
love | colluding |
nice | horrible |
authentic | unfair |
like | guilty |
fun | foolish |
appreciate | hateful |
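A minimal sketch of applying this lexicon, assuming each whole-word, case-insensitive match adds its weight (+1 or -1) to a running total. The program's actual matching rules (for example, whether "liked" counts as "like") are not documented here, and `sentiment_score` and `LEXICON` are hypothetical names.

```python
import re
from typing import Dict

# Word weights copied from the lexicon table above (+1 positive, -1 negative).
LEXICON: Dict[str, int] = {
    **{w: 1 for w in ["good", "great", "happy", "win", "love", "nice",
                      "authentic", "like", "fun", "appreciate"]},
    **{w: -1 for w in ["fuck", "corrupt", "stupid", "irrelevant", "colluding",
                       "horrible", "unfair", "guilty", "foolish", "hateful"]},
}

# Tokenizer: runs of lowercase letters and apostrophes (applied after lower()).
WORD_RE = re.compile(r"[a-z']+")


def sentiment_score(text: str) -> int:
    """Sum the lexicon weights of every exactly matching word in text."""
    return sum(LEXICON.get(word, 0) for word in WORD_RE.findall(text.lower()))


if __name__ == "__main__":
    print(sentiment_score("Good news, great WIN! But the ruling was unfair."))
```

Under this exact-match assumption, "liked" scores 0. A substring or stemming approach would count it, at the cost of false matches such as "unlike" containing "like". Applied to the combined comments for a title, the title's overall level is simply `sentiment_score(comments)`.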
How to run the program:
- Download and install SQLite
- Download and install Python 3.6.3
- Make sure your System PATH includes the path to Python's interpreter
- In the Windows Command Prompt, install the following packages:
  - pip3 install pandas
  - pip3 install scrapy
  - pip3 install plotly
  - pip3 install pypiwin32
- Download this repository and unzip it
- Open sentimentanalysis-master, then open RedditbotSpidernews, RedditbotSpiderpolitics, or RedditbotSpiderworldnews; right-click main.py, choose Edit with IDLE, and select Run > Run Module
Tools/Libraries/Packages used: