/sentimentanalysis

Comment sentiment analysis of the top 25 posts (from the last 24 hrs) on a subreddit (reddit.com) using a web scraper.

Primary language: HTML | License: MIT

Comment Sentiment Analysis of the Top 25 Posts on a Subreddit (www.reddit.com) (from the last 24 hrs)

Languages: Python, SQL(SQLite)

Purpose of the program: To define, evaluate, and visualize overall public sentiment towards various news articles.

Three versions of the program are available in this repository:

  • RedditbotSpidernews scrapes and analyzes the top posts from the last 24 hrs on /r/news/.
  • RedditbotSpiderpolitics scrapes and analyzes the top posts from the last 24 hrs on /r/politics/.
  • RedditbotSpiderworldnews scrapes and analyzes the top posts from the last 24 hrs on /r/worldnews/.

Note: In theory, the program can be used on any subreddit by changing the address and, if needed, altering the XPaths within RedditbotSpider.py.


What the program does:

  • Web scraper connects to subreddit and collects the top 25 post titles, as well as comments within each post.
  • Data is inserted into a SQLite database.
  • Data is cleaned up: any rows lacking a comment are deleted.
  • Comments are combined for each corresponding title and placed into a new database table.
  • A unique ID (1-25) is added for each title and corresponding group of comments.
  • A word-based sentiment lexicon is applied to each set of comments.
  • Data visualization: an interactive (HTML) bar chart, a CSV file, and a completion window are generated.
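The database steps above (insert, clean up, combine, number) can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the repository's actual code: the table names `scraped` and `grouped`, the function name `group_comments`, and the in-memory database are assumptions made for the example (the program itself writes to test.db).

```python
import sqlite3

def group_comments(rows, db_path=":memory:"):
    """Store (title, comment) pairs, drop empty comments, and combine
    the remaining comments per title under a sequential ID.

    Table and function names are hypothetical, chosen for illustration.
    """
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()

    # Step 1: insert the scraped data.
    cur.execute("CREATE TABLE scraped (title TEXT, comment TEXT)")
    cur.executemany("INSERT INTO scraped (title, comment) VALUES (?, ?)", rows)

    # Step 2: clean up rows that lack a comment.
    cur.execute("DELETE FROM scraped WHERE comment IS NULL OR TRIM(comment) = ''")

    # Step 3: combine comments per title, keeping the scraped order.
    combined = {}
    for title, comment in cur.execute(
        "SELECT title, comment FROM scraped ORDER BY rowid"
    ):
        combined.setdefault(title, []).append(comment)

    # Step 4: write one row per title into a new table with IDs 1, 2, 3, ...
    cur.execute(
        "CREATE TABLE grouped (id INTEGER PRIMARY KEY, title TEXT, comments TEXT)"
    )
    cur.executemany(
        "INSERT INTO grouped (id, title, comments) VALUES (?, ?, ?)",
        [
            (i, title, " ".join(comments))
            for i, (title, comments) in enumerate(combined.items(), start=1)
        ],
    )
    conn.commit()

    result = cur.execute(
        "SELECT id, title, comments FROM grouped ORDER BY id"
    ).fetchall()
    conn.close()
    return result
```

With 25 scraped titles that each have at least one comment, the IDs in this sketch would run from 1 to 25, matching the range described above.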

Lexicon used to extract an overall sentiment level:

Positive (+1)    Negative (-1)
good             fuck
great            corrupt
happy            stupid
win              irrelevant
love             colluding
nice             horrible
authentic        unfair
like             guilty
fun              foolish
appreciate       hateful
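A lexicon like this one can be applied by adding 1 for every positive word and subtracting 1 for every negative word in a block of comments. The sketch below is one possible way to do that; the function name `sentiment_score` and the choice of case-insensitive, whole-word matching are assumptions, since the README does not say how the program tokenizes comments.

```python
import re

# Word lists copied from the lexicon above.
POSITIVE = {"good", "great", "happy", "win", "love", "nice",
            "authentic", "like", "fun", "appreciate"}
NEGATIVE = {"fuck", "corrupt", "stupid", "irrelevant", "colluding",
            "horrible", "unfair", "guilty", "foolish", "hateful"}

def sentiment_score(text):
    """Return (+1 per positive word) + (-1 per negative word).

    Assumption: matching is case-insensitive and on whole words only,
    so "likely" does not count as "like".
    """
    words = re.findall(r"[a-z']+", text.lower())
    return sum((word in POSITIVE) - (word in NEGATIVE) for word in words)

print(sentiment_score("I love this, nice work"))   # 2
print(sentiment_score("corrupt and horrible"))     # -2
```

A score above zero suggests that positive lexicon words dominate the comments for a post, and a score below zero suggests the opposite. With only 20 words, the lexicon gives a rough signal rather than a precise measure.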

How to run the program:

  • Download and install SQLite
  • Download and install Python 3.6.3
  • Make sure your System PATH includes the path to Python's interpreter
  • In Windows Command Prompt, install the following packages:
    • pip3 install pandas
    • pip3 install scrapy
    • pip3 install plotly
    • pip install pypiwin32
  • Download this repository & unzip it
  • Open sentimentanalysis-master, then the RedditbotSpidernews, RedditbotSpiderpolitics, or RedditbotSpiderworldnews folder. Right-click main.py, choose Edit with IDLE, then select Run -> Run Module.

Note: Before running the program a second time, move or delete the generated result files (test.db, temp-plot.html, and results.csv) from that folder.
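The cleanup between runs can also be scripted. The helper below is a small sketch and not part of the repository; the function name `clear_previous_results` is hypothetical, and it deletes the three generated files rather than moving them, so copy any results you want to keep first.

```python
from pathlib import Path

# File names taken from the note above.
GENERATED_FILES = ("test.db", "temp-plot.html", "results.csv")

def clear_previous_results(folder):
    """Delete generated result files in `folder`; return the names removed."""
    removed = []
    for name in GENERATED_FILES:
        path = Path(folder) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed
```

For example, calling `clear_previous_results("RedditbotSpidernews")` from inside the unzipped repository folder would clear the results of a previous /r/news/ run.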

Tools/Libraries/Packages used: