Sentimetrics Project

Reddit Sentiment Analysis project using PySpark.

Folder Structure

  • Report: This folder contains our final report, which documents the full research process: the context of sentiment analysis, our objectives, data collection, preprocessing, feature extraction, the methods we used, model evaluation, results, and conclusions. Start here for a detailed understanding of the project.
  • Jupyter Notebook: The main notebook contains the code for our sentiment analysis models, including hyperparameter tuning and model evaluation. Read it to explore the implementation details, algorithms, and model performance.
  • Data Scraping: This folder contains our data scraping notebook, which walks through how we collected data from Reddit using the PRAW library. Read it to see how we obtained the raw data used in our analysis.
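The scraping step described above can be sketched roughly as follows. This is a minimal illustration, not the code in our scraping notebook: it assumes read-only access with Reddit API credentials, and the subreddit name, post limit, kept fields, JSON Lines output format, and the helper names `submission_to_record`, `fetch_posts`, and `write_jsonl` are all hypothetical placeholders. It uses only the PRAW calls `praw.Reddit(client_id=..., client_secret=..., user_agent=...)` and `reddit.subreddit(name).hot(limit=...)`.

```python
# Minimal sketch of collecting Reddit posts with PRAW.
# Assumptions (hypothetical, not taken from the project): the subreddit
# name, the post limit, the fields kept per post, and the output format.
import json
from typing import Any, Dict, Iterable, List


def submission_to_record(submission: Any) -> Dict[str, Any]:
    """Flatten a PRAW Submission (or any object with the same attributes) into a dict."""
    return {
        "id": submission.id,
        "subreddit": str(submission.subreddit),  # str() of a PRAW Subreddit is its name
        "title": submission.title,
        "selftext": submission.selftext,
        "score": submission.score,
        "num_comments": submission.num_comments,
        "created_utc": submission.created_utc,
    }


def fetch_posts(client_id: str, client_secret: str, user_agent: str,
                subreddit: str = "worldnews", limit: int = 100) -> List[Dict[str, Any]]:
    """Fetch up to `limit` hot posts from one subreddit using read-only access."""
    import praw  # imported lazily so the other helpers work without PRAW installed

    reddit = praw.Reddit(client_id=client_id,
                         client_secret=client_secret,
                         user_agent=user_agent)
    return [submission_to_record(s) for s in reddit.subreddit(subreddit).hot(limit=limit)]


def write_jsonl(records: Iterable[Dict[str, Any]], path: str) -> None:
    """Write one JSON object per line; embedded newlines are escaped by json.dumps."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

JSON Lines is chosen here because post bodies often contain newlines, and `spark.read.json(path)` reads one JSON object per line by default, so the scraped file could be loaded straight into a PySpark DataFrame for preprocessing.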