
A Text Classification/Sentiment Analysis project using the Amazon Reviews Polarity dataset.


Text Classification/Sentiment Analysis - A Harvard Capstone Project



Project Overview: Text Classification/Sentiment Analysis

This project offers an empirical exploration of the use of neural networks for text classification/sentiment analysis on the Amazon Reviews Polarity dataset.

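We will cover four network architectures: a deep neural network (DNN), a convolutional neural network (CNN), a depthwise separable convolutional neural network (sepCNN), and BERT (Bidirectional Encoder Representations from Transformers).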
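The practical difference between the CNN and sepCNN models lies in how each convolution is factored. As an illustration only (written in Python, not taken from the project's R code), the sketch below counts the trainable parameters of one standard 1D convolution layer and of one depthwise separable 1D convolution layer with a depth multiplier of 1, using the weights-plus-bias convention of Keras's `Conv1D` and `SeparableConv1D` layers. The function names are hypothetical, and the layer sizes in the example are arbitrary values, not the settings used in the report.

```python
def conv1d_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Trainable parameters of a standard 1D convolution layer.

    Every output channel has its own kernel spanning all input channels,
    plus one bias per output channel.
    """
    return kernel_size * in_channels * out_channels + out_channels


def separable_conv1d_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Trainable parameters of a depthwise separable 1D convolution (depth multiplier 1).

    Depthwise step: one kernel of width `kernel_size` per input channel.
    Pointwise step: a width-1 convolution mixing `in_channels` into `out_channels`.
    A single bias vector of size `out_channels` follows the pointwise step.
    """
    depthwise = kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise + out_channels


if __name__ == "__main__":
    # Illustrative sizes only: kernel width 3, 200 input channels, 64 filters.
    print(conv1d_params(3, 200, 64))            # 38464
    print(separable_conv1d_params(3, 200, 64))  # 13464
```

With these sizes the separable layer needs 13,464 parameters against 38,464 for the standard layer, which is the main appeal of a sepCNN: a comparable receptive field with far fewer weights to learn.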


Components

This repository contains the following files:

  1. A report in the form of both a PDF document and an Rmd file.
  2. An Rmd file that I used to perform the machine learning task and create the PDF document.
  3. An R script that can also be used to perform the machine learning task.

Important: Dataset download location

Because Google has changed its API (at least, I was unable to use it), I had to download the dataset manually from the following location:

Download Location URL: Xiang Zhang Google Drive

Please select the file named "amazon_review_polarity_csv.tar.gz" and download it to the project directory.

Once downloaded, the script should take care of the rest.
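The repository's R script handles this step. Purely as an illustration of what "the rest" involves, and not as a port of that script, the Python sketch below extracts the archive and reads the review files. It assumes the archive unpacks into an `amazon_review_polarity_csv/` folder containing header-less `train.csv` and `test.csv` files with three columns: class index (1 = negative, 2 = positive), review title, and review text. `extract_dataset` and `load_reviews` are hypothetical helper names.

```python
import csv
import tarfile
from pathlib import Path

ARCHIVE_NAME = "amazon_review_polarity_csv.tar.gz"


def extract_dataset(archive_path: str, dest_dir: str = ".") -> Path:
    """Extract the archive and return the folder that holds train.csv/test.csv."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Use the safe "data" extraction filter where this Python version supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    # Assumption: the archive unpacks into a folder named after the dataset.
    return Path(dest_dir) / "amazon_review_polarity_csv"


def load_reviews(csv_path):
    """Yield (label, title, text) tuples, with label 0 = negative and 1 = positive.

    Assumes no header row and three columns per row:
    class index (1 = negative, 2 = positive), review title, review text.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        for class_index, title, text in csv.reader(f):
            # Shift the 1/2 class index to a 0/1 binary label.
            yield int(class_index) - 1, title, text
```

For example, `load_reviews(extract_dataset(ARCHIVE_NAME) / "train.csv")` streams the training split row by row instead of loading it into memory at once, which helps because that split is large.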


Report

The report documents the analysis and presents the findings, along with supporting statistics and figures. The report includes the following sections:

  1. an introduction/overview/executive summary section that describes the dataset and summarizes the goal of the project and key steps that were performed
  2. a methods/analysis section that explains the process and techniques used, including data cleaning, data exploration and visualization, insights gained, and my modeling approach
  3. a results section that presents the modeling results and discusses the model performance
  4. a conclusion section that gives a brief summary of the report, its limitations and future work

Contact

You are welcome to: