Final Project Repo for ANLY502
This repo contains code (Python, PySpark, and R) and some output data from our Yelp project.
./stats This directory contains code that generates statistics on the dataset.
./stats/business PySpark code for business statistics, including businesses by city and businesses by category, plus text files and graphs of the results.
./stats/user PySpark code for user statistics, including number of users by city, users by year, and elite users by year.
./choose business PySpark code to view the number of reviews per month for a given business.
./text PySpark code for word counts, mapping reviews by positive/negative words, and preparing the data needed for the boosting tree text classification model.
./data_preperation Generates training data for predicting the number of reviews.
./R This directory contains an R file for the review count analysis and a Python file that prepares the data for it.
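
As an illustration of the per-month review count described for ./choose business, here is a minimal plain-Python sketch of that aggregation. It is not the repo's PySpark code: the sample records, the field names (business_id, date, assumed to follow the public Yelp dataset), and the function name reviews_per_month are all hypothetical.

```python
from collections import Counter

# Hypothetical review records. The field names are an assumption based on
# the public Yelp dataset, not taken from this repo's code.
reviews = [
    {"business_id": "b1", "date": "2015-01-14"},
    {"business_id": "b1", "date": "2015-01-30"},
    {"business_id": "b2", "date": "2015-01-02"},
    {"business_id": "b1", "date": "2015-03-07"},
]


def reviews_per_month(records, business_id):
    """Return a sorted list of (YYYY-MM, count) pairs for one business."""
    counts = Counter(
        r["date"][:7]  # keep only the YYYY-MM prefix of an ISO date string
        for r in records
        if r["business_id"] == business_id
    )
    return sorted(counts.items())


monthly = reviews_per_month(reviews, "b1")
print(monthly)
```

In PySpark the same shape would likely be a filter on the business, a map to (month, 1) pairs, and a reduceByKey that sums the ones.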