
Final Project Repo for ANLY502


anly502_final

This repo contains code (Python, PySpark, and R) and some output data from our Yelp project.

./stats This directory contains code that generates statistics on the dataset.

./stats/business PySpark code for statistics on businesses, including businesses by city and businesses by category, plus text and graph outputs of the results.
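The scripts in ./stats/business are PySpark and are not reproduced here. As a hedged illustration of the counting they describe, below is a minimal plain-Python sketch. It assumes the input is Yelp's business.json with one JSON object per line and fields named `city` and `categories`; those field names and formats are assumptions, and `parse_categories` and `business_stats` are hypothetical helpers, not code from this repo.

```python
import json
from collections import Counter

# Assumption: each line is a Yelp-style business record with "city" and
# "categories" fields. "categories" is accepted either as a list or as a
# comma-separated string, since its format differs between dataset releases.

def parse_categories(value):
    """Return a list of category names from a list or comma-separated string."""
    if value is None:
        return []
    if isinstance(value, str):
        return [c.strip() for c in value.split(",") if c.strip()]
    return list(value)

def business_stats(lines):
    """Count businesses per city and per category from JSON lines."""
    by_city = Counter()
    by_category = Counter()
    for line in lines:
        record = json.loads(line)
        by_city[record.get("city")] += 1
        # A business listed under several categories is counted once in each.
        by_category.update(parse_categories(record.get("categories")))
    return by_city, by_category

sample = [
    '{"city": "Las Vegas", "categories": "Restaurants, Bars"}',
    '{"city": "Phoenix", "categories": ["Restaurants"]}',
    '{"city": "Las Vegas", "categories": null}',
]
cities, categories = business_stats(sample)
print(cities.most_common())      # [('Las Vegas', 2), ('Phoenix', 1)]
print(categories.most_common())  # [('Restaurants', 2), ('Bars', 1)]
```

In PySpark, the per-city count would typically be written as `df.groupBy("city").count()` on a DataFrame loaded with `spark.read.json`.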

./stats/user PySpark code for statistics on users, including the number of users by city, users by year, and elite users by year.
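A minimal plain-Python sketch of the two per-year counts, offered as an illustration rather than the repo's PySpark code. It assumes Yelp's user.json layout, where `yelping_since` begins with a four-digit year and `elite` holds the elite years either as a list or as a comma-separated string; those field formats and the helper names (`parse_elite_years`, `user_stats`) are assumptions. Users by city is left out because it depends on how users are linked to businesses, which this README does not describe.

```python
import json
from collections import Counter

def parse_elite_years(value):
    """Return elite years as ints from a list or comma-separated string.

    Assumption: an empty string, null, or the literal "None" means no elite years.
    """
    if not value or value == "None":
        return []
    if isinstance(value, str):
        return [int(y) for y in value.split(",") if y.strip().isdigit()]
    return [int(y) for y in value]

def user_stats(lines):
    """Count users by the year they joined and elite users by elite year."""
    joined_per_year = Counter()
    elite_per_year = Counter()
    for line in lines:
        record = json.loads(line)
        since = record.get("yelping_since") or ""
        # Assumption: the first four characters of "yelping_since" are the year.
        if since[:4].isdigit():
            joined_per_year[int(since[:4])] += 1
        elite_per_year.update(parse_elite_years(record.get("elite")))
    return joined_per_year, elite_per_year

sample = [
    '{"yelping_since": "2010-05-01 12:00:00", "elite": "2012,2013"}',
    '{"yelping_since": "2010-07", "elite": [2013]}',
    '{"yelping_since": "2011-01", "elite": ""}',
]
joined, elite = user_stats(sample)
print(sorted(joined.items()))  # [(2010, 2), (2011, 1)]
print(sorted(elite.items()))   # [(2012, 1), (2013, 2)]
```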

./choose business PySpark code to view the number of reviews per month for a chosen business.
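As a hedged sketch of the monthly view, the plain-Python function below filters reviews to one business and buckets them by calendar month. It assumes Yelp-style review records with `business_id` and a `date` string starting with `YYYY-MM-DD`; those fields and the name `monthly_review_counts` are assumptions for illustration, not the repo's code.

```python
import json
from collections import Counter

def monthly_review_counts(lines, business_id):
    """Count reviews per calendar month ("YYYY-MM") for one business."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        if record.get("business_id") == business_id:
            # Assumption: "date" starts with an ISO "YYYY-MM-DD" date,
            # so its first seven characters are the month key.
            counts[record["date"][:7]] += 1
    # Return months in chronological order.
    return dict(sorted(counts.items()))

sample = [
    '{"business_id": "b1", "date": "2015-01-03"}',
    '{"business_id": "b1", "date": "2015-01-20 18:30:00"}',
    '{"business_id": "b2", "date": "2015-01-21"}',
    '{"business_id": "b1", "date": "2015-03-02"}',
]
print(monthly_review_counts(sample, "b1"))  # {'2015-01': 2, '2015-03': 1}
```

Months with no reviews (2015-02 above) are simply absent; a plot or time series built from this output would need those gaps filled with zeros.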

./text PySpark code for word counts, for labeling reviews by positive/negative words, and for preparing the boosted tree model data needed for text classification.
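The word-count and lexicon-labeling steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the `POSITIVE` and `NEGATIVE` word sets are hypothetical placeholders (the repo's actual lists are not shown here), tokenization is a simple lowercase regex, and a review is labeled by whichever word list it hits more often. The boosted tree model step is not covered.

```python
import re
from collections import Counter

# Hypothetical lexicons for illustration only; the repo's actual
# positive/negative word lists are not given in this README.
POSITIVE = {"great", "good", "delicious", "friendly"}
NEGATIVE = {"bad", "rude", "cold", "slow"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_count(reviews):
    """Count word occurrences across all review texts."""
    counts = Counter()
    for text in reviews:
        counts.update(tokenize(text))
    return counts

def label_review(text):
    """Label a review by comparing its positive and negative word hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neutral"

reviews = [
    "Great food and friendly staff!",
    "The soup was cold and the waiter was rude.",
    "Good coffee, slow service.",
]
print(word_count(reviews)["the"])          # 2
print([label_review(r) for r in reviews])  # ['pos', 'neg', 'neutral']
```

In PySpark, the same word count is commonly written with `flatMap` over tokenized lines followed by `reduceByKey`.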

./data_preperation Code that generates training data for predicting the number of reviews.

./R This directory contains an R file for analyzing review counts and a Python file that prepares the data for it.