NYU Tandon School of Engineering, Business Analytics. This is a technical report named “Startup Survival Kit”, in which we analyze thousands of Yelp reviews to find correlations between sentiment, restaurant score, and bad reviews. Our objective is to identify some of the drivers of, and bottlenecks to, success for small new restaurants. All necessary R files are attached, along with a brief explanation in this file.
We handled the Yelp data in the following steps:
The files are readily available at https://www.yelp.com/dataset_challenge/dataset
We transformed the Yelp files using “json_to_csv_converter.py”, which can be found at https://github.com/Yelp/dataset-examples
Yelp_Data_Handling.r (Attached)
Yelp_Data_LinearRegression.r (Attached)
The model was generated in R and then cross-checked with Rattle. The decision tree did not arrive at a solution in RStudio, so Rattle was used instead.
We used a Python script by Ryo Kita to parse the Yelp dataset for easier analysis in R.
Yelp_Data_Text_Analysis.r (Attached)
All charts were plotted again using Tableau, R and Raw.
The web visualization was done with Shiny. We start the server with the function runApp().
a. ui.R (Attached)
b. server.R (Attached)
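
To illustrate the JSON-to-CSV conversion step mentioned above, here is a minimal Python sketch. It is not the actual “json_to_csv_converter.py” script. It assumes the input has one JSON object per line, and the function names, the dotted column names for nested fields, and the sample records are all hypothetical.

```python
import csv
import io
import json


def flatten(record, parent_key=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


def json_lines_to_csv(json_file, csv_file):
    """Read one JSON object per line and write every record as a CSV row."""
    rows = [flatten(json.loads(line)) for line in json_file if line.strip()]
    # Use the union of all keys so records with missing fields still fit.
    columns = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(csv_file, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Hypothetical two-record sample standing in for a Yelp review file.
sample = io.StringIO(
    '{"business_id": "b1", "stars": 4, "votes": {"useful": 2}}\n'
    '{"business_id": "b2", "stars": 1, "votes": {"useful": 0}}\n'
)
output = io.StringIO()
row_count = json_lines_to_csv(sample, output)
print(output.getvalue())
```

With real files, the in-memory buffers would be replaced by file handles opened with open(..., newline="") for the CSV side. The resulting CSV can then be loaded in R with read.csv().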