Statistical Inference W4702 Final Project
Motivation behind studying Airbnb data
In recent years, discussions in the media and in our communities have focused on how sharing economy platforms such as Uber and Airbnb have become more prevalent, and on whether their impact on our daily lives is positive or less so.
We ourselves may be avid users of some of these platforms. For example, a potential guest or owner of an Airbnb listing might be interested in determining whether listing prices are fair given the property and, in particular, which variables are the most meaningful predictors of listing price. An Airbnb host might wonder which variables best predict whether a host achieves "superhost" status.
Data set requirements
We also had some criteria in mind for selecting an interesting data set to explore. We feel Airbnb's listing data meets them:
- It is a comprehensive data set with 94 original variables and over 30,000 observations for NYC area listings.
- It contains a rich set of variables, including a range of numerical and categorical variables.
- The data was recently compiled in September 2015 by the author of insideairbnb.com.
- The data contains interesting information on trends in properties and communities in the New York area.
Some questions or insights we hope to address in this project include:
- Exploring the relationships between various predictors and the designated Y variables, price and superhost status: how strong are these relationships, and do they appear to be linear or non-linear?
- Building a classification model and identifying variables for predicting whether an Airbnb host is likely to be classified as a superhost.
- Building a generalized additive model (GAM) and identifying the best predictors of price as the Y variable.