Libraries Used
• Pandas === 0.23.4
• Numpy === 1.15.4
• scikit-learn === 0.20.1
• Matplotlib === 3.0.2
The Motivation for the Project
I chose to analyze the Airbnb Seattle property listings from 2016 for this project because the data resembles the data sources one might see in a business setting. I have a strong interest in analyzing property rental listings, since the real estate and hospitality industries play such a significant role in most people's lives. Additionally, many of the approaches I used in this project are directly applicable to my present and future work projects.
For additional visualizations from this analysis, see my Medium post: https://medium.com/@timenalls/how-to-analyze-airbnb-seattle-listings-using-data-science-approaches-8811235f6e8b
The files in the repository
I included a Python file containing the project. I have not yet included the other files because of the 25 MB file size limit.
A summary of the results of the analysis
Using the data, I answered the following questions:
- What months of the year have the highest average listing prices? The prices are highest in the summer months: June, July, and August.
- Which neighborhoods have the highest review score ratings? Central Area, West Seattle, and Delridge have the highest review score ratings. However, there isn't much variance in the review score ratings.
- Which neighborhoods have the highest listing prices? Magnolia, Queen Anne, and Downtown have the highest listing prices.
- What attributes in the listing data most associate with or contribute to prices?
The attributes for the cluster with the highest average price (43% higher than the mean price across clusters):
• Cluster 3 - host listings count, the Downtown neighborhood, require guest profile picture, condominiums, etc.
The attributes for the cluster with the lowest average price (43% lower than the mean price across clusters):
• Cluster 6 - private room room type, Seward Park neighborhood, Beacon Hill neighborhood, house property type, etc.
I used a k-means clustering model to answer the last question.
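The following is a minimal sketch of how a cluster price profile like the one above could be computed. It is not the project's actual code. The toy data, the `profile_clusters` helper, the column names, and the choice to leave price out of the clustering features are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy listings; the real analysis would load the Inside Airbnb data.
listings = pd.DataFrame({
    "price": [250.0, 300.0, 275.0, 60.0, 55.0, 70.0],
    "host_listings_count": [5, 8, 6, 1, 1, 2],
    "room_type": ["Entire home/apt"] * 3 + ["Private room"] * 3,
})


def profile_clusters(df, n_clusters, price_col="price", seed=0):
    """Cluster listings on non-price features and summarize price per cluster."""
    # One-hot encode categorical columns (assumption: price is not a feature).
    features = pd.get_dummies(df.drop(columns=[price_col])).astype(float)
    # Standardize so no single feature dominates the distance computation.
    scaled = StandardScaler().fit_transform(features)
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(scaled)

    # Mean price per cluster, compared with the mean of the cluster means.
    mean_price = df[price_col].groupby(labels).mean()
    pct = (mean_price / mean_price.mean() - 1.0) * 100.0
    return pd.DataFrame({"mean_price": mean_price, "pct_vs_cluster_mean": pct})


profile = profile_clusters(listings, n_clusters=2)
print(profile)
```

To see which attributes characterize a cluster, one option is to group the one-hot encoded feature table by the cluster labels in the same way and compare the per-cluster feature means.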
Acknowledgements
The data I used in this analysis can be found at this URL: http://insideairbnb.com/get-the-data.html.