Exploring Airbnb Listings Datasets

What's the point of this project?

Communicating results is arguably one of the most important steps in a data science or machine learning project. Why go through the trouble of doing the work if the findings aren't shared with others? I created this project to practice wrangling a dataset and to look for patterns in the data that say something useful about the reality it represents.

Data Source: insideairbnb.com

My blog post for this analysis.

Analysis can be found in the Airbnb.ipynb and/or the Airbnb.html files.

Project Description

This is a simple data analysis and machine learning project. In the Jupyter notebook, I analyze three Airbnb listings datasets:

San Francisco, CA
New York, NY
Austin, TX

I used exploratory data analysis techniques and some machine learning algorithms to answer the following three questions:

What are the strongest predictors for Airbnb listing price?
Do hosts with many properties give better or worse service than hosts with only one?
Do reviews matter when considering price?
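The second question, for instance, comes down to a grouped comparison. Here is a minimal sketch of that idea in pandas, not the notebook's actual code. The rows are made up for illustration, and the column names (`host_id`, `review_scores_rating`) and the derived `host_listing_count` and `host_type` columns are assumptions rather than names confirmed by this README.

```python
import pandas as pd

# Made-up rows standing in for a listings table; column names are assumptions.
listings = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 3, 4],
    "review_scores_rating": [92.0, 90.0, 97.0, 88.0, 91.0, None, 95.0],
})

# Count listings per host and attach that count to every row.
listings["host_listing_count"] = (
    listings.groupby("host_id")["host_id"].transform("count")
)

# Label each listing by whether its host has one listing or several.
listings["host_type"] = listings["host_listing_count"].map(
    lambda n: "single" if n == 1 else "multi"
)

# Mean rating per group; missing ratings are skipped by default.
mean_rating = listings.groupby("host_type")["review_scores_rating"].mean()
print(mean_rating)
```

On real data, a difference in group means like this would still need a look at sample sizes and spread before drawing a conclusion.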

In the analysis, I explain my thought processes and decisions as I go. I hope you find this interesting and useful!

TL;DR? Here are my findings from these datasets:

Hosts with only one listing are rated slightly higher than hosts with more than one listing.
Listings with no recent reviews are priced slightly higher than listings with recent reviews. As a bonus, listings from Superhosts are typically more expensive than listings from non-Superhosts.
The size of the property, whether or not the property is an Apartment in NY, and luxurious amenities were all strong predictors of price.
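One simple way to rank predictors like those in the last finding is to fit a linear model on standardized features and compare coefficient magnitudes. The sketch below does this with ordinary least squares in NumPy as a stand-in; the notebook may well use different models. The data and the feature names (`accommodates`, `has_pool`, `num_reviews`) are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up listings; column names are illustrative assumptions.
df = pd.DataFrame({
    "accommodates": [2, 4, 6, 2, 8, 3, 5, 4],
    "has_pool":     [0, 0, 1, 0, 1, 0, 1, 0],
    "num_reviews":  [10, 55, 3, 120, 8, 40, 15, 60],
    "price":        [80.0, 150.0, 320.0, 75.0, 450.0, 110.0, 260.0, 140.0],
})

features = ["accommodates", "has_pool", "num_reviews"]

# Standardize features so coefficient magnitudes are comparable.
X = df[features].to_numpy(dtype=float)
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = df["price"].to_numpy()

# Add an intercept column and solve ordinary least squares.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Rank features by absolute standardized coefficient, largest first.
ranking = pd.Series(coef[1:], index=features).abs().sort_values(ascending=False)
print(ranking)
```

Standardized coefficients are only a rough guide: correlated features can share or swap importance, so a ranking like this is best checked against a second method.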

Repository Contents

Airbnb.ipynb : Jupyter notebook containing analysis
Airbnb.html : HTML-friendly version of the notebook above
.gitignore : used to prevent committing datasets and ipynb checkpoints to repo

Dependencies

Machine Learning

sklearn (installed as scikit-learn)

Visualization

plotly
plotly_express
seaborn
matplotlib

Data Analysis

pandas
numpy
scipy
category_encoders
math (standard library)
collections (standard library)
datetime (standard library)

Acknowledgements

insideairbnb.com for collecting the data
Airbnb for creating the data, and allowing it to be publicly accessible
Udacity for the project idea and the feedback
Contributors to the libraries I leveraged above
