Priceonomics Data Puzzle
About
TreefortBnb, Airbnb's fictious rival, connects luxurious treefort homes around the United States. Priceonomics curated a dataset with the pricing of different treeforts and challenged us to find the median price of booking a treefort in each of the top 100 cities. Full backstory and challenge available here.
More specifically, Priceonomics requested us to:
- Send us back a table with a list of the median price in each city, ranked from most to least expensive.
- Restrict your analysis to just to the "top 100" cities that have the most units on the market.
Directory
This repository contains solutions to the Priceonomics Treefortbnb challenge in R and Python. Note: Python analysis in Jupyter Notebook and rendered to HTML for in-line analysis. Solutions used the following steps.
- Load libraries
- Import data
- Clean data
- Conduct stated analysis
- Additional EDA
Results
Median prices for the top 100 cities in the dataset are available in the solution.csv. Below are the top 10 results:
Rank | city | state | median_price_USD |
---|---|---|---|
1 | Indianapolis | IN | 650 |
2 | Malibu | CA | 304 |
3 | Park City | UT | 299 |
4 | Truckee | NV | 275 |
5 | Healdsburg | CA | 275 |
6 | Laguna Beach | CA | 268.5 |
7 | Incline Village | NV | 259 |
8 | Manhattan Beach | CA | 209 |
9 | Charlotte | NC | 200 |
10 | Sonoma | CA | 200 |
Additional analysis
I was surprised that Indianapolis, IN topped the list. As a state capital, I expected it to be high, but not higher than a globally recognized city like San Francisco. However, I was interested that its median price is double that of the second highest city: Malibu, Ca. The median is a robust statistic that is not sensitive to outliers. I suspected that the data had a right skew and investigated.
To better understand the dataset, I plotted a histogram of the USD price for all units.
created in puzzle_solution.ipynb
There are many treeforts above $2,000! There are even some units listed for $10,000!
Next, I looked specifically at the listings for the most expensive median city in the US: Indianapolis, IN.
Statistic | price_USD |
---|---|
units | 251 |
mean | 915.40 |
st_dev | 969.65 |
min | 33 |
max | 5500 |
Q1 | 162.5 |
Q2 | 650 |
Q3 | 1200 |
The mean price for a treefortBnb in Indianapolis is $915.40, with a standard deviation of 969.65, and a range from $33 - $5,500! There is something questionable about the data here - I can't imagine many people paying $5,500 for a treehouse!
Priceonomics also conveniently records the number of reviews for each treefort. Below is a scatterplot of each Indianapolis $ Price vs Number of Reviews.
created in puzzle_solution.R
The graph shows that prices above $1,000 USD generally have 0 reviews.
If we consider reviews to be a proxy measure for visits, we can assume that these units do not constitute normal treefortBnb experiences. I suspected that this same relationship is present in the larger dataset and thus graphed a scatterplot of Unit Price USD vs Number of Reviews for all data.
created in puzzle_solution.ipynb
Bingo! It appears that the more expensive the treefort is, the fewer reviews it has. The spirit of the median (and other measures of central tendency) is to get an understanding of normal in the dataset. I doubt that a $10,000 treefort consistutes a 'normal' experience for renters.
To better represent normal, I re-calculated the median price of booking a treefort in each of the top 100 cities, given the records had 1+ reviews.
city | state | median_price_usd |
---|---|---|
Carmel | CA | 300.0 |
Malibu | CA | 225.0 |
Truckee | NV | 200.0 |
Laguna Beach | CA | 200.0 |
Incline Village | NV | 200.0 |
Of course, it doesn't matter whether or not the listings had reviews --Treefort BnB is fictional and none of the records represent a real experience. However, this is a useful exercise to demonstrate how a single (robust) measure of central tendency can tell a misleading story and you should always be ready to dig deeper into the data (: