The Airbnb platform keeps growing and now offers a large supply of accommodations. As a result, people look more critically at the features of Airbnb listings, such as the number of rooms, the price and the characteristics of the host. This research investigates the effect of host characteristics on Airbnb prices.
Because inflation has recently been high, it is interesting to compare the effect of host characteristics on price between cities with low inflation and cities with high inflation. Inflation is currently rising in many parts of the world, driven in part by economies recovering from the 2020 COVID recession and by rising gas prices. This research uses two subsets of cities: five cities with high inflation (Rio de Janeiro, Mexico City, Boston, Cape Town and Santiago) and five cities with low inflation (Tokyo, Geneva, Beijing, Bangkok and Athens). These cities were selected based on inflation data from the past few years from The Global Economy (https://www.theglobaleconomy.com/rankings/inflation/). To what extent does inflation moderate the effect of host characteristics on Airbnb prices in these cities?
What is the effect of different host characteristics on Airbnb prices, moderated by high or low inflation?
Four sub-questions:
- What is the effect of different host characteristics on Airbnb prices in cities with low inflation?
- What is the effect of different host characteristics on Airbnb prices in cities with high inflation?
- What is the difference between the effect of different host characteristics on Airbnb prices in cities with high inflation and cities with low inflation?
- What is the general effect of different host characteristics on Airbnb prices?
In our research, we compare the effect of host characteristics on prices between cities with high inflation and cities with low inflation. To select the cities included in the dataset, we used an overview of inflation by country over the past years from The Global Economy (https://www.theglobaleconomy.com/rankings/inflation/). We compared these countries with the cities for which datasets were available, and selected the following cities:
Cities with low inflation:
- Tokyo, Japan
- Geneva, Switzerland
- Beijing, China
- Bangkok, Thailand
- Athens, Greece
Cities with high inflation:
- Rio de Janeiro, Brazil
- Mexico City, Mexico
- Boston, United States
- Cape Town, South Africa
- Santiago, Chile
We combined these separate datasets into three larger datasets: one with all information about the cities with low inflation, one with all information about the cities with high inflation, and one general dataset with all information about all cities. The separate low inflation and high inflation datasets can be used to compare the effect of host characteristics on Airbnb prices between the two groups. The general dataset can be used to create an overall picture of the effects of different host characteristics on Airbnb prices. Later, we clean these datasets so they can be used easily in our analysis.
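The combination step could look roughly like the sketch below. It uses base R and tiny made-up data frames so that it runs without any packages. The object names (`tokyo`, `boston`, `low_inflation`, and so on), the helper `stack_cities()` and the added `city` and `inflation_group` columns are assumptions for illustration, not the names used in `merge_data.R`.

```r
# Hypothetical per-city listings; the real datasets have 75 columns.
tokyo    <- data.frame(id = 1:2, price = c(80, 120))
geneva   <- data.frame(id = 3:4, price = c(150, 200))
boston   <- data.frame(id = 5:6, price = c(180, 220))
santiago <- data.frame(id = 7:8, price = c(40, 60))

# Stack a named list of city data frames and record, for every row,
# the city it came from and the inflation group it belongs to.
stack_cities <- function(cities, group) {
  tagged <- Map(function(df, name) {
    df$city <- name
    df$inflation_group <- group
    df
  }, cities, names(cities))
  out <- do.call(rbind, tagged)
  rownames(out) <- NULL
  out
}

low_inflation  <- stack_cities(list(Tokyo = tokyo, Geneva = geneva), "low")
high_inflation <- stack_cities(list(Boston = boston, Santiago = santiago), "high")

# The general dataset is simply both groups stacked together.
all_cities <- rbind(low_inflation, high_inflation)
```

Keeping a `city` and an `inflation_group` column in the general dataset makes it possible to split it again later without going back to the raw files.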
In total, the datasets contain 75 different variables. For this research, however, only the variables about host characteristics and Airbnb prices are relevant. The following variables are used and analyzed in our research:
Variable name | Variable explanation |
---|---|
price_in_dollars (Y) | Price of the Airbnb in dollars |
host_years (X1) | Number of years the host has been active |
host_response_time_recoded (X2) | How fast the host responds, coded from 1 to 4 |
host_response_rate_recoded (X3) | How often the host responds, as a rate from 0 to 1 |
host_is_superhost (X4) | Dummy whether the host is a superhost |
host_has_profile_pic (X5) | Dummy whether the host has a profile pic |
host_identity_verified (X6) | Dummy whether the identity of the host is verified |
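As an illustration of how variables like these might be derived, the sketch below recodes a small made-up data frame in base R. Everything about the raw data is an assumption: the raw column names (`host_since`, `host_response_time`, `host_response_rate`), the text categories, the `"t"`/`"f"` coding of the dummy, the reference date, and the direction of the 1 to 4 scale (here 1 is the fastest response). The actual recoding in `data_transformation.R` may differ.

```r
# Hypothetical raw host columns; names and formats are assumptions.
raw <- data.frame(
  host_since         = c("2015-06-01", "2020-01-15"),
  host_response_time = c("within an hour", "a few days or more"),
  host_response_rate = c("95%", "50%"),
  host_is_superhost  = c("t", "f"),
  stringsAsFactors   = FALSE
)

# Assumed ordering of the response-time scale: 1 = fastest, 4 = slowest.
response_levels <- c("within an hour", "within a few hours",
                     "within a day", "a few days or more")

# Arbitrary fixed reference date so the example is reproducible.
reference_date <- as.Date("2023-01-01")

recoded <- data.frame(
  # Days between the reference date and the host's start date, in years.
  host_years = as.numeric(reference_date - as.Date(raw$host_since)) / 365.25,
  # Position of the text category in the assumed ordering (1 to 4).
  host_response_time_recoded = match(raw$host_response_time, response_levels),
  # "95%" becomes 0.95.
  host_response_rate_recoded = as.numeric(sub("%", "", raw$host_response_rate)) / 100,
  # "t"/"f" becomes a 1/0 dummy.
  host_is_superhost = as.integer(raw$host_is_superhost == "t")
)
```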
This project uses the Ordinary Least Squares (OLS) regression method to examine the effect of different host characteristics on Airbnb prices in low and high inflation cities. OLS regression shows whether the relationship between each independent variable and the price is positive or negative. The dependent variable is the Airbnb price in dollars. The independent variables are listed in the table above, denoted by X. The regression is as follows:
Y = b0 + b1X1 + b2X2 + b3X3 + b4X4 + b5X5 + b6X6
Here, host_is_superhost, host_has_profile_pic and host_identity_verified are dummy variables.
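In R, a regression of this form can be fitted with `lm()`. The sketch below runs on simulated data, so the coefficients it prints are meaningless; only the variable names match the table above. The data frame name `listings` and the simulated effect sizes are assumptions for illustration.

```r
set.seed(1)
n <- 200

# Simulated listings with the six host characteristics from the table.
listings <- data.frame(
  host_years                 = runif(n, 0, 10),
  host_response_time_recoded = sample(1:4, n, replace = TRUE),
  host_response_rate_recoded = runif(n),
  host_is_superhost          = rbinom(n, 1, 0.3),
  host_has_profile_pic       = rbinom(n, 1, 0.9),
  host_identity_verified     = rbinom(n, 1, 0.7)
)

# Made-up price process, only so that lm() has something to estimate.
listings$price_in_dollars <- 100 + 2 * listings$host_years +
  15 * listings$host_is_superhost + rnorm(n, sd = 20)

# Y = b0 + b1*X1 + ... + b6*X6, estimated by OLS.
model <- lm(
  price_in_dollars ~ host_years + host_response_time_recoded +
    host_response_rate_recoded + host_is_superhost +
    host_has_profile_pic + host_identity_verified,
  data = listings
)

# Estimates, standard errors, t values and p values per coefficient.
print(summary(model)$coefficients)
```

The sign of each estimate shows the direction of the relationship, and the p value shows whether it is statistically significant.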
To investigate the effect of host characteristics (independent variables) on Airbnb prices (dependent variable) in cities with low or high inflation, we ran a linear regression on the low inflation, high inflation and full datasets. The output of the regressions can be found below:
The regression output shows that several variables have a significant effect on price. More variables have a significant effect on price in high inflation cities (regression 2) than in low inflation cities (regression 1): two out of six host characteristics are significant in low inflation cities, while five out of six are significant in high inflation cities. In the regression of all cities together (regression 3), again five out of six variables have a significant effect on price. From these results we conclude that host characteristics do have an effect on Airbnb prices.
A more detailed analysis of these results can be found in the PDF in the gen folder.
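Comparing which coefficients are significant in two separate regressions does not by itself test whether the effects differ between the groups. One common way to test moderation directly is a pooled regression with interaction terms. The sketch below illustrates that idea on simulated data. It is not the analysis reported above, and the `high_inflation` dummy, the data frame `pooled` and the simulated effect sizes are assumptions for illustration. Only two host characteristics are included to keep it short.

```r
set.seed(2)
n <- 400

# Simulated pooled data: half the listings are in high inflation cities.
pooled <- data.frame(
  host_years        = runif(n, 0, 10),
  host_is_superhost = rbinom(n, 1, 0.3),
  high_inflation    = rep(c(0, 1), each = n / 2)
)

# Made-up price process in which the superhost premium is larger
# in high inflation cities.
pooled$price_in_dollars <- 100 + 2 * pooled$host_years +
  10 * pooled$host_is_superhost +
  20 * pooled$host_is_superhost * pooled$high_inflation +
  rnorm(n, sd = 15)

# (a + b) * m expands to a + b + m + a:m + b:m.
interaction_model <- lm(
  price_in_dollars ~ (host_years + host_is_superhost) * high_inflation,
  data = pooled
)

print(coef(summary(interaction_model)))
```

A significant coefficient on an interaction term such as `host_is_superhost:high_inflation` would indicate that the effect of that host characteristic differs between high and low inflation cities.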
├── README.md
├── data
├── gen
│   ├── analysis
│   ├── data-preparation
│   └── paper
├── src
│   ├── analysis
│   ├── data-preparation
│   └── paper
└── makefile
For this research, downloading the data, cleaning the data and running the OLS regression were all done in R and RStudio. A makefile is included so that all files can be run in one go, in the right order.
- Make: Click here to see how to install Make
- R and RStudio: Click here to see how to install R and RStudio
In R, the following packages were used. If you have not installed them yet, use install.packages() to do so. Once they are installed, load each package with the library() function:
library(tidyverse)
library(dplyr)
library(ggplot2)
library(readr)
library(stargazer)
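To check which of these packages still need to be installed, a small helper like the one below can be used. The function name `missing_packages()` is hypothetical and not part of this repository. The install call is left commented out so that nothing is installed without your consent.

```r
required <- c("tidyverse", "dplyr", "ggplot2", "readr", "stargazer")

# Return the packages from `pkgs` that cannot be loaded, i.e. are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```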
The easiest way to reproduce the results is to run the makefile. It runs each source file in the right sequence, ending with the results of the analysis. You can run the makefile by following these steps:
- Fork this repository to your own GitHub account
- Clone the forked repository to your local computer using Git / terminal / command prompt. Go to the directory you want to clone the repository into and type:
git clone https://github.com/{your username}/host-characteristics-on-airbnb-prices.git
- Set your working directory to the cloned folder using cd host-characteristics-on-airbnb-prices
- Type make. This runs all the source code (it could take a while)
- The generated stargazer output with the regression results can then be found in your local folder at:
/host-characteristics-on-airbnb-prices/gen/analysis/output/model_report_airbnb.html
If you want to run each script separately, do so in the following order:
- download_data.R
- merge_data.R
- data_transformation.R
- clean_data.R
- analyze.R
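If Make is not available, the same order can be enforced from R with a small helper such as the one below. The function `run_in_order()` is hypothetical and not part of this repository, and the script locations in the commented call are assumptions based on the `src` folders in the repository overview. Adjust them to where the files actually live.

```r
# Source a vector of R scripts one after another.
# Stops before running anything if one of the scripts cannot be found.
run_in_order <- function(paths) {
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("Script(s) not found: ", paste(not_found, collapse = ", "))
  }
  for (p in paths) {
    message("Running ", p)
    source(p)
  }
  invisible(paths)
}

# Assumed locations; run from the root of the cloned repository:
# run_in_order(c(
#   "src/data-preparation/download_data.R",
#   "src/data-preparation/merge_data.R",
#   "src/data-preparation/data_transformation.R",
#   "src/data-preparation/clean_data.R",
#   "src/analysis/analyze.R"
# ))
```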
The following website was used to decide which cities to include in the high and low inflation datasets: https://www.theglobaleconomy.com/rankings/inflation/
This project was conducted for the Data Preparation and Workflow Management course at Tilburg University. The members of our team are:
- Nynke Voermans n.voermans@tilburguniversity.edu
- Nina Verschuuren n.a.f.verschuuren@tilburguniversity.edu
- Meggy Lemmens m.m.e.lemmens@tilburguniversity.edu
- Amber Kalse a.e.s.kalse@tilburguniversity.edu