Inside Airbnb is an online platform that provides data and advocacy on Airbnb’s impact on residential communities. Its goal is to equip communities with data and knowledge so that they can understand, decide on and control the rental of residential homes to tourists.
The primary goal of this project is to determine the relationship between log price and host response time, and the pairwise differences in log price across the three response-time categories. First, measures of central tendency are used in the descriptive statistics to describe the distribution of these two variables. Second, the assumptions of the ANOVA test are checked: independence, normality, and homogeneity of variance. A graphical technique, the Q-Q plot, is used to examine the normality assumption, while Levene’s test assesses the homogeneity of variance. Subsequently, Two-Sample
The dataset used in this study was provided by the instructors of the ‘Introductory Case Studies’ course at TU Dortmund during the Winter 2023/2024 semester. It was taken from the Inside Airbnb website and covers the Seattle, Washington, United States region. The dataset contains 232 observations of 2 variables, log_price and host_response_time, which are renamed ‘log price’ and ‘host response time’ for the analysis. The ‘host response time’ variable has three levels: ‘within an hour’, ‘within a few hours’ and ‘within a day’.
The ‘log price’ variable is numerical (continuous) and the ‘host response time’ variable is categorical (ordinal). Inspection in R shows that the dataset is well organized and contains no missing values. Overall, the data quality is considered good.
This section presents the statistical methods used. The entire dataset was analyzed with the R programming language, version 4.1.1, which was developed by the R Development Core Team. The R packages ggplot2, gridExtra, tidyverse, skimr, rstatix, car and ggpubr were used for the statistical analysis, calculations and visualizations.
1. Statistical Hypothesis Testing
2. Confidence Interval
3. One-way ANOVA
4. Q-Q Plot
5. Levene’s Test
6. Two-Sample t-Tests
7. Bonferroni Method
8. Tukey’s Honest Significant Difference (HSD)
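To make the one-way ANOVA named in the list above concrete, the following is a minimal sketch of how its F statistic is formed from the between-group and within-group sums of squares. The analysis in this report was carried out in R; the sketch uses plain Python purely for illustration. The function name `one_way_anova_f` and the sample values are hypothetical and are not taken from the Seattle dataset.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical log prices per response-time level (illustrative values only)
sample = {
    "within an hour": [4.0, 5.0, 6.0],
    "within a few hours": [5.0, 6.0, 7.0],
    "within a day": [6.0, 7.0, 8.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(sample.values()))
print(f"F = {f_stat:.3f} on ({df_b}, {df_w}) degrees of freedom")
```

A large F value relative to the F distribution with these degrees of freedom would indicate that at least one group mean differs. Turning F into a p-value requires the F distribution, which is left to a statistics library.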
This section uses both dataset variables, ‘log price’ and ‘host response time’, with ‘host response time’ split into its three levels. The main goal is to establish the relationship between log price and host response time and the pairwise differences among the three categories. The last subsection gives a brief comparison between the non-adjusted and the adjusted methods.
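The difference between non-adjusted and adjusted pairwise comparisons can be sketched with the Bonferroni method, which multiplies each raw p-value by the number of comparisons and caps the result at 1. The snippet below is an illustration in plain Python (the report itself used R). The raw p-values are hypothetical placeholders, not results from this study, and `bonferroni_adjust` is an illustrative helper name.

```python
from itertools import combinations


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


levels = ["within an hour", "within a few hours", "within a day"]
pairs = list(combinations(levels, 2))   # the three pairwise comparisons

raw_p = [0.01, 0.04, 0.30]              # hypothetical unadjusted p-values
adjusted = bonferroni_adjust(raw_p)

alpha = 0.05
for (a, b), p_raw, p_adj in zip(pairs, raw_p, adjusted):
    print(f"{a} vs {b}: raw significant={p_raw < alpha}, "
          f"adjusted significant={p_adj < alpha}")
```

With three comparisons, a raw p-value must fall below roughly 0.05 / 3 to remain significant after adjustment, which is why a comparison can be significant under the non-adjusted method but not under the adjusted one.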
This study compared several tests to examine how host response time relates to log price. The dataset was sourced from the Inside Airbnb website for the Seattle, Washington, United States region. It included 232 observations of the categorical variable ‘host response time’ and the continuous variable ‘log price’. The ‘host response time’ variable had three levels: ‘within an hour’, ‘within a few hours’ and ‘within a day’. The dataset was well organized and contained no missing values.
The main objective of this study was to determine the relationship between log price and host response time and the pairwise differences among the three response-time categories. Initially, we computed descriptive statistics for these two variables. The findings indicated that ‘within a few hours’ exhibits the highest variability, reflected in its elevated interquartile range, standard deviation, and variance, which suggests that log prices are most widely spread within this category. Moreover, Figure 1 used Q-Q plots to assess the distribution of the data within each host response time category, revealing that ‘within a day’ approximates a normal distribution, while ‘within a few hours’ and ‘within an hour’ deviate from it with scattered data points. After that, we applied Levene’s test and ANOVA to examine the differences between the categories. Levene's test did not reject the hypothesis of equal population variances (
After that, pairwise