Big-Data-Management-Data-Visualisation

Introduction

The dataset used in this report is Kaggle's Berlin Airbnb Data, which was compiled to investigate Airbnb activity in Berlin, Germany. It consists of 6 files with a combined total of 126 columns.

Datasets Content

The datasets were scraped on November 7, 2018 and contain detailed listings data, review data, and calendar data for the Airbnb listings in Berlin that were current on that date.

Aim

The aim of this report is to carry out data analysis and data visualization on this big data using PySpark, the Python API for Apache Spark. We would like to answer the following questions with the dataset:

  1. What are the busiest times of the year to visit Berlin? By how much do prices spike?
  2. Can we uncover trends in the Airbnb listings available to visitors to Berlin?
  3. Using the listings variables, can we predict the price of a listing?
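
As a rough illustration of how question 1 could be approached, the sketch below summarises calendar rows by month using only the Python standard library. It is not the report's actual pipeline: the column names `date`, `available` (`t`/`f`) and `price` (a string such as `$120.00`) are assumptions about the calendar file, the sample rows are made up, and treating an unavailable night as a booked night is a simplification, since hosts can also block dates. In PySpark the same idea would typically be expressed as a group-by on the month followed by an aggregation.

```python
from collections import defaultdict
from datetime import date


def parse_price(text):
    """Convert a price string such as '$1,250.00' to a float; None if empty."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def monthly_summary(rows):
    """Return {month: (booked_share, mean_price)} from calendar rows.

    Each row is a dict with the assumed keys 'date' (YYYY-MM-DD),
    'available' ('t' or 'f') and 'price' (string, may be empty).
    """
    nights = defaultdict(int)
    booked = defaultdict(int)
    prices = defaultdict(list)
    for row in rows:
        month = date.fromisoformat(row["date"]).month
        nights[month] += 1
        # Assumption: an unavailable night counts as a booked night.
        if row["available"] == "f":
            booked[month] += 1
        price = parse_price(row.get("price") or "")
        if price is not None:
            prices[month].append(price)
    summary = {}
    for month in nights:
        listed = prices[month]
        mean_price = sum(listed) / len(listed) if listed else None
        summary[month] = (booked[month] / nights[month], mean_price)
    return summary


def price_spike_pct(summary, month):
    """Percentage by which a month's mean price exceeds the mean over all months."""
    monthly = [price for _, price in summary.values() if price is not None]
    baseline = sum(monthly) / len(monthly)
    return (summary[month][1] / baseline - 1.0) * 100.0


# Made-up rows for illustration only; not taken from the dataset.
sample_rows = [
    {"date": "2018-12-30", "available": "f", "price": ""},
    {"date": "2018-12-31", "available": "t", "price": "$120.00"},
    {"date": "2019-02-01", "available": "t", "price": "$80.00"},
    {"date": "2019-02-02", "available": "t", "price": "$80.00"},
]

summary = monthly_summary(sample_rows)
for month in sorted(summary):
    share, mean_price = summary[month]
    print(month, round(share, 2), mean_price)
```

The busiest months are then the ones with the highest booked share, and `price_spike_pct(summary, month)` gives one possible measure of how far that month's prices sit above the yearly average.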