/airbnb_data_analysis

Athens Airbnb data analysis and a recommendation system based on the description of each Airbnb, using TF–IDF and the cosine similarity metric.

Primary language: Jupyter Notebook

Athens Airbnb Data Analysis

The goal of the project is to analyze my country's Airbnb data set. It also includes a simple recommendation system that compares the descriptions of the Airbnbs using Term Frequency–Inverse Document Frequency (TF-IDF) and the cosine similarity metric.
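
For reference, one common textbook definition of the two measures is given below. This is a sketch, not necessarily the exact variant used in the notebooks, since libraries often apply smoothing or normalization. The symbols are assumptions introduced here: N is the number of descriptions, tf(t, d) is how often term t occurs in description d, df(t) is the number of descriptions that contain t, and v_a, v_b are the TF-IDF vectors of two Airbnbs.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log \frac{N}{\mathrm{df}(t)}
\qquad
\cos(\mathbf{v}_a, \mathbf{v}_b) =
  \frac{\mathbf{v}_a \cdot \mathbf{v}_b}
       {\lVert \mathbf{v}_a \rVert \, \lVert \mathbf{v}_b \rVert}
```

A term that appears in every description gets a weight of zero, so two Airbnbs score as similar only when they share distinctive words.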

Data sets

I downloaded the data sets for my country from here. You can do the same for your own country or for any other country you want to analyze.

There are 3 data sets:

  • listings.csv
  • calendar.csv
  • reviews.csv

Every notebook starts with the same two steps:

  1. Loading the data sets.
  2. Dropping any rows that have a NaN value.
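
The two common steps could look roughly like the sketch below. It assumes pandas and uses a tiny inline CSV with made-up column names and values, so that it runs without the real files. With the actual data the call would be `pd.read_csv("listings.csv")`.

```python
import io

import pandas as pd

# Tiny inline stand-in for listings.csv so the sketch runs without the real
# file. The column names and values are made up for illustration.
sample_csv = io.StringIO(
    "id,name,description\n"
    "1,Acropolis view flat,Sunny flat near the Acropolis\n"
    "2,Plaka studio,\n"
    "3,,Quiet room in Koukaki\n"
)

# Step 1: load the data set. Empty fields are read as NaN.
listings = pd.read_csv(sample_csv)

# Step 2: drop every row that contains at least one NaN value.
listings_clean = listings.dropna().reset_index(drop=True)

print(len(listings), "rows loaded,", len(listings_clean), "rows kept")
```

Note that `dropna()` with no arguments removes a row if any column is missing, which can discard a lot of data when a rarely filled column is present. Passing `subset=[...]` restricts the check to the columns a notebook actually uses.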
Each notebook then adds its own steps:

  • word_cloud.ipynb
  1. Merging data sets
  2. Text preprocessing
  3. Generating Word Clouds
  • recommendation.ipynb
  1. Concatenating name and description columns
  2. Text preprocessing
  3. TF-IDF vectorization
  4. Calculating the similarity of each Airbnb with the others
  5. Storing the 100 most similar Airbnbs for each one (linear time)
  • listings.ipynb
  1. Cleaning price column
  2. Data analysis
  • calendar.ipynb
  1. Cleaning price column and splitting the date into year, month and day columns
  2. Data analysis
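
The recommendation.ipynb steps can be sketched as follows. This is a minimal illustration that assumes pandas, NumPy and scikit-learn, not the notebook's actual code. The listings are made up, the text preprocessing is delegated to `TfidfVectorizer` (lowercasing and English stop-word removal), and `top_n` is set to 2 instead of 100 so the toy output stays readable.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up listings standing in for the cleaned listings data.
listings = pd.DataFrame({
    "id": [10, 11, 12, 13],
    "name": [
        "Acropolis view flat",
        "Flat with Acropolis view",
        "Beach house",
        "Seaside beach villa",
    ],
    "description": [
        "Sunny apartment with a view of the Acropolis",
        "Bright apartment, Acropolis view from the balcony",
        "House next to the beach with a garden",
        "Villa by the sea, steps from the beach",
    ],
})

# Step 1: concatenate the name and description columns into one text field.
text = listings["name"] + " " + listings["description"]

# Steps 2-3: preprocessing and TF-IDF vectorization in one pass.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
matrix = vectorizer.fit_transform(text)

# Step 4: cosine similarity of each Airbnb with all the others.
similarity = cosine_similarity(matrix)

# Step 5: keep the top_n most similar Airbnbs for each one.
top_n = 2  # the project stores 100; 2 is enough for four toy rows
np.fill_diagonal(similarity, -1.0)  # never recommend a listing to itself
order = np.argsort(-similarity, axis=1)[:, :top_n]
ids = listings["id"].to_numpy()
recommendations = {
    int(ids[i]): ids[order[i]].tolist() for i in range(len(ids))
}

print(recommendations)
```

The full similarity matrix has one entry per pair of listings, so for a large data set it may be worth computing it in row batches and keeping only the top entries of each batch.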
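
The cleaning step of calendar.ipynb (and the price cleaning of listings.ipynb) could be done along these lines. The sketch assumes pandas, that prices are stored as strings such as "$1,250.00", and that dates are ISO formatted. The rows themselves are invented for illustration.

```python
import pandas as pd

# Made-up rows standing in for calendar.csv. The "$1,250.00" price format
# and the column names are assumptions for this sketch.
calendar = pd.DataFrame({
    "listing_id": [1, 1, 2],
    "date": ["2019-07-01", "2019-07-02", "2019-12-31"],
    "price": ["$45.00", "$1,250.00", "$80.50"],
})

# Strip the currency symbol and thousands separators, then convert to float.
calendar["price"] = (
    calendar["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# Parse the date once, then split it into year, month and day columns.
dates = pd.to_datetime(calendar["date"])
calendar["year"] = dates.dt.year
calendar["month"] = dates.dt.month
calendar["day"] = dates.dt.day

print(calendar)
```

With separate month and day columns, the analysis step can group prices by month, for example `calendar.groupby("month")["price"].mean()`.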

Word Clouds

Description

[Image: description word cloud]

Last Review

[Image: last review word cloud]

Neighbourhood

[Image: neighbourhood word cloud]

Transit

[Image: transit word cloud]
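
The text preprocessing behind a word cloud can be sketched with the standard library alone. The stop-word list, the sample texts and the helper names `preprocess` and `word_frequencies` are hypothetical, and the tokenizer keeps ASCII letters only, which would drop Greek text.

```python
import re
from collections import Counter

# Small hand-picked stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "with", "from", "for"}


def preprocess(text):
    """Lowercase, keep runs of ASCII letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def word_frequencies(texts):
    """Count how often each remaining word occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text))
    return counts


# Made-up descriptions standing in for a text column of the merged data.
descriptions = [
    "Sunny apartment with a view of the Acropolis.",
    "Quiet apartment in Plaka, 5 minutes from the metro!",
    "Apartment with balcony and Acropolis view",
]

frequencies = word_frequencies(descriptions)
print(frequencies.most_common(3))
```

A mapping like `frequencies` is the kind of input that the `wordcloud` package accepts through `WordCloud.generate_from_frequencies`. Whether the notebook uses that call or generates the cloud from raw text is not stated here.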