Integration of 7 datasets in various formats about housing information in Victoria, Australia, and a study of the effect of different normalization/transformation methods.

Primary language: Jupyter Notebook

Housing-Information-Melbourne

Using Python, we can integrate several datasets into a single schema and find and fix possible problems in the data. This project works with 7 datasets in various formats about housing information in Victoria, Australia. The first task is to integrate all the datasets into one dataset:

  • Hospitals (HTML Format)
  • Supermarkets (Excel Format)
  • Shopping centers (PDF Format)
  • Real Estate (XML format)
  • Real Estate (JSON format)
  • Vic_suburb_boundary (Shape Format)
  • GTFS_Melbourne_Train_Information (Text Format)
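As a rough illustration of the integration step, the sketch below reads property records from a JSON string and an XML string into one shared schema, drops duplicate ids, and derives a `Distance_to_hospital` column with the haversine formula. The field names (`property_id`, `lat`, `lng`, `price`), the sample records, and the choice of haversine distance are all assumptions made for this example; the real files and the notebook's actual method may differ.

```python
import json
import math
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real files and field names are not shown here.
JSON_TEXT = '[{"property_id": 1, "lat": -37.81, "lng": 144.96, "price": 750000}]'
XML_TEXT = """<properties>
  <property><property_id>1</property_id><lat>-37.81</lat><lng>144.96</lng><price>750000</price></property>
  <property><property_id>2</property_id><lat>-37.90</lat><lng>145.00</lng><price>910000</price></property>
</properties>"""

# Target schema: column name -> type every source is cast to.
CASTS = {"property_id": int, "lat": float, "lng": float, "price": float}


def records_from_json(text):
    """Parse a JSON array of objects into rows that follow CASTS."""
    return [{k: cast(rec[k]) for k, cast in CASTS.items()} for rec in json.loads(text)]


def records_from_xml(text):
    """Parse <property> elements into rows that follow CASTS."""
    root = ET.fromstring(text)
    return [{k: cast(node.findtext(k)) for k, cast in CASTS.items()}
            for node in root.iter("property")]


def haversine_km(lat1, lng1, lat2, lng2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def integrate(json_text, xml_text, hospitals):
    """Merge both sources, keep the first row per property_id, add nearest-hospital distance."""
    merged = {}
    for rec in records_from_json(json_text) + records_from_xml(xml_text):
        merged.setdefault(rec["property_id"], rec)
    for rec in merged.values():
        rec["Distance_to_hospital"] = min(
            haversine_km(rec["lat"], rec["lng"], h_lat, h_lng) for h_lat, h_lng in hospitals
        )
    return list(merged.values())


# One made-up hospital location as (lat, lng).
rows = integrate(JSON_TEXT, XML_TEXT, hospitals=[(-37.81, 144.96)])
for row in rows:
    print(row)
```

The same pattern would extend to the other sources (HTML, Excel, PDF, shapefile, GTFS text), each with its own parser feeding the shared schema.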

The second task is to study the effect of different normalization/transformation methods:

  • Z-score Standardization
  • Min-max normalization

We then observe and explain their effect, assuming we want to develop a linear model to predict the price of a property using the Distance_to_sc, travel_min_to_CBD, and Distance_to_hospital attributes.
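The two methods can be sketched with the standard library alone. The example below applies each to a single predictor and fits a one-variable least-squares line, which is a simplification of the three-attribute model described above. The numbers are made up for illustration and do not come from the datasets.

```python
from statistics import mean, pstdev


def z_score(values):
    """Z-score standardization: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Min-max normalization: rescale linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def predict(x, y):
    """Fit on (x, y) and return the fitted values for x."""
    intercept, slope = fit_line(x, y)
    return [intercept + slope * a for a in x]


# Made-up values, not taken from the real data.
travel_min_to_CBD = [12.0, 25.0, 40.0, 55.0, 33.0]
price = [1200000.0, 950000.0, 700000.0, 560000.0, 800000.0]

z = z_score(travel_min_to_CBD)
m = min_max(travel_min_to_CBD)

raw_pred = predict(travel_min_to_CBD, price)
z_pred = predict(z, price)
m_pred = predict(m, price)

print("slope (raw):    ", fit_line(travel_min_to_CBD, price)[1])
print("slope (z-score):", fit_line(z, price)[1])
print("slope (min-max):", fit_line(m, price)[1])
```

Both methods are linear rescalings of the predictor, so in a least-squares fit with an intercept the fitted prices stay the same and only the coefficient's scale changes. Any difference in model quality would have to come from non-linear transformations or from regularized models, which this sketch does not cover.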