
Machine learning projects using the California Housing dataset


California Housing Dataset Machine Learning Projects

Source:

This dataset is a modified version of the California Housing dataset, built using the 1990 California census data. It contains one row per census block group.

The original dataset records 9 variables for every block group in California in the 1990 census. This modified version adds a categorical ocean_proximity column, for 10 columns in total. The dependent variable is the median house value.

This dataset is used in the book "Hands-On Machine Learning" by Aurélien Géron to demonstrate a sample end-to-end machine learning project workflow. The data files are available at https://github.com/ageron/handson-ml2/tree/master/datasets/housing

This repository contains machine learning projects using the California Housing dataset. The projects cover the following stages:

Exploratory data analysis

The DataFrame has 10 columns and 20,640 rows, one row per census block group. The columns are:

  • longitude: The longitude of the block group.
  • latitude: The latitude of the block group.
  • housing_median_age: The median age of the housing units in the block group.
  • total_rooms: The total number of rooms in the block group.
  • total_bedrooms: The total number of bedrooms in the block group.
  • population: The population of the block group.
  • households: The number of households in the block group.
  • median_income: The median income of households in the block group.
  • median_house_value: The median value of housing units in the block group (the dependent variable).
  • ocean_proximity: The proximity of the block group to the ocean (categorical variable).
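As a minimal sketch of a first inspection step, the snippet below loads the data with pandas, checks that the ten columns listed above are present, and separates the dependent variable from the other columns. The helper name load_housing and the two sample rows are assumptions made for illustration; in practice the function would be pointed at the housing.csv file from the source above.

```python
import io

import pandas as pd

# Column names as documented in the list above.
EXPECTED_COLUMNS = [
    "longitude", "latitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
    "median_house_value", "ocean_proximity",
]


def load_housing(source):
    """Read the housing CSV and verify it has the ten documented columns.

    `source` may be a file path or a file-like object (hypothetical helper).
    """
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Two illustrative placeholder rows stand in for housing.csv so the sketch
# runs without the file. The second row leaves total_bedrooms blank to show
# how a missing value surfaces in the summary.
SAMPLE_CSV = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "-122.23,37.88,41.0,880.0,129.0,322.0,126.0,8.3252,452600.0,NEAR BAY\n"
    "-121.00,38.00,20.0,1500.0,,900.0,300.0,3.5,150000.0,INLAND\n"
)

housing = load_housing(io.StringIO(SAMPLE_CSV))

# Separate the dependent variable from the remaining nine columns.
features = housing.drop(columns="median_house_value")
target = housing["median_house_value"]

print(housing.shape)          # (rows, columns)
print(housing.isna().sum())   # missing values per column
print(housing["ocean_proximity"].value_counts())
```

With the real file, replacing io.StringIO(SAMPLE_CSV) with the path to housing.csv should report the 20,640 rows and 10 columns described above.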

Feature engineering
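One possible illustration of this stage, not necessarily what the notebooks in this repository do: block-level totals such as total_rooms depend on the size of the block group, so per-household and per-room ratios are often more informative. The function name add_ratio_features, the new column names, and the sample values are assumptions made for this sketch.

```python
import pandas as pd


def add_ratio_features(df):
    """Return a copy of df with three ratio columns derived from block totals.

    Hypothetical helper; the derived column names are illustrative.
    """
    out = df.copy()
    out["rooms_per_household"] = out["total_rooms"] / out["households"]
    out["bedrooms_per_room"] = out["total_bedrooms"] / out["total_rooms"]
    out["population_per_household"] = out["population"] / out["households"]
    return out


# Illustrative placeholder values for two block groups.
sample = pd.DataFrame({
    "total_rooms": [880.0, 1500.0],
    "total_bedrooms": [129.0, 300.0],
    "population": [322.0, 900.0],
    "households": [126.0, 300.0],
})

engineered = add_ratio_features(sample)
print(engineered[["rooms_per_household", "bedrooms_per_room",
                  "population_per_household"]])
```

The function copies its input so the original DataFrame stays unchanged, which keeps the raw columns available for comparison during exploratory analysis.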

Model selection

Model evaluation

Model deployment