California-Housing-End-To-End-ML-Project

In this project, California census data is used to build a model of housing prices in the state. The data includes metrics such as population, median income, and median housing price for each block group in California. Block groups are the smallest geographical units for which the US Census Bureau publishes sample data.

Problem Statement

Predict the median housing price of a block group from the other metrics recorded for it.

Performance Measure Chosen:

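RMSE (Root Mean Squared Error), as it is generally the preferred performance measure for regression tasks. It is expressed in the same unit as the predicted price and penalizes large errors more heavily than small ones.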
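
As a minimal sketch of how this measure is computed, the function below takes the square root of the mean of the squared differences between actual and predicted values. The function name `rmse` and the three sample prices are made up for illustration and do not come from the project's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical median house values and predictions for three block groups.
actual = [200_000, 150_000, 300_000]
predicted = [210_000, 140_000, 330_000]

# The single 30,000 miss dominates the score: the result is roughly 19,149,
# while the mean absolute error for the same data would be about 16,667.
print(round(rmse(actual, predicted)))
```

Because the errors are squared before averaging, one badly mispriced block group raises the score more than several small misses of the same total size.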

Steps followed:

  1. Getting the data
  2. Quick analysis of data
  3. Creation of test set
  4. Visualizing data and observation
  5. Deriving new features from existing ones
  6. Data cleaning
  7. Creating transformation pipelines
  8. Model selection - trying different models
  9. Model evaluation using cross-validation
  10. Hyperparameter tuning using Grid Search
  11. Evaluation on test set
  12. Deployment [yet to be done]
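
Several of the steps above (3, 5, 6, 9 and 11) can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the project's code: the rows are synthetic stand-ins rather than census data, the column names (`median_income`, `total_rooms`, `households`, `total_bedrooms`, `median_house_value`) are assumed, and the model is a one-feature least-squares line chosen only to keep the example short. Trying several models and grid search (steps 8 and 10) are left out; a real project would more likely use a machine learning library for them, and the source does not say which tools were used.

```python
import random
import statistics
import zlib


def in_test_set(row_id, test_ratio=0.2):
    """Stable split: hash the row id so a row never switches sets between runs."""
    return zlib.crc32(str(row_id).encode()) < test_ratio * 2**32


def rmse(y_true, y_pred):
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_val_rmse(xs, ys, k=5):
    """k-fold cross-validation: fit on k-1 folds, score RMSE on the held-out fold."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        valid = [i for i in range(len(xs)) if i % k == fold]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in valid]
        scores.append(rmse([ys[i] for i in valid], preds))
    return scores


# Synthetic stand-in rows (NOT the census data); column names are assumptions.
rng = random.Random(42)
rows = []
for row_id in range(500):
    income = rng.uniform(1.0, 10.0)
    rows.append({
        "id": row_id,
        "median_income": income,
        "total_rooms": rng.uniform(500, 5000),
        "households": rng.uniform(100, 1000),
        "total_bedrooms": None if row_id % 50 == 0 else rng.uniform(100, 1000),
        "median_house_value": 40_000 * income + rng.gauss(0, 20_000),
    })

# Step 3: set the test rows aside before any further exploration.
train_rows = [r for r in rows if not in_test_set(r["id"])]
test_rows = [r for r in rows if in_test_set(r["id"])]

# Step 6 needs a fill value; learn the median from the training rows only.
bedrooms_median = statistics.median(
    r["total_bedrooms"] for r in train_rows if r["total_bedrooms"] is not None
)
for r in rows:
    # Step 5: derive a ratio feature (not used by the one-feature model below).
    r["rooms_per_household"] = r["total_rooms"] / r["households"]
    # Step 6: replace missing values with the training-set median.
    if r["total_bedrooms"] is None:
        r["total_bedrooms"] = bedrooms_median

# Step 9: estimate generalization error with cross-validation on training data.
train_x = [r["median_income"] for r in train_rows]
train_y = [r["median_house_value"] for r in train_rows]
cv_scores = cross_val_rmse(train_x, train_y, k=5)

# Step 11: fit on the full training set and score once on the held-out test set.
slope, intercept = fit_line(train_x, train_y)
test_rmse = rmse(
    [r["median_house_value"] for r in test_rows],
    [slope * r["median_income"] + intercept for r in test_rows],
)
print("CV RMSE per fold:", [round(s) for s in cv_scores])
print("Test RMSE:", round(test_rmse))
```

Hashing the row identifier keeps the split stable when the data is reloaded or extended, and computing the imputation median from training rows only avoids leaking test-set information into the model.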

Author:

Piyush Kumar