A clone of https://www.kaggle.com/armanjr/california-housing-prices

Primary language: Jupyter Notebook

California-Housing-Prices

  • Define the business objective
  • Make sense of the data from a high level
  • Create the training and test sets using proper sampling methods, e.g., random vs. stratified
  • Correlation analysis (pair-wise and attribute combinations)
  • Data cleaning (missing data, outliers, data errors)
  • Data transformation via pipelines (categorical text to numbers using one-hot encoding, feature scaling via normalization/standardization, feature combinations)
  • Train and cross-validate different models and select the most promising one (Linear Regression, Decision Tree, and Random Forest were tried in this tutorial)
  • Fine-tune the model by trying different combinations of hyperparameters
  • Evaluate the best estimator on the test set
  • Launch, monitor, and refresh the model and system
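
The steps above can be sketched roughly as follows with pandas and scikit-learn. This is a minimal illustration, not the notebook's actual code: the data is a small synthetic stand-in (the real Kaggle file is not loaded), and the column names, income bins, the `rooms_per_household` feature, and the hyperparameter grid are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing table (assumption: not the real dataset).
rng = np.random.default_rng(42)
n = 500
housing = pd.DataFrame({
    "median_income": rng.uniform(0.5, 15.0, n),
    "total_rooms": rng.integers(100, 5000, n).astype(float),
    "households": rng.integers(50, 1500, n).astype(float),
    "ocean_proximity": rng.choice(["INLAND", "NEAR BAY", "NEAR OCEAN"], n),
})
housing["median_house_value"] = (
    40_000 * housing["median_income"] + rng.normal(0, 20_000, n)
)
# Blank out some values so the imputation step has something to do.
missing_idx = housing.sample(frac=0.05, random_state=0).index
housing.loc[missing_idx, "total_rooms"] = np.nan

# Stratified sampling: bin income so both sets keep similar income proportions.
income_cat = pd.cut(housing["median_income"],
                    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf], labels=False)
train_set, test_set = train_test_split(
    housing, test_size=0.2, stratify=income_cat, random_state=42)


def split_xy(df):
    """Add a combined feature and separate predictors from the target."""
    X = df.drop(columns="median_house_value")
    X["rooms_per_household"] = X["total_rooms"] / X["households"]
    return X, df["median_house_value"]


X_train, y_train = split_xy(train_set)
X_test, y_test = split_xy(test_set)

numeric = ["median_income", "total_rooms", "households", "rooms_per_household"]
categorical = ["ocean_proximity"]


def make_model(estimator):
    """Imputation, scaling, and one-hot encoding in front of an estimator."""
    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    prep = ColumnTransformer([
        ("num", numeric_steps, numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])
    return Pipeline([("prep", prep), ("model", estimator)])


# Cross-validate the three model families named in the list above.
candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=42),
    "forest": RandomForestRegressor(n_estimators=30, random_state=42),
}
cv_rmse = {}
for name, estimator in candidates.items():
    scores = cross_val_score(make_model(estimator), X_train, y_train,
                             cv=3, scoring="neg_root_mean_squared_error")
    cv_rmse[name] = -scores.mean()

# Fine-tune one candidate with a small, illustrative grid.
param_grid = {"model__n_estimators": [10, 30], "model__max_features": [2, 4]}
search = GridSearchCV(make_model(RandomForestRegressor(random_state=42)),
                      param_grid, cv=3, scoring="neg_root_mean_squared_error")
search.fit(X_train, y_train)

# Evaluate the best estimator once on the held-out test set.
predictions = search.best_estimator_.predict(X_test)
test_rmse = float(np.sqrt(np.mean((predictions - y_test) ** 2)))
print(cv_rmse, search.best_params_, round(test_rmse))
```

Keeping the imputer, scaler, and encoder inside the pipeline means they are refit on each cross-validation fold, so no statistics from the validation fold or the test set leak into training.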