Generative-modelling-Boston-dataset

Generative models for creating synthetic data from the Boston housing dataset.

The Boston dataset is preprocessed in the data_preparation.ipynb notebook.
Load the preprocessed data from the boston_dataset_data.mat file.

Abstract

The Boston Housing Dataset (https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html) is a small dataset for benchmarking machine learning algorithms.
The dataset contains 506 cases, each with 14 attributes (13 numerical/categorical predictive variables and 1 target variable: median value of owner-occupied homes in $1000's).

The second and fourth predictor columns are deleted, and the target variable is appended to the remaining 11 predictors to form the final dataset for generative modelling.
The shape of the final dataset in boston_dataset_data.mat is (506, 12).
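The column selection described above can be sketched with NumPy as follows. This is only an illustration: the function name `build_generative_dataset` and the placeholder arrays are hypothetical, and the actual steps in data_preparation.ipynb may differ.

```python
import numpy as np

def build_generative_dataset(predictors, target):
    """Drop the second and fourth predictor columns and append the target."""
    predictors = np.asarray(predictors, dtype=float)
    target = np.asarray(target, dtype=float).reshape(-1, 1)
    # Zero-based indices 1 and 3 are the second and fourth columns.
    kept = np.delete(predictors, [1, 3], axis=1)
    # The target becomes the last column of the final array.
    return np.hstack([kept, target])

# Placeholder data with the original shape (506 cases, 13 predictors).
# Random values are used here only so the sketch runs without the real data.
rng = np.random.default_rng(0)
demo_predictors = rng.normal(size=(506, 13))
demo_target = rng.normal(size=506)

demo_data = build_generative_dataset(demo_predictors, demo_target)
print(demo_data.shape)  # (506, 12)
```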

Load the preprocessed data with:

from scipy.io import loadmat
boston_data = loadmat('boston_dataset_data')['boston_dataset_data']
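The indexing by variable name works because `loadmat` returns a dictionary keyed by the variable names stored in the .mat file, alongside a few metadata entries whose names start with a double underscore. The round trip below is a small sketch of that behaviour using a temporary file; the name `demo_data` and the array contents are placeholders, not part of this repository.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Placeholder array; the real file stores a (506, 12) array instead.
demo = np.arange(24, dtype=float).reshape(2, 12)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_data.mat")
    # Each dictionary key becomes a variable name inside the .mat file.
    savemat(path, {"demo_data": demo})
    contents = loadmat(path)

# Skip metadata entries such as __header__ and __version__.
variable_names = sorted(k for k in contents if not k.startswith("__"))
loaded = contents["demo_data"]

print(variable_names)
print(loaded.shape)
```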

Generative models included:

  • Gaussian mixture models
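As a rough sketch of how a Gaussian mixture model can generate synthetic rows, the example below uses scikit-learn's `GaussianMixture`. The use of scikit-learn, the number of components, and the covariance type are assumptions made for illustration; the README does not state which settings the repository uses. A random placeholder array stands in for `boston_data`.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder standing in for the real boston_data array of shape (506, 12).
rng = np.random.default_rng(0)
boston_data = rng.normal(size=(506, 12))

# n_components=5 is an arbitrary illustrative choice, not a value
# taken from this repository.
gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
gmm.fit(boston_data)

# sample() returns the generated rows and the index of the mixture
# component that produced each row.
synthetic_data, component_labels = gmm.sample(n_samples=506)
print(synthetic_data.shape)  # (506, 12)
```

One common way to choose the number of components is to fit several candidates and compare `gmm.bic(boston_data)`, preferring lower values.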

Dataset

Distributions of the 12 variables used for generative modelling: image

Correlation
image
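A correlation matrix like the one shown can be computed with NumPy. The sketch below assumes Pearson correlation, which the README does not specify, and uses a random placeholder in place of `boston_data`.

```python
import numpy as np

# Placeholder standing in for the real boston_data array of shape (506, 12).
rng = np.random.default_rng(0)
boston_data = rng.normal(size=(506, 12))

# rowvar=False treats each column as a variable and each row as an observation.
corr = np.corrcoef(boston_data, rowvar=False)
print(corr.shape)  # (12, 12)
```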

DBSCAN clustering analysis of preprocessed data
image
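A DBSCAN analysis of this kind could look roughly like the sketch below, written with scikit-learn. Standardising the columns first and the values `eps=1.5` and `min_samples=5` are assumptions chosen for illustration only; the README does not give the settings used for the figure. A random placeholder stands in for `boston_data`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Placeholder standing in for the real boston_data array of shape (506, 12).
rng = np.random.default_rng(0)
boston_data = rng.normal(size=(506, 12))

# DBSCAN is distance based, so columns are put on a comparable scale first
# (an assumption; the repository may preprocess differently).
scaled = StandardScaler().fit_transform(boston_data)

# eps and min_samples are illustrative values, not the repository's settings.
labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(scaled)

# The label -1 marks points that DBSCAN treats as noise.
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```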