/Cross-Validation

Cross-validation for evaluating a linear regression model

Primary LanguageJupyter Notebook

This code demonstrates the implementation of cross-validation for evaluating a linear regression model using the Boston Housing dataset. The purpose of this code is to assess the performance of a linear regression model and visualize the distribution of cross-validation scores.

Importing Libraries

The code begins by importing necessary libraries including pandas, numpy, matplotlib.pyplot, and required modules from scikit-learn.

Loading the Dataset

The Boston Housing dataset is loaded into a pandas DataFrame.

Data Preparation

The independent variables (features) and dependent variable (target) are extracted from the dataset.

Linear Regression Model Initialization

A Linear Regression model is initialized.

Cross-Validation

Cross-validation is performed using the cross_val_score function from scikit-learn. It calculates the negative mean squared error (neg_mean_squared_error) for each fold of the data. The - sign is used to reverse the sign of the scores because scikit-learn conventionally maximizes scores, but we want to minimize mean squared error.

Root Mean Squared Error (RMSE) Calculation

The mean squared errors obtained from cross-validation are transformed into root mean squared errors (RMSE) using numpy's square root function.

Visualization

A histogram of the cross-validation scores (mean squared errors) is plotted using matplotlib.

Visualizing Cross-Validation Scores

This histogram visually encapsulates the cross-validated MSE distribution, enabling a swift assessment of model consistency, stability, and generalization capability. Its succinct portrayal serves as a vital tool for understanding your model's efficacy across diverse dataset partitions.