Concrete_Data_Analysis

This repository contains code for a linear regression model on the Concrete Data. The following libraries were used:

pandas

numpy

sklearn.model_selection.train_test_split

sklearn.preprocessing.StandardScaler

sklearn.linear_model.LinearRegression

sklearn.metrics.mean_squared_error

sklearn.metrics.mean_absolute_error

sklearn.metrics.r2_score

seaborn

matplotlib.pyplot

Dataset

The dataset used for this model is the Concrete Data, which can be found in the Dataset folder in the repository.

Preprocessing

Before training the model, the dataset was preprocessed in the following way:

Checked for null values in the dataset using the isnull().sum() method

Checked for duplicate rows in the dataset using the duplicated() method

Removed duplicate rows using the drop_duplicates() method

Visualized the data using the pairplot() method from seaborn

Checked for correlation in the dataset using the corr() method

Visualized the correlation data using the heatmap() method from seaborn

Drew boxplots for all the features using the boxplot() method from pandas
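The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the small DataFrame and its column names (cement, water, age, strength) are made up for the example, and the plotting steps are left out so the snippet runs without a display.

```python
import pandas as pd

# Hypothetical stand-in for the concrete dataset; the column names and
# values are invented for illustration and are not taken from the repository.
df = pd.DataFrame({
    "cement": [540.0, 540.0, 332.5, 332.5, 198.6],
    "water": [162.0, 162.0, 228.0, 228.0, 192.0],
    "age": [28, 28, 270, 365, 360],
    "strength": [79.99, 79.99, 40.27, 41.05, 44.30],
})

# Count missing values per column.
null_counts = df.isnull().sum()

# Count rows that are exact copies of an earlier row.
n_duplicates = df.duplicated().sum()

# Drop the duplicate rows, keeping the first occurrence of each.
df_clean = df.drop_duplicates()

# Pairwise Pearson correlation between the numeric columns.
corr = df_clean.corr()

print(null_counts)
print("duplicate rows:", n_duplicates)
print(corr.round(2))
```

With seaborn and matplotlib available, the plots would follow from the same objects, for example sns.pairplot(df_clean), sns.heatmap(corr, annot=True), and df_clean.boxplot().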

Training and Testing

The data was split into training and testing sets using the train_test_split() function from sklearn, with 30% of the rows held out for testing.
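A 70/30 split of this kind might look like the sketch below. The random arrays stand in for the real features and target, and random_state=42 is an assumption added here for reproducibility; the README does not say whether a seed was fixed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 100 samples, 8 features (shapes are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.normal(size=100)

# Hold out 30% of the rows for testing.
# random_state is an assumption, not something stated in the README.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)
```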

The StandardScaler class from sklearn was used to standardize the features. fit_transform() was applied to the training set and transform() to the testing set, so the scaling mean and standard deviation are learned from the training data only.
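The fit-on-train, transform-on-test pattern can be illustrated with a short sketch. The arrays below are synthetic placeholders, not the concrete data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder features (shapes and values are illustrative).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(70, 3))
X_test = rng.normal(loc=50.0, scale=10.0, size=(30, 3))

scaler = StandardScaler()

# Learn the per-feature mean and standard deviation from the training set
# and scale it in one step.
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the training statistics on the test set; no refitting here.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6))
print(X_train_scaled.std(axis=0).round(6))
```

Fitting the scaler on the training set alone keeps information about the test set from leaking into the model's inputs.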

The linear regression model was then trained on the scaled training set using the LinearRegression class from sklearn.
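As a rough sketch of the training step, the example below fits LinearRegression on synthetic, noise-free data so that the recovered coefficients are easy to check. The data and the coefficients 3, -2, and 5 are invented for the illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data with a known linear relationship (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(70, 2))
y_train = 3.0 * X_train[:, 0] - 2.0 * X_train[:, 1] + 5.0

# Ordinary least squares fit.
model = LinearRegression()
model.fit(X_train, y_train)

# The fitted parameters should match the ones used to generate y_train.
print(model.coef_, model.intercept_)
```

On the real data, predictions for the held-out rows would then come from model.predict() applied to the scaled testing set.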

Model Evaluation

The model was evaluated using the following metrics:

Mean Squared Error (MSE)

Root Mean Squared Error (RMSE)

Mean Absolute Error (MAE)

R2 Score

Adjusted R2 Score

The metrics were calculated using mean_squared_error(), sqrt() from numpy (applied to the MSE to obtain the RMSE), mean_absolute_error(), and r2_score() from sklearn. The adjusted R2 score was calculated with a custom formula.
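The five metrics could be computed along the lines of the sketch below. The README does not spell out its custom adjusted R2 formula, so the standard definition 1 - (1 - R2) * (n - 1) / (n - p - 1) is assumed here, with n the number of samples and p the number of features. The helper name regression_report and the toy arrays are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred, n_features):
    """Return MSE, RMSE, MAE, R2, and adjusted R2 (hypothetical helper)."""
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)  # RMSE is the square root of the MSE
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    n = len(y_true)
    # Assumed standard adjusted R2 formula; the repository's custom formula
    # may differ.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2, "adj_r2": adj_r2}


# Toy values for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0, 11.0])

report = regression_report(y_true, y_pred, n_features=2)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

Adjusted R2 penalizes the score for each additional feature, so it is never higher than the plain R2 score for the same predictions.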

