Concrete_Data_Analysis

This repository contains code for a linear regression model fitted to the Concrete Data dataset. The following libraries and functions were used:

  • pandas
  • numpy
  • sklearn.model_selection.train_test_split
  • sklearn.preprocessing.StandardScaler
  • sklearn.linear_model.LinearRegression
  • sklearn.metrics.mean_squared_error
  • sklearn.metrics.mean_absolute_error
  • sklearn.metrics.r2_score
  • seaborn
  • matplotlib.pyplot

    Dataset

    The dataset used for this model is the Concrete Data, which can be found in the Dataset folder in the repository.

    Preprocessing

    Before training the model, the dataset was inspected and cleaned as follows:

  • Checked for null values using isnull().sum()
  • Checked for duplicated rows using the duplicated() method
  • Removed duplicated rows using the drop_duplicates() method
  • Visualized pairwise relationships between the variables using the pairplot() function from seaborn
  • Computed pairwise correlations using the corr() method
  • Visualized the correlation matrix using the heatmap() function from seaborn
  • Drew boxplots for all the features using the boxplot() method from pandas
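
    The checks above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the small DataFrame and its column names are hypothetical stand-ins for the real dataset, and the plotting calls are left as comments so the snippet runs without a display.

```python
import pandas as pd

# Hypothetical stand-in for the concrete data; the column names and values
# are assumptions made for illustration only.
df = pd.DataFrame({
    "cement": [540.0, 332.5, 332.5, 198.6, 540.0],
    "water": [162.0, 228.0, 228.0, 192.0, 162.0],
    "age": [28, 270, 270, 360, 28],
    "strength": [79.99, 40.27, 40.27, 44.30, 79.99],
})

null_counts = df.isnull().sum()             # missing values per column
n_duplicates = int(df.duplicated().sum())   # rows that repeat an earlier row
clean = df.drop_duplicates().reset_index(drop=True)
corr = clean.corr()                         # pairwise Pearson correlations

# Plotting steps, shown as comments:
#   import seaborn as sns
#   sns.pairplot(clean)
#   sns.heatmap(corr, annot=True)
#   clean.boxplot()

print(null_counts)
print("duplicated rows:", n_duplicates)
```

    Resetting the index after drop_duplicates() is a choice made here for tidiness; it is not something the original description specifies.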

    Training and Testing

    The data was split into training and testing sets using the train_test_split() function from sklearn, with 30% of the rows held out for testing.
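
    A 70/30 split of this kind might look like the sketch below. The random arrays stand in for the real features and target, and the random_state value is an assumption added only to make the example reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples with 8 features and one target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.normal(size=100)

# test_size=0.3 holds out 30% of the rows for testing.
# random_state is an assumption, not taken from the repository.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)
```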

    The StandardScaler class from sklearn was used to standardize the features. The fit_transform() method was applied to the training set and the transform() method to the testing set, so the scaling parameters are learned from the training data only.
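
    The difference between the two calls can be seen on a tiny, made-up example. The arrays below are hypothetical; the point is only that the test row is scaled with the mean and standard deviation learned from the training rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training and testing features (two columns).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from the training set
X_test_scaled = scaler.transform(X_test)        # reuses the training mean and std

print(scaler.mean_)     # per-column means learned from X_train
print(X_test_scaled)
```

    Fitting the scaler on the full dataset instead would let information from the testing set leak into the training step.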

    The linear regression model was then trained on the standardized training set using the LinearRegression class from sklearn.
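
    As a rough illustration of the fitting step, the sketch below trains LinearRegression on synthetic, noiseless data with known coefficients. The data and coefficients are assumptions chosen so the result is easy to check; they are unrelated to the concrete dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data generated from a known linear relationship.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
y_train = 2.0 * X_train[:, 0] - 3.0 * X_train[:, 1] + 5.0  # no noise added

model = LinearRegression()
model.fit(X_train, y_train)

# With noiseless data the fit recovers the generating coefficients.
print(model.coef_, model.intercept_)
```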

    Model Evaluation

    The model was evaluated using the following metrics:

  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)
  • Mean Absolute Error (MAE)
  • R2 Score
  • Adjusted R2 Score

    The metrics were calculated using mean_squared_error(), mean_absolute_error(), and r2_score() from sklearn, with sqrt() from numpy applied to the MSE to obtain the RMSE and a custom formula for the adjusted R2 score.
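
    The metric calculations can be sketched as below. The target and prediction arrays and the feature count are hypothetical. The adjusted R2 line uses the standard definition, 1 - (1 - R2)(n - 1)/(n - p - 1), on the assumption that this is what the custom formula computes; the repository's exact expression may differ.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true values and predictions for a testing set.
y_test = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0, 12.0, 12.0])
n_features = 2  # hypothetical number of predictors (p)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Adjusted R2 (standard definition; assumed to match the custom formula).
n = len(y_test)
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

print(f"MSE={mse:.4f} RMSE={rmse:.4f} MAE={mae:.4f} "
      f"R2={r2:.4f} adjusted R2={adjusted_r2:.4f}")
```

    The adjusted score penalizes additional predictors, so it is never higher than the plain R2 score for the same predictions.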