This repository contains a Python script that uses machine learning to predict the time a car will spend on the test bench based on various features, as part of the Mercedes-Benz Greener Manufacturing challenge on Kaggle.
- Clone this repository to your local machine.
- Install the necessary Python packages. This project requires numpy, pandas, and scikit-learn. You can install these with pip:
pip install numpy pandas scikit-learn
The data used for this project is provided by Mercedes-Benz. Each row describes a car through a set of features, and the target variable is the time (in seconds) that the car spends on the test bench. The data consists of a training set and a test set.
The data is preprocessed by:
- Dropping the outliers from the training set.
- Dividing the data into features (X) and the target variable (y).
- Splitting the training data into a training set and a validation set.
- Performing ordinal encoding on the categorical features.
- Handling missing values by imputing the median.
- Scaling the features using scikit-learn's StandardScaler.
- Selecting the top features using mutual information regression.
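The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration rather than the script itself: the synthetic data, the column names (`cat_a`, `flag_1`, and so on), the outlier threshold, the 80/20 split, and the number of selected features are all assumptions made so the example runs on its own.

```python
from functools import partial

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Synthetic stand-in for the training set (all column names are hypothetical).
rng = np.random.default_rng(0)
n = 200
cat_a = rng.choice(["a", "b", "c"], size=n)
cat_b = rng.choice(["u", "v"], size=n)
flag_1 = rng.integers(0, 2, size=n).astype(float)
flag_2 = rng.integers(0, 2, size=n).astype(float)
flag_3 = rng.integers(0, 2, size=n).astype(float)
target = 90 + 10 * flag_2 + 5 * (cat_a == "a") + rng.normal(0, 2, size=n)
flag_1[::25] = np.nan  # inject a few missing values
target[0] = 400.0      # inject one obvious outlier

train = pd.DataFrame({
    "cat_a": cat_a, "cat_b": cat_b,
    "flag_1": flag_1, "flag_2": flag_2, "flag_3": flag_3,
    "y": target,
})

# 1. Drop outliers from the training set (threshold is an assumed value).
OUTLIER_THRESHOLD = 250.0
train = train[train["y"] < OUTLIER_THRESHOLD].copy()

# 2. Divide the data into features (X) and the target variable (y).
X = train.drop(columns="y")
y = train["y"]

# 3. Split into a training set and a validation set (assumed 80/20 split).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 4-7. Ordinal encoding, median imputation, scaling, feature selection.
categorical_cols = ["cat_a", "cat_b"]
TOP_K = 3  # assumed number of features to keep

encode = ColumnTransformer(
    [(
        "cat",
        # Categories unseen during fit are mapped to -1 instead of raising.
        OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
        categorical_cols,
    )],
    remainder="passthrough",  # numeric columns pass through unchanged
)
score_func = partial(mutual_info_regression, random_state=0)
preprocess = Pipeline([
    ("encode", encode),
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func, k=TOP_K)),
])

# Fit on the training split only, then reuse the fitted steps for validation.
X_train_sel = preprocess.fit_transform(X_train, y_train)
X_val_sel = preprocess.transform(X_val)
print(X_train_sel.shape, X_val_sel.shape)
```

Wrapping the steps in a single `Pipeline` is a design choice of this sketch: every transformer is fitted on the training split and then applied unchanged to the validation and test data, which keeps information from the held-out rows out of the fitted statistics.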
The model used for this project is a Linear Regression model from the scikit-learn library. The model is trained on the preprocessed training set, and it is used to predict the target variable on the training set, validation set, and the test set.
The performance of the model is evaluated using the R-squared metric (the coefficient of determination).
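As a rough illustration of the training and evaluation steps, the sketch below fits `LinearRegression` on synthetic data and computes R-squared on the training and validation splits. The data, coefficients, and split ratio are assumptions chosen for the example; the real script works on the preprocessed competition data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the preprocessed features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
true_coef = np.array([3.0, -2.0, 0.0, 1.5, 0.0])  # assumed, for illustration
y = 100 + X @ true_coef + rng.normal(0, 1.0, size=300)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train on the training split only.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on both splits and score each with R-squared.
pred_train = model.predict(X_train)
pred_val = model.predict(X_val)
r2_train = r2_score(y_train, pred_train)
r2_val = r2_score(y_val, pred_val)

# R-squared by hand: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_val - pred_val) ** 2)
ss_tot = np.sum((y_val - y_val.mean()) ** 2)
r2_val_manual = 1 - ss_res / ss_tot

print(f"R^2 train: {r2_train:.3f}  R^2 validation: {r2_val:.3f}")
```

Comparing the training and validation scores is a quick check for overfitting: a validation score far below the training score suggests the model does not generalize well.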
The predictions made by the model on the test set are saved to a CSV file for submission to the Kaggle competition.
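Exporting the predictions might look like the sketch below. The column names `ID` and `y`, the example values, and the output file name are assumptions for illustration; check the competition's sample submission file for the exact format expected.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical test-set identifiers and model predictions.
test_ids = np.array([1, 2, 3, 4])
test_predictions = np.array([98.4, 101.2, 95.7, 110.3])

# One row per test example: identifier plus predicted time in seconds.
submission = pd.DataFrame({"ID": test_ids, "y": test_predictions})

# Written to a temporary directory here; the real script may use another path.
out_path = os.path.join(tempfile.gettempdir(), "submission_example.csv")
submission.to_csv(out_path, index=False)  # index=False omits the row index

# Read the file back to confirm its layout.
check = pd.read_csv(out_path)
print(check.head())
```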
To use this model, run the included Python script. The script includes code for importing the data, preprocessing the data, training the model, making predictions, and exporting the predictions to a CSV file.
This project is licensed under the MIT License - see the LICENSE file for details.