Practical-Application-2

OVERVIEW

In this application, we will explore a dataset from Kaggle. The original dataset contained information on 3 million used cars; the provided dataset is a subset of 426K cars, reduced to keep processing fast. Our goal is to understand which factors make a car more or less expensive. Based on our analysis, we aim to provide clear recommendations to our client, a used car dealership, on what consumers value in a used car.

Objectives:

Predict the price of used cars and identify which factors make a car more or less expensive.

Test Results:

The script runs our prediction model (Linear Regression) on 20 random samples
The predicted values do not appear to be accurate
Either the models need additional tuning, or the selected features do not fully explain the price

Data issues and Model issues

The dataset contained many NaN (missing) values
So far, only standard, well-known models have been used
Hyperparameter tuning might be needed
The current model uses only 5 features for prediction
The normalized dataset has not been used yet
The data distribution is right-skewed
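A minimal sketch of how two of the data issues above could be handled, on synthetic right-skewed prices (not the Kaggle data): NaN values are replaced with the median, and a `log1p` transform reduces the right skew. The helper names `impute_median` and `skewness` are hypothetical; with pandas, the equivalent steps would typically use `fillna` and `numpy.log1p`.

```python
import math
import random
import statistics

def impute_median(values):
    """Replace NaN entries with the median of the non-missing values."""
    present = [v for v in values if not math.isnan(v)]
    fill = statistics.median(present)
    return [fill if math.isnan(v) else v for v in values]

def skewness(values):
    """Biased sample skewness: mean cubed deviation divided by pstdev cubed."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean([(v - mu) ** 3 for v in values]) / sigma ** 3

# Hypothetical right-skewed prices with every 50th value missing.
rng = random.Random(1)
prices = [rng.lognormvariate(9.5, 0.8) for _ in range(1_000)]
for i in range(0, 1_000, 50):
    prices[i] = float("nan")

filled = impute_median(prices)
log_prices = [math.log1p(p) for p in filled]
print(f"skew before: {skewness(filled):.2f}, after log1p: {skewness(log_prices):.2f}")
```

If a model is trained on log prices, its predictions have to be converted back with `math.expm1` before they are compared with real prices.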

Recommendations

Make greater use of permutation feature importance
Apply tuning techniques to the regression models
Some advanced models have been tried, but the best results so far come from the simple regression model
The prediction models are at an experimental stage, but they could be productionised by converting them into a Python library with a GUI and an API
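A dependency-free sketch of permutation feature importance: shuffle one feature column at a time and measure how much the model's mean squared error (MSE) increases. The fitted `model`, the `odometer` feature and the irrelevant `noise` feature are all hypothetical. In the notebook's own stack, scikit-learn offers the same idea as `sklearn.inspection.permutation_importance`.

```python
import random
import statistics

def mse(model, X, y):
    """Mean squared error of the model's predictions over rows X."""
    return statistics.fmean([(model(row) - target) ** 2 for row, target in zip(X, y)])

def permutation_importance_sketch(model, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link with the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(statistics.fmean(increases))
    return importances

# Synthetic rows: [odometer (thousand miles), noise]; price depends only on odometer.
data_rng = random.Random(0)
X = [[data_rng.uniform(0, 200), data_rng.random()] for _ in range(300)]
y = [30_000 - 100 * row[0] + data_rng.gauss(0, 2_000) for row in X]

def model(row):
    """Hypothetical already-fitted model that ignores the noise column."""
    return 30_000 - 100 * row[0]

importances = permutation_importance_sketch(model, X, y)
print(dict(zip(["odometer", "noise"], importances)))
```

A feature the model ignores scores exactly zero, so features with near-zero importance are candidates to drop, while large scores point to what actually drives the predicted price.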

Note: the GitHub notebook viewer does not display the Plotly plots or the Pipeline objects; the notebook needs to be loaded and run in Jupyter.

PlamenStilyianov/Practical-Application-2
