Car-Popularity-Prediction-using-Machine-Learning

Introduction

A car company has the data for all the cars that are present in the market. They are planning to introduce some new ones of their own, but first, they want to find out what would be the popularity of the new cars in the market based on each car's attributes.

We will provide you a dataset of cars along with the attributes of each car along with its popularity. Your task is to train a model that can predict the popularity of new cars based on the given attributes.

Dataset

You are given a training dataset, train.csv. The file is a comma separated file with useful information for this task:

  • train.csv contains the information about a car along with its popularity level. Each row provides information on each car. Information such as buying_price, maintenance_cost, number_of_doors, number_of_seats, etc. The definition of each attribute is as follows:

    buying_price: The buying_price denotes the buying price of the car, and it ranges from [1...4], where buying_price equal to 1 represents the lowest price while buying_price equal to 4 represents the highest price.

    maintenance_cost: The maintenance_cost denotes the maintenance cost of the car, and it ranges from [1...4], where maintenance_cost equal to 1 represents the lowest cost while maintenance_cost equal to 4 represents the highest cost.

    number_of_doors: The number_of_doors denotes the number of doors in the car, and it ranges from [2...5], where each value of number_of_doors represents the number of doors in the car.

    number_of_seats: The number_of_seats denotes the number of seats in the car, and it consists of [2, 4, 5], where each value of number_of_seats represents the number of seats in the car.

    luggage_boot_size: The luggage_boot_size denotes the luggage boot size, and it ranges from [1...3], where
    luggage_boot_size equal to 1 represents smallest luggage boot size while luggage_boot_size equal to 3 represents largest luggage boot size.

    safety_rating: The safety_rating denotes the safety rating of the car, and it ranges from [1...3], where safety_rating equal to 1 represents low safety while safety_rating equal to 3 represents high safety.

    popularity: The popularity denotes the popularity of the car, and it ranges from [1...4], where popularity equal to 1 represents an unacceptable car, popularity equal to 2 represents an acceptable car, popularity equal to 3 represents a good car, and popularity equal to 4 represents the best car.

  • a test set of car along with the above attributes excluding popularity is also provided, in test.csv. The goal is to predict the popularity of the car based on its attributes.

Steps that I followed :

Firstly I import the test.csv and train.csv using pandas.read_csv() Method

Then i tried to read the train data by using the functions such as desc(),info(),head(),teal()

Then i applied different machine learning models:

  • SVM
  • KNN
  • Logistic Regression

Then I trained the model using these classifier

And at last we predict the popularity of the test data.

I get the accuracy of 94.17%

Tool:

  • Jupyter
  • Pycharm