Here we have a dataset from Kaggle that was originally scraped from the Wish e-commerce platform. It contains input features such as price, retail_price, product_size, and product_colour, and the output is rating. Our job is to build a model that lets us answer questions such as:
-What are the top-selling products?
-Which features are most important for predicting whether a product will succeed (that is, sell) or not?
-What is the expected rating of a product before listing it on the site?
The first challenge is that the data is not clean and needs a lot of preprocessing.
The second challenge is that the dataset is imbalanced (there are many ratings of 4 and only 11 ratings of 2).
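A quick way to see the imbalance is to count how often each rating appears. The snippet below is a minimal sketch using only the standard library. The helper name `rating_distribution` and the toy ratings are made up for illustration and are not taken from the Wish data.

```python
from collections import Counter


def rating_distribution(ratings):
    """Return {rating: (count, share)} ordered from most to least frequent."""
    counts = Counter(ratings)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.most_common()}


# Toy ratings, invented for illustration only (not the real dataset).
toy_ratings = [4] * 12 + [3] * 5 + [5] * 2 + [2] * 1

distribution = rating_distribution(toy_ratings)
for label, (n, share) in distribution.items():
    print(f"rating {label}: {n} rows ({share:.0%})")
```

If one rating dominates like this, plain accuracy can look good even when the rare ratings are never predicted, so it is worth keeping the class counts in mind when evaluating models later.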
The impact of solving this problem is that you can make an educated guess about how likely people are to like your product without actually putting it on the market. In addition, we can better understand under what circumstances a product will be highly rated, and learn more about the wish.com consumer base.
Data mining function
Classification & prediction
First: Data Processing:
This is a key step in a machine learning project: it ensures that the data is transformed, clean, and easy to use for analysis. Below are some key steps:
1. Drop irrelevant and unnecessary features.
2. Check whether there are any null values and replace them.
3. Create new features from existing features if needed.
4. Clean categorical variables.
Second: convert categorical and string columns to numerical columns.
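The preprocessing steps and the encoding step above can be sketched with pandas as follows. This is only an illustration under some assumptions: the function name `preprocess`, the `listing_url` column, the `discount` feature, the median / "unknown" fill strategy, and one-hot encoding via `pd.get_dummies` are our own choices for the example, and the toy rows are invented.

```python
import pandas as pd


def preprocess(df, drop_cols, target="rating"):
    """Apply the steps above and return (features, target)."""
    # 1. Drop irrelevant columns (names that are absent are ignored).
    df = df.drop(columns=drop_cols, errors="ignore")
    y = df[target]
    X = df.drop(columns=[target]).copy()

    num_cols = list(X.select_dtypes(include="number").columns)
    cat_cols = [c for c in X.columns if c not in num_cols]

    # 2. Replace nulls: median for numeric columns, "unknown" for text columns.
    X[num_cols] = X[num_cols].fillna(X[num_cols].median())
    X[cat_cols] = X[cat_cols].fillna("unknown")

    # 3. Create a new feature (hypothetical example: absolute discount).
    X["discount"] = X["retail_price"] - X["price"]

    # 4. Clean categorical values: trim whitespace and lower-case them.
    for col in cat_cols:
        X[col] = X[col].astype(str).str.strip().str.lower()

    # Second: convert categorical/string columns to numerical (one-hot) columns.
    X = pd.get_dummies(X, columns=cat_cols)
    return X, y


# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    "price": [8.0, 5.0, None, 12.0],
    "retail_price": [10.0, 5.0, 9.0, 20.0],
    "product_size": ["M", " s", None, "m "],
    "product_colour": ["Red", "red ", "Blue", None],
    "listing_url": ["a", "b", "c", "d"],  # hypothetical irrelevant column
    "rating": [4, 4, 2, 5],
})

X, y = preprocess(toy, drop_cols=["listing_url"])
print(X.columns.tolist())
```

One-hot encoding is just one option here; for columns with many distinct values, grouping rare values into an "other" bucket first may keep the feature count manageable.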
Third: start building your models and choose the best one.
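One possible way to choose the best model is to compare a few candidates with cross-validation. The sketch below uses scikit-learn on synthetic data from `make_classification`, not the Wish data. The three candidate models, the macro-averaged F1 score, the stratified folds, and `class_weight="balanced"` are assumptions made for this example, picked because the dataset is imbalanced.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced 3-class data standing in for the preprocessed features.
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=5,
    n_classes=3,
    weights=[0.7, 0.2, 0.1],
    random_state=0,
)

# Candidate models (an arbitrary selection for illustration).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "decision_tree": DecisionTreeClassifier(class_weight="balanced", random_state=0),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
}

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Mean macro F1 per model; macro averaging weights every class equally.
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="f1_macro").mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: macro F1 = {score:.3f}")
print("best:", best_name)
```

On the real data, `X` and `y` would be the encoded features and the rating column, and the list of candidates and the scoring metric can be swapped for whatever fits the question being asked.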