HP has around 275 laptops out of 1303, whereas Dell and Lenovo have each overtaken HP with approximately 290 laptops out of 1303. The fewest laptops come from Google, LG, Fujitsu, Huawei, etc.
Looking at which kinds of models are sold, Notebooks are the most common, followed by Ultrabooks and Gaming laptops. The least sold models are Netbooks and Workstations.
Next we look at the average price for each laptop brand, which gives insight into how laptop prices vary. According to the dataset, the box plot for HP shows a maximum (avg) selling price of 65K - 70K (it could be higher, but the dataset does not contain that data) and an average selling price of 50K - 52K. For Apple, the maximum (avg) selling price is approximately 1 Lakh (again, it could be higher, but the dataset does not contain that data) and the average selling price is 65K - 70K. So we can see a lot of price variation, and the plot tells us both the typical price range of a company's laptops and the maximum price a company assigns to its laptops.
In the above plot, we see that notebooks do not have much price variation, maybe because notebooks are used for general purposes and are priced to be accessible to a wide range of people.
In the plot, we observe that many people buy laptops with a screen size of around 13 - 14 inches, while laptops with a 17-inch screen are sold for up to 3 Lakhs or more. Many people also buy laptops with a screen size of 15.6 inches, and we can see some scatter above 15.6, so we can say that the data looks roughly right.
The Screen Resolution column contains many different values, as shown. Touchscreen, normal and IPS Panel are the 3 parts on the basis of which we can segregate them.
    Full HD 1920x1080                                507
    1366x768                                         281
    IPS Panel Full HD 1920x1080                      230
    IPS Panel Full HD / Touchscreen 1920x1080         53
    Full HD / Touchscreen 1920x1080                   47
    1600x900                                          23
    Touchscreen 1366x768                              16
    Quad HD+ / Touchscreen 3200x1800                  15
    IPS Panel 4K Ultra HD 3840x2160                   12
    IPS Panel 4K Ultra HD / Touchscreen 3840x2160     11
    4K Ultra HD / Touchscreen 3840x2160               10
    Touchscreen 2560x1440                              7
    4K Ultra HD 3840x2160                              7
    IPS Panel 1366x768                                 7
    IPS Panel Quad HD+ / Touchscreen 3200x1800         6
    Touchscreen 2256x1504                              6
    IPS Panel Retina Display 2560x1600                 6
    IPS Panel Retina Display 2304x1440                 6
    IPS Panel Touchscreen 2560x1440                    5
    IPS Panel 2560x1440                                4
    IPS Panel Retina Display 2880x1800                 4
    1440x900                                           4
    IPS Panel Touchscreen 1920x1200                    4
    2560x1440                                          3
    1920x1080                                          3
    IPS Panel Quad HD+ 2560x1440                       3
    IPS Panel Touchscreen 1366x768                     3
    Touchscreen 2400x1600                              3
    Quad HD+ 3200x1800                                 3
    IPS Panel Full HD 2160x1440                        2
    IPS Panel Quad HD+ 3200x1800                       2
    IPS Panel Touchscreen / 4K Ultra HD 3840x2160      2
    Touchscreen / Full HD 1920x1080                    1
    Touchscreen / Quad HD+ 3200x1800                   1
    Touchscreen / 4K Ultra HD 3840x2160                1
    IPS Panel Full HD 1920x1200                        1
    IPS Panel Full HD 2560x1440                        1
    IPS Panel Retina Display 2736x1824                 1
    IPS Panel Touchscreen 2400x1600                    1
    IPS Panel Full HD 1366x768                         1
So now we will create a new column, touchscreen: if the value is 1, that laptop has a touchscreen.
Using a countplot, we see that almost 190 laptops are touchscreen and the rest are not.
The price of touchscreen laptops is the highest, at 80K or more, and the average price is 70K.
Next we will create a new column for IPS panel as well: if the value is 1, that laptop has an IPS panel.
Using a countplot, we see that roughly 350 - 400 laptops are IPS and the rest are not.
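The two flag columns described above could be derived with a simple substring check on the screen resolution text. This is only a minimal sketch: the function name `screen_flags` and the output keys `Touchscreen` and `IPS` are illustrative choices, not names confirmed by the original notebook.

```python
def screen_flags(screen_resolution: str) -> dict:
    """Return 0/1 flags derived from a screen-resolution description."""
    text = screen_resolution.lower()
    return {
        # 1 if the description mentions a touchscreen, else 0
        "Touchscreen": int("touchscreen" in text),
        # 1 if the description mentions an IPS panel, else 0
        "IPS": int("ips" in text),
    }


# A few values taken from the Screen Resolution counts listed earlier
for value in [
    "Full HD 1920x1080",
    "IPS Panel Full HD / Touchscreen 1920x1080",
    "Touchscreen 1366x768",
]:
    print(value, "->", screen_flags(value))
```

With pandas, the same function could be applied row by row, for example with `Series.apply` on the screen resolution column (the exact column name in the DataFrame is an assumption here).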
From the correlation plot we observed that as X_res and Y_res increase, the price of the laptop also increases. X_res and Y_res are positively correlated with price and carry a lot of information, which is why I split the Resolution column into the X_res and Y_res columns. To improve on this, we can create a new column named PPI (pixels per inch). As we saw from the correlation plot, X_res and Y_res are highly collinear, so why not combine them with Inches, which has less collinearity? We will combine the three columns into a single PPI value.
From the correlation data we observe that PPI has a good correlation with price, so we will use it. Since it combines 3 features and gives the collective result of 3 columns, we will drop Inches, X_res and Y_res.
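The PPI feature described above can be sketched as follows. The formula used is the standard definition of pixel density (diagonal resolution in pixels divided by the diagonal size in inches); it is assumed, not confirmed, that the original notebook used the same expression. The helper names `parse_resolution` and `pixels_per_inch` are illustrative.

```python
import math
import re
from typing import Tuple


def parse_resolution(screen_resolution: str) -> Tuple[int, int]:
    """Extract (X_res, Y_res) from text such as 'IPS Panel Full HD 1920x1080'."""
    match = re.search(r"(\d+)x(\d+)", screen_resolution)
    if match is None:
        raise ValueError(f"no WIDTHxHEIGHT pattern in {screen_resolution!r}")
    return int(match.group(1)), int(match.group(2))


def pixels_per_inch(x_res: int, y_res: int, inches: float) -> float:
    """Diagonal pixel count divided by the diagonal screen size in inches."""
    return math.hypot(x_res, y_res) / inches


x_res, y_res = parse_resolution("IPS Panel Full HD 1920x1080")
print(x_res, y_res, round(pixels_per_inch(x_res, y_res, 15.6), 2))
```

Once PPI is stored as a column, the three source columns become redundant, which matches the decision to drop Inches, X_res and Y_res.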
Now we will work on the CPU column, as it also contains a lot of text data, and we need to process it carefully because we may get good insights from it.
Most common processors are made by Intel, so we will cluster their processors into different categories such as i3, i5, i7 and other. Here "other" means Intel processors that do not have i3, i5 or i7 in their name; they are completely different, which is why I will group them into "other". The remaining category is AMD, which is a different category altogether.
If we look at the data, we need to extract the first 3 words of the CPU column, as the first 3 words of every row in the CPU column give the type of the CPU, so we will use them.
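The CPU grouping described above can be sketched like this. The category labels (`Other Intel Processor`, `AMD Processor`) and the function name `cpu_category` are illustrative assumptions, and treating every non-Intel CPU as AMD is a simplification that follows the text rather than a checked property of the dataset. The sample CPU strings are hypothetical examples of the expected format.

```python
def cpu_category(cpu: str) -> str:
    """Map a raw CPU description to one of a few coarse categories."""
    words = cpu.split()
    # The first three words identify the CPU family, e.g. 'Intel Core i5'
    first_three = " ".join(words[:3])
    if first_three in ("Intel Core i3", "Intel Core i5", "Intel Core i7"):
        return first_three
    # Any other Intel CPU (Celeron, Pentium, ...) goes into one bucket
    if words and words[0] == "Intel":
        return "Other Intel Processor"
    # Assumption: everything that is not Intel is treated as AMD
    return "AMD Processor"


for cpu in [
    "Intel Core i5 7200U 2.5GHz",
    "Intel Celeron Dual Core N3350 1.1GHz",
    "AMD A9-Series 9420 3GHz",
]:
    print(cpu, "->", cpu_category(cpu))
```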
We can see that more than 600 laptops use 8GB RAM, the next largest group uses 4GB RAM, and a few use 16GB RAM.
We will separate the Type of memory from its value, similar to what was done in the previous part.
This part needs to be done in steps. We do not have the memory as a single value; it comes in mixed form, such as 128GB SSD + 1TB HDD, so to bring everything into a common form we need to make some modifications.
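One way to bring a mixed value such as 128GB SSD + 1TB HDD into a common form is to split it on `+` and sum the capacity per storage type. This is a sketch under stated assumptions: the function name `parse_memory` is hypothetical, 1TB is counted as 1000GB, and the four storage types are the ones named in this write-up.

```python
import re

STORAGE_TYPES = ("HDD", "SSD", "Hybrid", "Flash Storage")


def parse_memory(memory: str) -> dict:
    """Return the capacity in GB for each storage type in a memory description."""
    totals = {name: 0 for name in STORAGE_TYPES}
    for part in memory.split("+"):
        # Each part looks like '128GB SSD' or '1TB HDD'
        match = re.match(r"([\d.]+)\s*(GB|TB)\s+(.+)", part.strip())
        if match is None:
            continue  # skip anything that does not fit the expected pattern
        size = float(match.group(1))
        if match.group(2) == "TB":
            size *= 1000  # assumption: 1TB is treated as 1000GB
        kind = match.group(3).strip()
        if kind in totals:
            totals[kind] += int(size)
    return totals


print(parse_memory("128GB SSD + 1TB HDD"))
print(parse_memory("256GB Flash Storage"))
```

Each key of the returned dictionary can then become its own numeric column, so that the correlation of each storage type with price can be examined separately.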
Based on the correlation we observe that Hybrid and Flash Storage are almost negligible, so we can simply drop them, whereas HDD and SSD have a good correlation with Price. We find that HDD has a negative relation with Price, and that makes sense: as the price of a laptop increases, it becomes more likely that the laptop uses an SSD instead of an HDD, and vice versa.
Since we have little data about the individual GPU models, it is better to focus on the GPU brands rather than on the model details that follow the brand name.
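Keeping only the GPU brand can be sketched by taking the first word of the GPU description. This assumes that the brand is always the first word of the value, which is not verified here; the function name `gpu_brand` and the sample strings are illustrative.

```python
def gpu_brand(gpu: str) -> str:
    """Return the first word of a GPU description, assumed to be the brand."""
    words = gpu.split()
    return words[0] if words else "Unknown"


for gpu in ["Intel HD Graphics 620", "Nvidia GeForce GTX 1050", "AMD Radeon 530"]:
    print(gpu, "->", gpu_brand(gpu))
```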
Step 1 converts the categorical values to numerical values. Step 2 is the model object.
The Random Forest model gives the best score of all the models above, so we will use the Random Forest model for the web app.
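The two steps can be sketched as a scikit-learn `Pipeline`: a `ColumnTransformer` that one-hot encodes the categorical columns, followed by the model. Everything specific here is an assumption made for illustration: the column names, the tiny made-up table (these are not values from the dataset), and the choice of `RandomForestRegressor` with these settings.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy, made-up rows purely to make the sketch runnable (NOT real dataset values)
data = pd.DataFrame({
    "Company": ["HP", "Dell", "Apple", "Lenovo", "HP", "Dell"],
    "TypeName": ["Notebook", "Gaming", "Ultrabook", "Notebook", "Ultrabook", "Notebook"],
    "Ram": [8, 16, 8, 4, 8, 4],
    "Price": [50000, 90000, 100000, 35000, 65000, 40000],
})
X = data.drop(columns="Price")
y = data["Price"]

# Step 1: convert the categorical columns to numerical ones; pass the rest through
step1 = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Company", "TypeName"]),
    ],
    remainder="passthrough",
)

# Step 2: the model object
step2 = RandomForestRegressor(n_estimators=50, random_state=0)

pipe = Pipeline([("step1", step1), ("step2", step2)])
pipe.fit(X, y)
predictions = pipe.predict(X)
print(predictions)
```

Wrapping both steps in one pipeline means the same encoding is applied at training and at prediction time, which is convenient when the fitted object is later loaded by an application.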