Primary Language: Jupyter Notebook

Data_mining

First_Question: The dataset for this question is Datapreprocessing. In this question we practice preprocessing and regression.
a) Handle missing values.
b) Convert categorical variables into numerical variables.
c) Apply feature scaling.
d) Fit a multiple linear regression model.
e) Take X as the Total population column and y as the number of people infected with the coronavirus, then build a regression model of the form y = ax^2 + bx + c.
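The quadratic model in part e) can be sketched as follows. This is a minimal illustration that assumes NumPy and uses synthetic data in place of the Total population and infection columns; the true coefficients (2, -3, 5) and the noise level are made up for the example.

```python
import numpy as np

# Synthetic stand-in data (an assumption): the real notebook would use the
# Total population column as x and the infected count as y.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x**2 - 3.0 * x + 5.0 + rng.normal(scale=0.5, size=x.size)

# Least-squares fit of y = a*x^2 + b*x + c.
# np.polyfit returns coefficients from the highest degree to the lowest.
a, b, c = np.polyfit(x, y, deg=2)

# Evaluate the fitted curve and measure the in-sample error.
y_hat = a * x**2 + b * x + c
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))

print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}, RMSE={rmse:.3f}")
```

On real population data, x can be very large, so scaling it first (part c) helps keep the x^2 term numerically well behaved.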

Second_Question: Congratulations! You just got some contract work with an Ecommerce company based in New York City that sells clothing online and also offers in-store style and clothing advice sessions. Customers come in to the store, have sessions with a personal stylist, then go home and order the clothes they want on either a mobile app or a website. The company is trying to decide whether to focus its efforts on the mobile app experience or the website. They’ve hired you on contract to help them figure it out!
We’ll work with the Ecommerce Customers csv file from the company. It has customer info, such as Email, Address, and their color Avatar. It also has these numerical columns:
Avg. Session Length: Average length of in-store style advice sessions.
Time on App: Average time spent on the app, in minutes.
Time on Website: Average time spent on the website, in minutes.
Length of Membership: How many years the customer has been a member.
a) Use a pairplot and a heatmap to explore the data.
b) Fit a multiple linear regression model and determine the MSE and RMSE.
c) Use k-fold cross-validation and determine the test error.
d) Find the RMSE and determine whether the model overfits or underfits.
e) Do you think the company should focus more on its mobile app or on its website?
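Parts b) through e) can be sketched with plain NumPy. This is a minimal illustration on synthetic data, not the company's CSV: the feature values, the coefficients in `true_coef`, the spending-like target, and the helper names `fit_ols`, `predict`, `rmse`, and `kfold_rmse` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Synthetic stand-ins (assumptions) for the four numeric columns, in order:
# session length, time on app, time on website, length of membership.
X = rng.normal(size=(n, 4))
true_coef = np.array([25.0, 38.0, 0.5, 61.0])  # made-up effects
y = 500.0 + X @ true_coef + rng.normal(scale=10.0, size=n)


def fit_ols(X, y):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_p]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def kfold_rmse(X, y, k=5, seed=0):
    """Mean held-out RMSE over k shuffled folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = fit_ols(X[train], y[train])
        scores.append(rmse(y[test], predict(beta, X[test])))
    return float(np.mean(scores))


beta = fit_ols(X, y)
train_rmse = rmse(y, predict(beta, X))
train_mse = train_rmse**2
cv_rmse = kfold_rmse(X, y, k=5)

print(f"train MSE={train_mse:.2f}, train RMSE={train_rmse:.2f}, CV RMSE={cv_rmse:.2f}")
print(f"app coefficient={beta[2]:.2f}, website coefficient={beta[3]:.2f}")
```

A cross-validated RMSE close to the training RMSE suggests little overfitting; a much larger cross-validated RMSE suggests the opposite, and large errors on both suggest underfitting. Comparing the app and website coefficients only answers part e) when the two features are on comparable scales, which holds here because both are measured in minutes.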

Third_Question:
a) Handle missing values.
b) Use KNN with k=1 and k=30, using Euclidean distance.
c) Use k-fold cross-validation and find the optimal value of k.
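Parts b) and c) can be sketched as below. The question does not name its dataset or say whether the task is classification or regression, so this example assumes classification with integer labels on two synthetic Gaussian clusters; the helper names `knn_predict` and `cv_accuracy` and the candidate values of k are hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k):
    """Majority-vote KNN using Euclidean distance (integer labels assumed)."""
    # Pairwise distances, shape (n_test, n_train).
    diff = X_test[:, None, :] - X_train[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=2))
    # Indices of the k closest training points for each test point.
    nearest = np.argsort(dist, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


def cv_accuracy(X, y, k_neighbors, n_folds=5, seed=0):
    """Mean held-out accuracy over n_folds shuffled folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, n_folds)
    accs = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        pred = knn_predict(X[train], y[train], X[test], k_neighbors)
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))


# Synthetic two-class data (an assumption): two overlapping Gaussian clusters.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 2)),
    rng.normal(2.0, 1.0, size=(100, 2)),
])
y = np.array([0] * 100 + [1] * 100)

# Compare k=1 and k=30 along with a few values in between.
scores = {k: cv_accuracy(X, y, k) for k in (1, 5, 15, 30)}
best_k = max(scores, key=scores.get)

for k, acc in scores.items():
    print(f"k={k:2d}  CV accuracy={acc:.3f}")
print("best k:", best_k)
```

Small k tends to follow noise in the training set, while large k smooths the decision boundary, so the cross-validated score is what picks between them. Because distances depend on feature scale, scaling the features before running KNN usually matters.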