EDA-Analysis

TEAM PSEUDO

1) Kuntal Gorai PES2UG19CS198

2) S Mahammad Aasheesh PES2UG19CS342

3) S V S C Santosh PES2UG19CS346

4) Venkata Krishna Arjun Vupalla PES2UG19CS451

Dataset:-

Dataset link: https://www.kaggle.com/hugodarwood/epirecipes?select=epi_r.csv
Size of Dataset:- 17736 rows and 680 columns

Step 1:- EDA & Preprocessing

By:- Kuntal Gorai and S V S C Santosh
This step involves cleaning the data by:-
i) Removing outliers
ii) Replacing null values
iii) Removing duplicate records
It also includes:-
i) Using PCA to reduce multicollinearity between the attributes of the dataset
ii) Visualising the data with graphs
iii) Reducing the dataset from 680 columns to around 6 columns using different approaches
iv) Preparing the data so that it is ready to be used for training the models
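
The cleaning and reduction steps above could be sketched roughly as follows. This is a minimal illustration, not the team's actual notebook code. It assumes pandas and scikit-learn, uses a small synthetic table in place of epi_r.csv, and picks median filling and the 1.5 x IQR rule as example choices for null handling and outlier removal. The helper names clean and reduce_with_pca and the column names f0..f11 are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def clean(df):
    """Drop duplicate rows, fill nulls with the column median, drop IQR outliers."""
    out = df.drop_duplicates().copy()
    for col in out.columns:
        out[col] = out[col].fillna(out[col].median())
    keep = pd.Series(True, index=out.index)
    for col in out.columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Keep only rows inside the 1.5 * IQR fences (an assumed rule).
        keep &= out[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out[keep].reset_index(drop=True)


def reduce_with_pca(df, n_components=6):
    """Standardise the columns, then project them onto n_components principal components."""
    scaled = StandardScaler().fit_transform(df)
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(scaled)
    names = [f"pc{i + 1}" for i in range(n_components)]
    return pd.DataFrame(scores, columns=names), pca.explained_variance_ratio_


# Synthetic stand-in for the real data: 12 strongly correlated columns.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 3))
weights = rng.normal(size=(3, 12))
values = factors @ weights + 0.1 * rng.normal(size=(200, 12))
raw = pd.DataFrame(values, columns=[f"f{i}" for i in range(12)])

raw.loc[0, "f0"] = 1e6                                    # inject an outlier
raw.loc[3, "f1"] = np.nan                                 # inject a null value
raw = pd.concat([raw, raw.iloc[[5]]], ignore_index=True)  # inject a duplicate row

cleaned = clean(raw)
reduced, variance_ratio = reduce_with_pca(cleaned, n_components=6)
print(cleaned.shape, reduced.shape)
print(variance_ratio.round(3))
```

Because principal components are uncorrelated with each other by construction, the reduced columns no longer carry the multicollinearity present in the original attributes. PCA is only one of the possible approaches to column reduction.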

Step 2:- Training the models and using them for further prediction

By:- Venkata Krishna Arjun Vupalla and S Mahammad Aasheesh
This step includes:-
1) Implementing three models:-
a) Multiple Linear Regression
b) Support Vector Machines
c) Decision Trees
2) Training and testing the models with the data
3) Applying the models to the test data
4) Obtaining model metrics
5) Comparing the models to determine which performs best
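
The workflow above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the team's actual code: it treats the task as regression (suggested by the use of Multiple Linear Regression), uses scikit-learn's LinearRegression, SVR and DecisionTreeRegressor, generates synthetic data with 6 features in place of the preprocessed dataset, and compares the models by MSE and R² as example metrics.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the ~6 preprocessed columns and the target.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The three model families named in this step (default hyperparameters).
models = {
    "linear_regression": LinearRegression(),
    "svm": make_pipeline(StandardScaler(), SVR()),
    "decision_tree": DecisionTreeRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)        # train on the training split
    pred = model.predict(X_test)       # apply to the held-out test split
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }

# Pick the model with the highest R² on the test split.
best = max(results, key=lambda n: results[n]["r2"])
for name, metrics in results.items():
    print(f"{name}: mse={metrics['mse']:.2f}, r2={metrics['r2']:.3f}")
print("best:", best)
```

The ranking on synthetic data says nothing about the ranking on the recipe dataset. The point is only the shape of the loop: fit each model on the same training split, score each on the same test split, then compare the metrics.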