Goal: For this project, I'm going to build machine learning models that predict the sale price of properties in NYC. I'm dividing the project into the following sections:
- EDA
- Data Preparation
- Modelling
Task 1: Read the dataset and perform basic data exploration: check for and deal with missing values, duplicate entries, outliers, etc.
For this part of the EDA I'm identifying the types of variables in the dataset. It is important in an EDA to know what kind of variables you are dealing with. There are two types of variables:
- Categorical
- Numerical
For this step of the project, I'm checking for missing values, the correlation between missing values across columns, and the pattern of missing values in the dataset.
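The checks described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the small DataFrame and its column names (`borough`, `gross_sqft`, `year_built`, `sale_price`) are made up to stand in for the real NYC sales data, and treating every non-numeric column as categorical is a simplifying assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the NYC property sales data (all values are invented).
df = pd.DataFrame({
    "borough": ["Manhattan", "Brooklyn", "Queens", "Queens", "Bronx"],
    "gross_sqft": [1200.0, np.nan, 900.0, 900.0, np.nan],
    "year_built": [1920.0, 1955.0, 1988.0, 1988.0, np.nan],
    "sale_price": [950000.0, np.nan, 480000.0, 480000.0, np.nan],
})

# Variable types: numeric columns vs. everything else (treated as categorical here).
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = [c for c in df.columns if c not in numeric_cols]

# Missing values per column.
missing_counts = df.isna().sum()

# Correlation of missingness: do columns tend to be missing in the same rows?
# Only columns that actually have missing values are included, since a constant
# indicator has no defined correlation.
cols_with_missing = missing_counts[missing_counts > 0].index
missing_corr = df[cols_with_missing].isna().astype(int).corr()

# Duplicate rows: count them, then drop them.
n_duplicates = int(df.duplicated().sum())
df_clean = df.drop_duplicates()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
print(missing_counts)
print(missing_corr)
print("duplicate rows:", n_duplicates)
```

A missingness correlation close to 1 between two columns suggests they tend to be missing together, which is worth knowing before deciding whether to impute or drop rows.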
- Raise two questions that can be answered by performing data visualization.
- Briefly mention why you think each question would be interesting, and to whom (who is your audience).
- Think about the EDA principles.
Task 3: Feature engineering: how we transform features (e.g., categorical features) and how we select the important features.
- Suppose we would like to predict the house sale price.
- Analyze the scale of each attribute and determine which ones you would transform (e.g., categorical features).
- Discuss how you plan to select important features.
- Random Forest Model
- Linear Regression Model
- KNN Model
- SVR Model
- Decision Tree Regressor Model
- Gradient Boosting Regressor Model
- Ada Boost Regressor Model
- XGB Regressor Model
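A rough sketch of how the models listed above could be compared on equal footing, using 5-fold cross-validated R² from scikit-learn. The data here is synthetic (`make_regression`), the hyperparameters are mostly defaults chosen for illustration, and the choice of R² as the metric is an assumption. XGBoost lives in the separate `xgboost` package, so it is only mentioned in a comment rather than imported.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the prepared NYC features and sale price.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# KNN and SVR are wrapped with a scaler because they are sensitive to feature scale.
models = {
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Linear Regression": LinearRegression(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor()),
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    # If the xgboost package is installed, its XGBRegressor could be added here
    # in the same way as the other estimators.
}

# Mean cross-validated R^2 for each model.
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    results[name] = scores.mean()

for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean R^2 = {score:.3f}")
```

Keeping every model behind the same `cross_val_score` call makes the comparison fair: each one sees the same folds and is scored with the same metric.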