/EDA-and-prediction-models

Exploratory Data Analysis and Prediction models on Plant species Dataset

Primary language: Jupyter Notebook. License: MIT.

Objective:

Congratulations,

We have landed on Mars and are trying to colonize it. The main problem we face is deciding which plants to grow in each geographical region so that the maximum number of plants survive.

Fortunately, we have data from Earth to give us insights.

Please help us predict which kind of plant we can grow at each location, depending on the environment.

It’s time to save humanity, which is on the brink of extinction.

You are our only hope for survival.

Problem Statement

  1. Create a report on key insights derived from Exploratory Data Analysis

  2. Create a multi-class prediction model to predict the species of plant which will survive in the neighborhood of a given environment.

  3. Create key segments for all the plants (train + test) based on the average sunlight received throughout the day and their distance from a waterbody, to identify which segments of plants are getting enough sunlight and water and which are not. This will help in mobilizing resources to track the growth of trees appropriately.
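The segmentation task can be sketched as a simple median split on the two variables. This is only an illustration under stated assumptions: the keys `avg_sunlight` and `dist_to_water` are hypothetical names (the real ones come from the data dictionary, which is shared separately), the toy rows are made up, and a median threshold is just one possible definition of "enough" sunlight or water.

```python
from statistics import median


def segment_plants(rows, sunlight_key="avg_sunlight", water_key="dist_to_water"):
    """Label each row by its position relative to the median of both variables.

    The key names are hypothetical placeholders, not the dataset's real columns.
    """
    sun_median = median(r[sunlight_key] for r in rows)
    water_median = median(r[water_key] for r in rows)
    labelled = []
    for r in rows:
        # More sunlight than the median counts as "high_sun".
        sun = "high_sun" if r[sunlight_key] >= sun_median else "low_sun"
        # A shorter distance to water than the median counts as "near_water".
        water = "near_water" if r[water_key] <= water_median else "far_from_water"
        labelled.append({**r, "segment": f"{sun}/{water}"})
    return labelled


# Made-up rows standing in for the combined train + test records.
plants = [
    {"avg_sunlight": 220.0, "dist_to_water": 30.0},
    {"avg_sunlight": 150.0, "dist_to_water": 400.0},
    {"avg_sunlight": 230.0, "dist_to_water": 350.0},
    {"avg_sunlight": 140.0, "dist_to_water": 20.0},
]
segments = segment_plants(plants)
for row in segments:
    print(row["segment"])
```

Plants labelled `low_sun/far_from_water` would be the first candidates for extra resources under this definition. A clustering method could replace the median split if finer segments are needed.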

Key Requirements and Directions - Hello Challengers,

Required sections in the Jupyter Notebook -

  1. Exploratory Data Analysis

  2. Data Preprocessing

     2A. Handling outliers (imputation, removal)

  3. Data Engineering

  4. Data Preparation for Predictive Modeling

  5. Classification Model Predictions (at least 3 different predictive models) with hyperparameter tuning

  6. Comparison of models using performance KPIs and training and testing time

  7. Final predictive model recommendation
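Two of the required steps, outlier handling and the timed model comparison, can be sketched with the standard library alone. The snippet is a minimal illustration and not the notebook's actual code. The names `cap_outliers_iqr`, `compare_models` and `MajorityClassBaseline` are hypothetical. Capping at the IQR fences is one assumed choice among the imputation and removal options listed, and accuracy stands in for whichever performance KPIs are chosen. Any object with `fit` and `predict` methods, such as a scikit-learn estimator, can be passed to `compare_models`.

```python
import time
from collections import Counter
from statistics import quantiles


def cap_outliers_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = quantiles(values, n=4)
    lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, lower), upper) for v in values]


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit each model, time training and prediction, and report accuracy."""
    results = []
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        train_seconds = time.perf_counter() - start

        start = time.perf_counter()
        predictions = model.predict(X_test)
        test_seconds = time.perf_counter() - start

        correct = sum(p == y for p, y in zip(predictions, y_test))
        results.append({
            "model": name,
            "accuracy": correct / len(y_test),
            "train_seconds": train_seconds,
            "test_seconds": test_seconds,
        })
    # Best accuracy first, so the top row is the candidate recommendation.
    return sorted(results, key=lambda r: r["accuracy"], reverse=True)


class MajorityClassBaseline:
    """Toy stand-in model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


# Made-up values: one extreme reading among otherwise similar ones.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 500]
capped = cap_outliers_iqr(readings)

# Made-up features and labels, only to exercise the comparison helper.
report = compare_models(
    {"majority_baseline": MajorityClassBaseline()},
    X_train=[[1.0], [2.0], [3.0]], y_train=["pine", "pine", "fir"],
    X_test=[[1.5], [2.5]], y_test=["pine", "fir"],
)
print(capped[-1], report[0]["accuracy"])
```

A baseline like this gives the three or more tuned classifiers a floor to beat. Timing training and prediction separately keeps the two costs visible when recommending the final model.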

Dataset

  1. Data size (rows * columns): test 116203 * 13, train 464809 * 13

  2. Target variable: Plant_Type

  3. Data dictionary: shared separately