
Implementing all ML models and feature selection techniques that can be used.


Project: Iris Flower

Supervised Learning, Classification



Description

About the project:

This is arguably the most famous dataset in ML and a good starting point for beginners who want to get their hands dirty with ML/Data Science. With only a few features and observations of Iris flowers, and no missing values or outliers to deal with, it makes implementing ML models simple.

What needs to be done:

Since the dataset is clean and small, we will use this to our advantage to practice data visualization with matplotlib and seaborn (data visualization libraries), implement the most commonly used feature selection methods in ML/Data Science projects, and apply a range of classification models to this dataset. This gives hands-on experience with how and when to apply each technique, and with which one works best for a given dataset.
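The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the choice of SelectKBest with an ANOVA F-test, the three classifiers, and the 5-fold cross-validation are all assumptions made for the example, and the data is loaded from scikit-learn's bundled copy rather than from a file.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Assumption: use scikit-learn's bundled copy of the Iris data.
iris = load_iris()
X, y = iris.data, iris.target

# Feature selection on its own, to see which 2 features score highest
# under a univariate ANOVA F-test.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
selected_names = [iris.feature_names[i] for i in selector.get_support(indices=True)]
print("Selected features:", selected_names)

# An illustrative subset of classifiers (not the full list used in the project).
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# Compare the models with 5-fold cross-validation. The selector is placed
# inside the pipeline so it is refit on each training fold only.
mean_accuracy = {}
for name, model in models.items():
    pipeline = make_pipeline(SelectKBest(score_func=f_classif, k=2), model)
    scores = cross_val_score(pipeline, X, y, cv=5)
    mean_accuracy[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Putting the selector inside the pipeline keeps the held-out fold from influencing which features are chosen, which matters more on larger, noisier datasets than on this one.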

Sources:

  • Creator: R.A. Fisher
  • Donor: Michael Marshall

Data

Files:

This project contains 1 file and 2 folders:

  • report.ipynb: This is the main file where I have performed my work on the project.
  • export/ : Folder containing the HTML and PDF versions of the notebook.
  • plots/ : Contains images of all the plots displayed in report.ipynb.

Dataset file:

Associated Task: Classification
Data Set Characteristics: Multivariate
Attribute Characteristics: Real
Number of Instances: 150
Number of Attributes: 4
Missing Values?: No
Area: Life

The data set contains 3 classes of 50 instances each (150 instances in total), where each class refers to a type of Iris plant. One class is linearly separable from the other two; the latter two are not linearly separable from each other.
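The separability claim can be checked informally with a linear classifier. The sketch below assumes scikit-learn's bundled copy of the data and uses a linear SVM with an arbitrarily chosen C=100; the helper name linear_training_accuracy is hypothetical. A perfect training score shows that a separating hyperplane exists, while a score below 1.0 is only consistent with non-separability and does not prove it by itself.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled copy of the Iris data.
iris = load_iris()
X, y = iris.data, iris.target  # labels: 0 = setosa, 1 = versicolor, 2 = virginica


def linear_training_accuracy(features, labels):
    """Fit a linear SVM and return its accuracy on the data it was trained on."""
    model = SVC(kernel="linear", C=100.0)  # C=100 is an illustrative choice
    model.fit(features, labels)
    return model.score(features, labels)


# Setosa against the other two classes combined.
setosa_vs_rest = linear_training_accuracy(X, (y == 0).astype(int))

# Versicolor against virginica only.
mask = y != 0
versicolor_vs_virginica = linear_training_accuracy(X[mask], y[mask])

print(f"setosa vs rest: {setosa_vs_rest:.3f}")
print(f"versicolor vs virginica: {versicolor_vs_virginica:.3f}")
```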

Predicting attribute: Class of Iris plant.

Attribute Information: We have 4 features in this dataset and a target variable class.

  • sepal length in cm.
  • sepal width in cm.
  • petal length in cm.
  • petal width in cm.
  • Class:
    • Iris Setosa
    • Iris Versicolour
    • Iris Virginica
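The characteristics listed above can be verified in a few lines. This is a small sketch that assumes scikit-learn's bundled copy of the dataset (the notebook may load the data from a different source); in that copy the class names are the lowercase strings setosa, versicolor, and virginica.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy of the Iris data.
iris = load_iris()
X, y = iris.data, iris.target

n_instances, n_attributes = X.shape

# Count how many rows belong to each class.
class_counts = {
    str(name): int(np.sum(y == label))
    for label, name in enumerate(iris.target_names)
}

# True if any feature value is NaN.
has_missing = bool(np.isnan(X).any())

print("Instances:", n_instances)
print("Attributes:", n_attributes, iris.feature_names)
print("Class counts:", class_counts)
print("Missing values:", has_missing)
```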

Loading Project

Requirements:

This project was solved with the following versions of libraries installed:

Library/Language | Use | Version
Python | Language used for the project | 3.7.0
NumPy | Scientific computing | 1.15.2
Pandas | Data analysis | 0.23.4
matplotlib | Visualization | 3.0.0
seaborn | Visualization | 0.9.0
scikit-learn | ML library for training and testing | 0.20.0
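To compare your own environment against the versions above, a short check like the following may help. It is a sketch, not part of the project: the helper name installed_versions is hypothetical, and it relies on importlib.metadata, which is in the standard library from Python 3.8 onward (so it will not run on the 3.7.0 interpreter listed above without a backport).

```python
from importlib import metadata  # standard library in Python 3.8+

# Distribution names as published on PyPI.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "scikit-learn"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions(REQUIRED)
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```

Newer releases of these libraries will usually work too, but some calls and defaults have changed since the versions listed, so small differences in output are possible.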

If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included in it.

You will also need Jupyter Notebook installed to run report.ipynb. Alternatively, you can use JupyterLab, the newer interface from the Jupyter project. Instructions to download JupyterLab can be found here.

Execution:

In a terminal or command window, navigate to the top-level project directory Iris_Flower (that contains this README) and run one of the following commands:

ipython notebook report.ipynb

or, on newer installations where the ipython notebook command is deprecated in favor of jupyter,

jupyter notebook report.ipynb

or, if you have JupyterLab installed,

jupyter lab

This will open Jupyter Notebook (with the project file loaded) or JupyterLab (where you can open report.ipynb from the file browser) in your browser.