Goal of the project

This project uses the well-known Iris dataset. Its goal is to classify Iris flowers accurately into the 3 known Iris species:

  • Setosa
  • Versicolor
  • Virginica

The classification is based on the following attributes of the flowers:

  • Petal length
  • Petal width
  • Sepal length
  • Sepal width
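
As a quick illustration of the species and attributes listed above, here is a minimal sketch that loads the copy of the Iris dataset bundled with scikit-learn. Using `load_iris` is an assumption: the project may instead read the data from its own file, which this document does not specify.

```python
from sklearn.datasets import load_iris

# Assumption: the dataset comes from scikit-learn's bundled copy,
# not from a project-specific CSV file.
iris = load_iris(as_frame=True)
df = iris.frame  # 4 attribute columns plus an integer 'target' column

species_names = [str(name) for name in iris.target_names]
feature_names = list(iris.feature_names)

print("Species:", species_names)
print("Attributes:", feature_names)
print("Shape (rows, columns):", df.shape)
```

In this copy the 4 attributes are measured in centimetres and the species are encoded as the integers 0, 1 and 2 in the `target` column.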

Content of the project

The project consists of the following steps:

  1. Dataset analysis:
    • Distribution of the different attributes
    • Verification of the absence of missing values
  2. Visualization of the data
  3. Classification of the data
    • Using the Random Forest algorithm
    • Using a decision tree

Software requirements

  • Python 3.8 or higher

The following Python packages and modules are also required:

  • scikit-learn
  • NumPy
  • pandas
  • Matplotlib (Pyplot module)
  • Seaborn

Performance achieved

Algorithm                  Accuracy (%)
Random Forest (3 trees)    100
Decision Tree              97.78