Welcome to the Outlier Detection project! This repository contains Python code for detecting outliers in the famous Iris dataset using the Interquartile Range (IQR) method. Below, you'll find a detailed explanation of the project, including setup instructions, code details, and visualizations of the results.
Outliers can significantly affect the performance of machine learning models, so detecting them is an important step in data preprocessing. It identifies anomalous data points that can skew an analysis, which can then be reviewed and possibly removed. This project flags outliers in each feature of the Iris dataset with the IQR method and visualizes the results with box plots.
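As a rough illustration of the IQR method, the sketch below flags values that fall more than 1.5 times the interquartile range below the first quartile or above the third quartile. It is not the repository's code: the function name `iqr_outliers`, the multiplier `k=1.5`, and the toy data are assumptions made for this example, and it uses only the standard library rather than pandas. The `method="inclusive"` setting interpolates linearly between data points, which should correspond to the default quantile behaviour in pandas and NumPy.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (index, value) pairs that lie outside the IQR fences.

    Hypothetical helper for illustration; k=1.5 is the conventional
    multiplier, assumed here rather than taken from the project.
    """
    # Quartiles with linear interpolation between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr  # lower fence
    upper = q3 + k * iqr  # upper fence
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Toy data, not taken from the Iris dataset.
sample = [10, 12, 11, 13, 12, 11, 40]
print(iqr_outliers(sample))  # [(6, 40)]
```

With a pandas DataFrame, the same idea would be applied to one numeric column at a time.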
The Iris dataset used in this project is publicly available. It consists of 150 samples, 50 from each of three species of Iris flowers (Iris setosa, Iris versicolor, and Iris virginica). Four features were measured from each sample: sepal length, sepal width, petal length, and petal width.
To get started, follow these instructions to set up your environment:
- Clone the Repository:
    git clone git@github.com:VaishnaviThakre/Outlier-Detection.git
    cd Outlier-Detection
- Install Dependencies: Make sure you have Python and pip installed. Then, run:
    pip install pandas numpy matplotlib seaborn
The output of the outlier detection process includes visualizations and summaries of outliers for each feature in the dataset. Below are some key observations:
- Sepal length: No outliers. All data points are within the expected range.
- Sepal width: Outliers identified:
  - Three instances of Iris-setosa:
    - Index 15: Sepal width of 4.4
    - Index 32: Sepal width of 4.1
    - Index 33: Sepal width of 4.2
  - One instance of Iris-versicolor:
    - Index 60: Sepal width of 2.0
- Petal length: No outliers. All data points are within the expected range.
- Petal width: No outliers. All data points are within the expected range.
- Outliers in Sepal Width: All detected outliers are in sepal width: three Iris-setosa instances with unusually high values and one Iris-versicolor instance with an unusually low value.
- No Outliers in Other Features: Sepal length, petal length, and petal width have no outliers according to the IQR method.
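The sepal-width result can be checked by hand. The sketch below assumes the commonly reported quartiles for Iris sepal width (Q1 = 2.8, Q3 = 3.3); these figures are not stated in this README, so treat them as an assumption and recompute them from your own copy of the data. Under that assumption the fences come out at roughly 2.05 and 4.05, which is consistent with the four values listed above being flagged.

```python
# Assumed quartiles for sepal width; not taken from this README.
Q1, Q3 = 2.8, 3.3

iqr = Q3 - Q1            # about 0.5
lower = Q1 - 1.5 * iqr   # about 2.05
upper = Q3 + 1.5 * iqr   # about 4.05

# Sepal-width values reported above, keyed by row index.
reported = {15: 4.4, 32: 4.1, 33: 4.2, 60: 2.0}


def is_outlier(value):
    """True when the value lies outside the IQR fences."""
    return value < lower or value > upper


flagged = {idx: v for idx, v in reported.items() if is_outlier(v)}
print(flagged == reported)  # every reported value falls outside the fences
```

A value such as 4.0 sits just inside the upper fence under these assumptions, which shows how close the boundary is to the flagged Iris-setosa rows.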
We welcome contributions to improve this project. If you have suggestions for improvements or want to add new features, please open an issue or create a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
Thank you for using this outlier detection tool! We hope it helps you in your data analysis and preprocessing tasks.