This Streamlit web application explores several outlier detection techniques applied to a dataset of ages. Its key features are:

Dashboard: A sidebar dashboard provides easy access to the different sections of the app.

Normal Distribution & Z-scores:
- Visualizes the probability density function (PDF) of age data and highlights outliers using Z-scores.
- Displays a box plot of age data and identifies outlier values.
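
As a rough illustration of the Z-score step described above, the following sketch flags values whose absolute Z-score exceeds a cutoff. The sample ages, the function name `zscore_outliers`, and the threshold of 3 are assumptions made for this example; the app's actual data and cutoff are not stated here.

```python
import numpy as np


def zscore_outliers(values, threshold=3.0):
    """Return (z_scores, mask), where mask marks entries with |z| > threshold."""
    x = np.asarray(values, dtype=float)
    std = x.std()  # population standard deviation (ddof=0)
    if std == 0:
        # All values identical: nothing can be an outlier.
        z = np.zeros_like(x)
    else:
        z = (x - x.mean()) / std
    return z, np.abs(z) > threshold


# Hypothetical age sample with one implausible value.
ages = np.array([22, 25, 27, 29, 31, 33, 35, 38, 41, 44, 47, 52, 120])
z_scores, mask = zscore_outliers(ages, threshold=3.0)
outlier_ages = ages[mask]
print(outlier_ages)
```

On small samples a single extreme value inflates the standard deviation, so a cutoff of 3 can be hard to reach; a lower threshold may be more practical there.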

Isolation Forest:
- Applies the Isolation Forest algorithm to detect anomalies in the age data.
- Visualizes outliers using scatter plots and box plots.
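
A minimal sketch of how Isolation Forest might be applied to a single age column with scikit-learn is shown below. The sample ages and the `contamination`, `n_estimators`, and `random_state` values are illustrative assumptions, not the app's confirmed settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical age sample with one implausible value.
ages = np.array([22, 25, 27, 29, 31, 33, 35, 38, 41, 44, 47, 52, 120], dtype=float)

# scikit-learn expects a 2-D array of shape (n_samples, n_features).
X = ages.reshape(-1, 1)

# contamination is the assumed share of outliers in the data (illustrative value).
model = IsolationForest(n_estimators=100, contamination=0.1, random_state=42)
labels = model.fit_predict(X)  # -1 marks an anomaly, 1 marks an inlier

outliers = ages[labels == -1]
print(outliers)
```

Because `contamination` fixes the fraction of points that get flagged, the model reports some anomalies even in clean data, so the value is worth tuning against domain knowledge.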

Local Outlier Factor (LOF):
- Utilizes the LOF algorithm to identify local outliers in the age data.
- Presents outliers visually through scatter plots and box plots.
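
The LOF step could look roughly like the sketch below, which uses scikit-learn's `LocalOutlierFactor`. The sample ages, `n_neighbors=5`, and `contamination=0.1` are assumptions chosen for this small example.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical age sample with one implausible value.
ages = np.array([22, 25, 27, 29, 31, 33, 35, 38, 41, 44, 47, 52, 120], dtype=float)
X = ages.reshape(-1, 1)

# n_neighbors must be smaller than the number of samples; 5 is an illustrative choice.
lof = LocalOutlierFactor(n_neighbors=5, contamination=0.1)
labels = lof.fit_predict(X)  # -1 marks an outlier, 1 marks an inlier

# negative_outlier_factor_ stores the negated LOF, so flip the sign:
# larger values mean a point is less dense than its neighbours.
lof_scores = -lof.negative_outlier_factor_

outliers = ages[labels == -1]
print(outliers)
```

Unlike a global Z-score, LOF compares each point's density with that of its nearest neighbours, so the result is sensitive to the choice of `n_neighbors`.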

Percentile-Based Method:
- Identifies outliers based on their position in a sorted list using top and bottom percentiles.
- Displays top and bottom percentile outliers in separate tables.
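
One way to sketch the percentile-based method is to compute lower and upper cutoffs with `np.percentile` and collect the values beyond each. The function name `percentile_outliers`, the sample ages, and the 5th/95th percentile cutoffs are assumptions for illustration.

```python
import numpy as np


def percentile_outliers(values, lower_pct=5, upper_pct=95):
    """Return (bottom, top): sorted values below lower_pct and above upper_pct."""
    x = np.asarray(values, dtype=float)
    low_cut, high_cut = np.percentile(x, [lower_pct, upper_pct])
    bottom = np.sort(x[x < low_cut])
    top = np.sort(x[x > high_cut])
    return bottom, top


# Hypothetical age sample with one implausible value.
ages = np.array([22, 25, 27, 29, 31, 33, 35, 38, 41, 44, 47, 52, 120])
bottom, top = percentile_outliers(ages, lower_pct=5, upper_pct=95)
print("bottom:", bottom)
print("top:", top)
```

The two returned arrays correspond to the separate bottom and top tables mentioned above. Note that this method always labels a fixed share of the data as extreme, whether or not those values are genuinely anomalous.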

Winsorization:
- Implements Winsorization to transform the data by limiting extreme values (outliers) to a specified percentile.
- Presents both original and winsorized age data in a single table.
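
Winsorization as described above can be approximated by clipping values to percentile bounds. The sketch below assumes 5th and 95th percentile limits and a hypothetical `winsorize` helper; the limits used by the app are not stated here.

```python
import numpy as np


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clip values to the [lower_pct, upper_pct] percentile range."""
    x = np.asarray(values, dtype=float)
    low_cut, high_cut = np.percentile(x, [lower_pct, upper_pct])
    return np.clip(x, low_cut, high_cut)


# Hypothetical age sample with one implausible value.
ages = np.array([22, 25, 27, 29, 31, 33, 35, 38, 41, 44, 47, 52, 120])
winsorized = winsorize(ages, lower_pct=5, upper_pct=95)

# Original and winsorized values side by side, one row per observation.
comparison = np.column_stack([ages, winsorized])
print(comparison)
```

Unlike the detection methods, this keeps every observation and only pulls the extremes inward, so the sample size is unchanged.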

Dependencies:
- Streamlit
- Pandas
- Plotly
- NumPy
- Scikit-learn
To run the application locally, follow these steps:
- Make sure you have Python installed on your system.
- Install the required dependencies using pip, for example `pip install streamlit pandas plotly numpy scikit-learn`.
- In your terminal, navigate to the directory containing `app.py`.
- Start the app with `streamlit run app.py`.
- The application will open in your default web browser, allowing you to explore and experiment with different outlier detection techniques.

Visualizations:
- Probability Density Function (PDF) plot
- Box plot
- Scatter plot
- Table (for displaying outlier values)
Contributions to the project are welcome! If you encounter any issues or have suggestions for improvement, please feel free to open an issue or submit a pull request on GitHub.
Special thanks to Streamlit for providing an intuitive framework for building interactive web applications with Python.