/Optimizing-the-Parameters-of-Advanced-Machine-Learning-Deep-Learning-Models

Classifying mushrooms as edible or inedible using ML techniques & optimizing the parameters

To Shroom or Not to Shroom

Project Overview

This project aims to classify mushrooms as either edible or poisonous based on their physical characteristics using various machine learning and deep learning techniques. The project includes data preprocessing, exploratory data analysis, model training, evaluation, and interpretation using Explainable AI (XAI) methods.

Team Members

  • Aliyan Ahmed
  • AbdurRehman Haroon
  • Eesha Tariq
  • Javeria Shahid

Introduction

Mushrooms have a long history of use for food, medicine, and spiritual purposes. However, distinguishing between edible and poisonous mushrooms remains a challenge due to their vast diversity. This project aims to predict the edibility of mushrooms based on their physical traits.

Dataset Description

The dataset contains 61,069 instances with 20 attributes each. Features include cap diameter, shape, surface, and color, along with gill features, stem attributes, veil properties, ring presence, spore print color, habitat, and season. The dataset is the Secondary Mushroom Dataset from the UCI Machine Learning Repository.

Edibility Classification

Our primary objective is to build a classification model that predicts whether a mushroom is edible or poisonous. The evaluation metrics are accuracy, precision, recall, and F-score. Recall on the poisonous class receives the most weight, because labeling a poisonous mushroom as edible is the most costly error.
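
As a minimal sketch of how these metrics relate, the snippet below computes them with scikit-learn on a handful of made-up labels. The label encoding (1 = poisonous, 0 = edible) and the toy predictions are assumptions for illustration, not results from this project.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels (assumption): 1 = poisonous, 0 = edible
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred, pos_label=1),
    # Recall on the poisonous class: the share of poisonous mushrooms caught
    "recall": recall_score(y_true, y_pred, pos_label=1),
    "f1": f1_score(y_true, y_pred, pos_label=1),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here the single missed poisonous sample lowers recall, while the two edible samples flagged as poisonous lower precision only.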

Data Preprocessing

  • Data Imputation: Missing values were replaced using the mean for numerical columns and mode for categorical columns.
  • Outliers and Duplicates: Outliers were removed using the Interquartile Range (IQR) method, and duplicates were eliminated to ensure data quality.
  • Class Imbalance: Addressed using imblearn's RandomOverSampler, which balances the distribution of the target class.
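
The three steps above can be sketched on a toy table as follows. The column names and values are hypothetical, and the oversampling step is a plain pandas stand-in for imblearn's RandomOverSampler (the same idea of resampling minority rows with replacement), so it should be read as an illustration rather than the project's actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "cap_diameter": [5.0, 6.0, np.nan, 5.5, 6.5, 5.0, 20.0, 5.8, 6.2],
    "cap_shape": ["x", "f", "x", None, "x", "x", "f", "x", "f"],
    "class": ["p", "p", "p", "p", "p", "p", "p", "e", "e"],
})

# 1. Imputation: mean for numerical columns, mode for categorical columns
df["cap_diameter"] = df["cap_diameter"].fillna(df["cap_diameter"].mean())
df["cap_shape"] = df["cap_shape"].fillna(df["cap_shape"].mode()[0])

# 2. Remove duplicates, then drop rows outside 1.5 * IQR of the quartiles
df = df.drop_duplicates()
q1, q3 = df["cap_diameter"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["cap_diameter"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# 3. Random oversampling (pandas stand-in for RandomOverSampler):
#    resample minority rows with replacement until every class matches the largest
counts = df["class"].value_counts()
target = counts.max()
parts = [df]
for label, n in counts.items():
    if n < target:
        extra = df[df["class"] == label].sample(
            n=target - n, replace=True, random_state=0
        )
        parts.append(extra)
balanced = pd.concat(parts, ignore_index=True)

print(balanced["class"].value_counts())
```

In a train/test workflow, oversampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.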

Feature Selection

  • Filter Method (SelectKBest): Extracted the most informative features using the chi-square test.
  • Wrapper Method (RFE): Implemented but found less reliable than SelectKBest.
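
A minimal sketch of the filter method, assuming the categorical features are one-hot encoded first (the chi-square test in scikit-learn requires non-negative inputs). The feature names, values, and `k=2` are hypothetical choices for illustration.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy features; 1 = poisonous, 0 = edible (assumption)
X_raw = pd.DataFrame({
    "spore_print_color": ["black", "black", "white", "white",
                          "black", "white", "black", "white"],
    "habitat": ["woods", "grass", "woods", "grass",
                "woods", "grass", "grass", "woods"],
})
y = [1, 1, 0, 0, 1, 0, 1, 0]

# One-hot encode so every input column is a non-negative count
X = pd.get_dummies(X_raw).astype(int)

# Keep the k columns with the highest chi-square score against the target
selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
selected = list(X.columns[selector.get_support()])

print(dict(zip(X.columns, selector.scores_)))
print(selected)
```

In this toy data the spore print color tracks the label exactly, while habitat is split evenly across both classes, so only the spore print columns are retained.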

Model Training and Evaluation

Nine different machine learning models were trained and evaluated:

  1. Logistic Regression
  2. Decision Tree
  3. Random Forest
  4. Gradient Boosting
  5. Support Vector Machines
  6. K Nearest Neighbors
  7. Gaussian Naïve Bayes
  8. Linear Discriminant Analysis
  9. Quadratic Discriminant Analysis

The best-performing models were selected based on mean test accuracy and other classification metrics.
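
One way such a comparison could be run is with k-fold cross-validation, as sketched below. The synthetic data, the 5-fold setting, and the model hyperparameters are assumptions; the project's actual splits and settings are not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed mushroom features (assumption)
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_redundant=0, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "K Nearest Neighbors": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
    "Quadratic Discriminant Analysis": QuadraticDiscriminantAnalysis(),
}

results = {}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5, scoring=["accuracy", "recall"])
    results[name] = {
        "accuracy": cv["test_accuracy"].mean(),  # mean test accuracy over folds
        "recall": cv["test_recall"].mean(),
    }

# Rank by mean test accuracy, highest first
ranking = sorted(results, key=lambda n: results[n]["accuracy"], reverse=True)
for name in ranking:
    print(f"{name}: acc={results[name]['accuracy']:.3f}, "
          f"recall={results[name]['recall']:.3f}")
```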

Explainable AI

  • SHAP: Used to interpret the impact of each feature on the model's predictions.
  • LIME: Provided local explanations for individual predictions to improve model transparency.

Deep Learning Model

A neural network was implemented with two hidden layers (128 and 64 neurons) and an output layer using the sigmoid activation function. Its hyperparameters were tuned with grid search, and the tuned model was then evaluated.
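
The sketch below shows the same idea using scikit-learn's MLPClassifier as a stand-in, since the framework used in the notebook is not stated here. The synthetic data, the search grid, and the choice of recall as the scoring metric are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption); real features come from the mushroom dataset
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, n_redundant=0, random_state=0
)

# Two hidden layers (128 and 64 units). For binary targets, MLPClassifier
# applies a logistic (sigmoid) activation at the output layer.
mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0)

# Hypothetical search space; the grid actually used is not documented here
param_grid = {
    "alpha": [1e-4, 1e-2],
    "learning_rate_init": [1e-3, 1e-2],
}

search = GridSearchCV(mlp, param_grid, cv=3, scoring="recall")
search.fit(X, y)

best_params = search.best_params_
best_recall = search.best_score_
print(best_params, round(best_recall, 3))
```

Grid search fits one model per parameter combination and fold (here 4 x 3 = 12 fits), so the grid is kept small on purpose.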

Conclusion and Future Work

The project successfully demonstrated the classification of mushrooms using various machine learning techniques. Future work includes:

  • Parsing HTML sources to expand the dataset.
  • Enhancing image recognition capabilities.
  • Developing an interactive web application.
  • Deploying the model on mobile platforms.
  • Addressing potential overfitting issues.
