To Shroom or Not to Shroom

Project Overview

This project aims to classify mushrooms as either edible or poisonous based on their physical characteristics using various machine learning and deep learning techniques. The project includes data preprocessing, exploratory data analysis, model training, evaluation, and interpretation using Explainable AI (XAI) methods.

Team Members

Aliyan Ahmed
AbdurRehman Haroon
Eesha Tariq
Javeria Shahid

Table of Contents

Introduction
Dataset Description
Edibility Classification
Data Preprocessing
Feature Selection
Model Training and Evaluation
Explainable AI
Deep Learning Model
Conclusion and Future Work
References

Introduction

Mushrooms have a long history of use for food, medicine, and spiritual purposes. However, distinguishing between edible and poisonous mushrooms remains a challenge due to their vast diversity. This project aims to predict the edibility of mushrooms based on their physical traits.

Dataset Description

The dataset contains 61,069 instances with 20 attributes each. It includes features such as cap diameter, shape, surface, color, gill features, stem attributes, veil properties, ring presence, spore print color, habitat, and season. The dataset is sourced from the UCI Machine Learning Repository.

Edibility Classification

Our primary objective is to build a classification model that predicts whether a mushroom is edible or poisonous. Models are evaluated on accuracy, precision, recall, and F-score. Recall on the poisonous class is prioritized, because mislabeling a poisonous mushroom as edible is the most dangerous error.
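To make the metrics concrete, here is a minimal sketch that computes them with scikit-learn on a handful of made-up labels. The encoding (1 = poisonous, 0 = edible) and the label values are assumptions for illustration, not results from the project.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical labels: 1 = poisonous, 0 = edible (this encoding is an assumption).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

accuracy = accuracy_score(y_true, y_pred)

# Treat "poisonous" as the positive class so that recall counts how many
# poisonous mushrooms were caught; a missed one is a false negative.
precision, recall, f_score, _ = precision_recall_fscore_support(
    y_true, y_pred, pos_label=1, average="binary"
)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f_score={f_score:.2f}")
```

In this toy case one poisonous mushroom is predicted edible, which is exactly the error that recall on the poisonous class penalizes.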

Data Preprocessing

Data Imputation: Missing values were replaced using the mean for numerical columns and mode for categorical columns.
Outliers and Duplicates: Outliers were removed using the Interquartile Range (IQR) method, and duplicates were eliminated to ensure data quality.
Class Imbalance: Addressed using RandomOverSampler (from the imbalanced-learn library) to balance the distribution of the target class.
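The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function name `preprocess`, the 1.5 × IQR fence, and the column names in the usage example are hypothetical, and the oversampling step is written in plain pandas to mimic the idea behind RandomOverSampler without the extra dependency.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Impute, drop IQR outliers and duplicates, then oversample minority classes."""
    df = df.copy()
    features = df.columns.drop(target)
    num_cols = df[features].select_dtypes(include="number").columns
    cat_cols = features.difference(num_cols)

    # Imputation: mean for numerical columns, mode for categorical columns.
    for col in num_cols:
        df[col] = df[col].fillna(df[col].mean())
    for col in cat_cols:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Outliers: keep rows inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for each
    # numerical column (the 1.5 multiplier is the usual convention, assumed here).
    for col in num_cols:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df = df[df[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    df = df.drop_duplicates()

    # Random oversampling: resample each minority class with replacement
    # until it matches the majority class count.
    counts = df[target].value_counts()
    max_count = counts.max()
    parts = [df]
    for label, count in counts.items():
        if count < max_count:
            extra = df[df[target] == label].sample(
                n=max_count - count, replace=True, random_state=0
            )
            parts.append(extra)
    return pd.concat(parts, ignore_index=True)


# Usage on a tiny made-up frame (column names are hypothetical).
toy = pd.DataFrame({
    "cap-diameter": [5.0, 6.0, np.nan, 5.5, 6.5, 5.2, 100.0, 5.8],
    "cap-color": ["n", "n", "w", None, "w", "n", "n", "y"],
    "class": ["e", "e", "e", "e", "e", "p", "p", "p"],
})
clean = preprocess(toy, target="class")
print(clean["class"].value_counts())
```

Oversampling deliberately reintroduces repeated rows, which is why it runs after duplicate removal in this sketch.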

Feature Selection

Filter Method (SelectKBest): Selected the most informative features using the chi-square test.
Wrapper Method (RFE): Recursive Feature Elimination was also implemented but found less reliable than SelectKBest.
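Both selectors are available in scikit-learn. The sketch below runs them side by side on synthetic data; the value of `k`, the logistic regression estimator passed to RFE, and the min-max scaling step are assumptions made for this example rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the encoded mushroom features.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)
# chi2 requires non-negative feature values, so scale into [0, 1] first.
X = MinMaxScaler().fit_transform(X)

k = 5  # number of features to keep (assumed)

# Filter method: score each feature independently with the chi-square test.
filter_selector = SelectKBest(score_func=chi2, k=k).fit(X, y)
filter_mask = filter_selector.get_support()

# Wrapper method: repeatedly fit a model and drop the weakest feature.
wrapper_selector = RFE(
    LogisticRegression(max_iter=1000), n_features_to_select=k
).fit(X, y)
wrapper_mask = wrapper_selector.get_support()

print("SelectKBest keeps:", filter_mask.nonzero()[0])
print("RFE keeps:        ", wrapper_mask.nonzero()[0])
```

Comparing the two masks shows where the methods disagree, which is one way to judge how stable each selection is.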

Model Training and Evaluation

Nine different machine learning models were trained and evaluated:

Logistic Regression
Decision Tree
Random Forest
Gradient Boosting
Support Vector Machines
K Nearest Neighbors
Gaussian Naïve Bayes
Linear Discriminant Analysis
Quadratic Discriminant Analysis

The best-performing models were selected based on mean test accuracy and other classification metrics.
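A comparison of this kind might look like the sketch below, which cross-validates the nine model families and ranks them by mean test accuracy. The synthetic data, the 5-fold split, and the default hyperparameters are assumptions; the project's actual configuration is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed mushroom data.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, n_redundant=0, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Support Vector Machines": SVC(),
    "K Nearest Neighbors": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
    "Quadratic Discriminant Analysis": QuadraticDiscriminantAnalysis(),
}

results = {}
for name, model in models.items():
    # Collect accuracy and recall on each held-out fold.
    scores = cross_validate(model, X, y, cv=5, scoring=("accuracy", "recall"))
    results[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "recall": scores["test_recall"].mean(),
    }

# Rank by mean test accuracy, best first.
ranking = sorted(results, key=lambda n: results[n]["accuracy"], reverse=True)
for name in ranking:
    r = results[name]
    print(f"{name:32s} accuracy={r['accuracy']:.3f} recall={r['recall']:.3f}")
```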

Explainable AI

SHAP: Used to interpret the impact of each feature on the model's predictions.
LIME: Provided local explanations for individual predictions to improve model transparency.

Deep Learning Model

A neural network was implemented with two hidden layers (128 and 64 neurons) and an output layer using the sigmoid activation function for the binary edible/poisonous prediction. Its hyperparameters were tuned using grid search, and the tuned model was then evaluated.
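The framework used for the network is not stated here, so the following is only a sketch of the same architecture using scikit-learn's MLPClassifier, which applies a logistic (sigmoid) output for binary targets. The searched hyperparameters, the ReLU hidden activation (the library default), the recall scoring, and the synthetic data are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed mushroom data.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two hidden layers with 128 and 64 neurons, as described above.
pipeline = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0),
)

# Hypothetical search space; the project's actual grid is not documented here.
param_grid = {
    "mlpclassifier__alpha": [1e-4, 1e-2],
    "mlpclassifier__learning_rate_init": [1e-3, 1e-2],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="recall")
search.fit(X_train, y_train)

# With scoring="recall", score() reports recall on the held-out split.
test_recall = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"test recall: {test_recall:.3f}")
```

Scaling inside the pipeline keeps the scaler fitted only on each training fold during the search.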

Conclusion and Future Work

The project successfully demonstrated the classification of mushrooms using various machine learning techniques. Future work includes:

Parsing HTML sources to expand the dataset.
Enhancing image recognition capabilities.
Developing an interactive web application.
Deploying the model on mobile platforms.
Addressing potential overfitting issues.
