This project aims to classify mushrooms as either edible or poisonous based on their physical characteristics using various machine learning and deep learning techniques. The project includes data preprocessing, exploratory data analysis, model training, evaluation, and interpretation using Explainable AI (XAI) methods.
- Aliyan Ahmed
- AbdurRehman Haroon
- Eesha Tariq
- Javeria Shahid
- Introduction
- Dataset Description
- Edibility Classification
- Data Preprocessing
- Feature Selection
- Model Training and Evaluation
- Explainable AI
- Deep Learning Model
- Conclusion and Future Work
- References
Mushrooms have a long history of use for food, medicine, and spiritual purposes. However, distinguishing between edible and poisonous mushrooms remains a challenge due to their vast diversity. This project aims to predict the edibility of mushrooms based on their physical traits.
The dataset contains 61,069 instances with 20 attributes each. It includes features such as cap diameter, shape, surface, color, gill features, stem attributes, veil properties, ring presence, spore print color, habitat, and season. The dataset is sourced from the UCI Machine Learning Repository (UCIML).
Our primary objective is to build a classification model that predicts whether a mushroom is edible or poisonous. The evaluation metrics are accuracy, precision, recall, and F-score. We focus on recall for the poisonous class, because a missed poisonous mushroom (one labeled edible) is the most dangerous error.
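As a minimal sketch of how these metrics can be computed with scikit-learn, the snippet below treats the poisonous class as the positive class. The label encoding (`"p"` for poisonous, `"e"` for edible), the `evaluate` helper, and the toy predictions are assumptions for illustration, not the project's actual code or results.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def evaluate(y_true, y_pred, positive="p"):
    """Return the four metrics, treating `positive` (assumed: poisonous) as the positive class."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, pos_label=positive),
        "recall": recall_score(y_true, y_pred, pos_label=positive),
        "f1": f1_score(y_true, y_pred, pos_label=positive),
    }


# Toy labels only: one poisonous mushroom is missed and one edible one is flagged.
y_true = ["p", "p", "p", "e", "e", "e"]
y_pred = ["p", "p", "e", "e", "e", "p"]
scores = evaluate(y_true, y_pred)
```

Recall here is the share of truly poisonous mushrooms that the model flags as poisonous, which is why it is the metric to watch.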
- Data Imputation: Missing values were replaced using the mean for numerical columns and mode for categorical columns.
- Outliers and Duplicates: Outliers were removed using the Interquartile Range (IQR) method, and duplicates were eliminated to ensure data quality.
- Class Imbalance: Addressed using imblearn's RandomOverSampler, which resamples the minority class to balance the distribution of the target class.
- Filter Method (SelectKBest): Extracted the most informative features using the chi-square test.
- Wrapper Method (RFE): Recursive Feature Elimination was also implemented but found less reliable than SelectKBest.
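Both selection methods can be sketched with scikit-learn as shown below. The synthetic data, the choice of `k=5`, and the decision tree wrapped by RFE are assumptions for illustration; the source does not state which values or estimator were used. Because the chi-square test requires non-negative inputs, the synthetic features are scaled to [0, 1] here. The real categorical features would need to be encoded first.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded mushroom features.
X_raw, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, n_redundant=0, random_state=0
)
X = MinMaxScaler().fit_transform(X_raw)  # chi2 needs non-negative values

# Filter method: score each feature independently and keep the top k.
filter_selector = SelectKBest(score_func=chi2, k=5).fit(X, y)
filter_mask = filter_selector.get_support()

# Wrapper method: repeatedly fit a model and drop the weakest feature.
wrapper_selector = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=5).fit(X, y)
wrapper_mask = wrapper_selector.get_support()

# Features chosen by both methods (indices into the 10 synthetic columns).
agreed = [i for i in range(X.shape[1]) if filter_mask[i] and wrapper_mask[i]]
```

Comparing the two masks is one simple way to see where the filter and wrapper methods disagree.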
Nine different machine learning models were trained and evaluated:
- Logistic Regression
- Decision Tree
- Random Forest
- Gradient Boosting
- Support Vector Machines
- K Nearest Neighbors
- Gaussian Naïve Bayes
- Linear Discriminant Analysis
- Quadratic Discriminant Analysis
The best-performing models were selected based on mean test accuracy and other classification metrics.
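A comparison like this can be sketched as a loop over the nine estimators. The snippet assumes that "mean test accuracy" refers to k-fold cross-validation, uses default hyperparameters, and runs on synthetic data in which label 1 stands in for the poisonous class. None of these details are confirmed by the source.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Support Vector Machines": SVC(),
    "K Nearest Neighbors": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
    "Quadratic Discriminant Analysis": QuadraticDiscriminantAnalysis(),
}

# Synthetic stand-in for the preprocessed mushroom data.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, n_redundant=0, random_state=0
)

results = {}
for name, model in models.items():
    # 5-fold cross-validation; fold count is an assumption.
    cv = cross_validate(model, X, y, cv=5, scoring=["accuracy", "recall"])
    results[name] = {
        "accuracy": cv["test_accuracy"].mean(),
        "recall": cv["test_recall"].mean(),
    }

# Rank models by mean test accuracy, best first.
ranking = sorted(results, key=lambda n: results[n]["accuracy"], reverse=True)
```

Recording recall alongside accuracy keeps the ranking consistent with the project's emphasis on not missing poisonous mushrooms.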
- SHAP: Used to interpret the impact of each feature on the model's predictions.
- LIME: Provided local explanations for individual predictions to improve model transparency.
A neural network was implemented with two hidden layers (128 and 64 neurons) and an output layer using the sigmoid activation function. Its hyperparameters were tuned using grid search, and the tuned model was then evaluated.
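A comparable setup can be sketched with scikit-learn's `MLPClassifier`, which applies a logistic (sigmoid) output for binary targets. The framework, the searched hyperparameters (`alpha`, `learning_rate_init`), the recall scoring, and the synthetic data are assumptions for illustration, since the source does not name them.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed mushroom data.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two hidden layers with 128 and 64 neurons, as described in the text.
pipeline = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0),
)

# Hypothetical search space; the real grid is not given in the source.
param_grid = {
    "mlpclassifier__alpha": [1e-4, 1e-2],
    "mlpclassifier__learning_rate_init": [1e-3, 1e-2],
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="recall")
search.fit(X_train, y_train)

# With scoring="recall", score() reports recall on the held-out split.
test_recall = search.score(X_test, y_test)
best_params = search.best_params_
```

Scaling inside the pipeline keeps the scaler fitted on training folds only during the search.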
The project successfully demonstrated the classification of mushrooms using various machine learning techniques. Future work includes:
- Parsing HTML sources to expand the dataset.
- Enhancing image recognition capabilities.
- Developing an interactive web application.
- Deploying the model on mobile platforms.
- Addressing potential overfitting issues.
- Exploring the Role of Mushrooms Throughout History
- Edible Mushrooms: Attributes and Applications
- Mushroom Poisoning
- The Mushroom Hunter's Field Guide
- Acute Liver Injury From Mushroom Ingestion
- Demographic, Clinical, and Laboratory Findings of Mushroom-Poisoned Patients in Kermanshah
- Diversity of Edible Mushrooms in Pakistan
- UCIML Secondary Mushroom Dataset
- 7 Steps to Mastering Data Cleaning and Preprocessing Techniques
- imblearn RandomOverSampler
- Explainable AI - Understanding and Trusting Machine Learning Models
- Mushroom World
- A Review on Evaluation Metrics for Data
- Secondary Mushroom Dataset