Predicting Water Quality using Machine Learning

This project aims to predict water quality using various classification algorithms and compare their performance metrics. The dataset used for this project can be obtained from Kaggle.

Project Overview

Data Cleaning: Rows containing invalid values are removed, and columns are converted to appropriate data types.
Data Analysis: A histogram is plotted for each column to show how its values are distributed.
Classification Algorithms:
- Decision Tree Classifier
- Random Forest Classifier
- K Nearest Neighbor Classifier
- Gaussian Naive Bayes
- Voting Ensemble
- Stacking Ensemble
Performance Metrics: The accuracy, precision, recall, and F1-score are computed for each algorithm.
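
The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of water_quality_prediction.py. The function names (clean, build_models, evaluate), the default hyperparameters, the hard-voting setting, and the logistic regression used as the stacking meta-learner are all assumptions. The target is assumed to be binary (0/1), and a small synthetic table stands in for the Kaggle dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce every column to numeric and drop rows that fail to parse."""
    numeric = df.apply(pd.to_numeric, errors="coerce")
    return numeric.dropna().reset_index(drop=True)


def build_models(seed: int = 0) -> dict:
    """Return the six classifiers from the overview, keyed by name."""
    def base():
        # Fresh, unfitted base estimators for each ensemble.
        return [
            ("dt", DecisionTreeClassifier(random_state=seed)),
            ("rf", RandomForestClassifier(random_state=seed)),
            ("knn", KNeighborsClassifier()),
            ("nb", GaussianNB()),
        ]

    return {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "K Nearest Neighbor": KNeighborsClassifier(),
        "Gaussian Naive Bayes": GaussianNB(),
        # Assumption: majority (hard) voting over the four base models.
        "Voting Ensemble": VotingClassifier(estimators=base(), voting="hard"),
        # Assumption: logistic regression as the meta-learner.
        "Stacking Ensemble": StackingClassifier(
            estimators=base(),
            final_estimator=LogisticRegression(max_iter=1000),
        ),
    }


def evaluate(models, X_train, X_test, y_train, y_test) -> pd.DataFrame:
    """Fit each model and return one row of metrics per model."""
    rows = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        rows[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred, zero_division=0),
            "f1": f1_score(y_test, pred, zero_division=0),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


if __name__ == "__main__":
    # Synthetic stand-in for the real dataset (hypothetical column names).
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])
    df["target"] = (df["a"] + df["b"] > 0).astype(int)
    df = df.astype(object)
    df.loc[0, "a"] = "bad"  # one invalid entry for clean() to remove

    cleaned = clean(df)
    # cleaned.hist() would draw the per-column histograms (needs matplotlib).
    X, y = cleaned.drop(columns="target"), cleaned["target"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    print(evaluate(build_models(), X_train, X_test, y_train, y_test))
```

With the real dataset, the synthetic table would be replaced by the file loaded with pandas, and the target column name adjusted to match.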

Results

Based on the evaluation of performance metrics, the following observations were made:

Random Forest Classifier performs well in terms of accuracy and precision.
Decision Tree Classifier performs well in terms of recall and F1-score.
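
One way to reach a per-metric comparison like this is to collect the scores in a table and look up the top model in each column. The sketch below assumes a pandas DataFrame with one row per model and one column per metric. The helper name best_per_metric is hypothetical, and the numbers are made-up placeholders, not this project's measured scores.

```python
import pandas as pd


def best_per_metric(scores: pd.DataFrame) -> pd.Series:
    """For each metric column, return the name of the highest-scoring model."""
    return scores.idxmax(axis=0)


# Placeholder values for illustration only; not results from this project.
scores = pd.DataFrame(
    {"accuracy": [0.5, 0.75], "recall": [0.75, 0.5]},
    index=["model_a", "model_b"],
)
print(best_per_metric(scores))
```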

Usage

To run the project, follow these steps:

Download the dataset from Kaggle.
Place the dataset in the appropriate directory.
Run the provided script, for example with: python water_quality_prediction.py

Dependencies

The following Python libraries are required to run the project:

pandas
numpy
matplotlib
scikit-learn

Install these dependencies before running the project, for example with: pip install pandas numpy matplotlib scikit-learn

Contact

For any queries or suggestions, feel free to contact me at ahmadtalha963@gmail.com.

Note

This project does not have a specific license. You are welcome to explore, modify, and distribute the code.