This project aims to predict water quality using various classification algorithms and compare their performance metrics. The dataset used for this project can be obtained from Kaggle.
Data Cleaning: The dataset is preprocessed by removing rows that contain incorrect values and converting columns to suitable data types to ensure data quality.
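A minimal pandas sketch of this cleaning step follows. It assumes every column should be numeric and that bad entries appear as non-numeric strings. The "#NUM!" token, the "is_safe" label column, and the toy rows are illustrative assumptions, not details taken from water_quality_prediction.py.

```python
import pandas as pd


def clean_water_data(df: pd.DataFrame, target: str = "is_safe") -> pd.DataFrame:
    """Coerce all columns to numeric, drop rows that fail, and cast the target to int.

    `target` is a hypothetical label-column name; adjust it to the real dataset.
    """
    # Any value that cannot be parsed as a number (e.g. "#NUM!") becomes NaN.
    numeric = df.apply(pd.to_numeric, errors="coerce")
    # Drop every row that contained at least one unparseable value.
    cleaned = numeric.dropna().reset_index(drop=True)
    # After coercion the labels are floats; store them as integer classes.
    cleaned[target] = cleaned[target].astype(int)
    return cleaned


# Toy input (made-up values) with one corrupted row.
raw = pd.DataFrame(
    {
        "feature_a": ["1.5", "2.0", "#NUM!"],
        "feature_b": [0.1, 0.2, 0.3],
        "is_safe": ["1", "0", "#NUM!"],
    }
)
clean = clean_water_data(raw)
```

Coercing with errors="coerce" and then dropping NaN rows treats every unparseable value as incorrect data; if the dataset also has legitimately missing values, those rows are removed as well.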
Data Analysis: Histograms are plotted for each column to gain insight into the distribution of the data.
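One way to draw these histograms, sketched with matplotlib on random demo data. The grid layout, bin count, and the helper name plot_histograms are assumptions for illustration.

```python
import math

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_histograms(df: pd.DataFrame, bins: int = 20, ncols: int = 3):
    """Draw one histogram per numeric column on a grid and return the figure."""
    columns = df.select_dtypes(include="number").columns
    nrows = math.ceil(len(columns) / ncols)
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False
    )
    flat_axes = axes.ravel()
    for ax, column in zip(flat_axes, columns):
        ax.hist(df[column].dropna(), bins=bins)
        ax.set_title(column)
        ax.set_xlabel("value")
        ax.set_ylabel("count")
    # Remove leftover grid cells when the column count is not a multiple of ncols.
    for ax in flat_axes[len(columns):]:
        fig.delaxes(ax)
    fig.tight_layout()
    return fig


# Random stand-in data; replace with the cleaned water quality DataFrame.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a", "b", "c", "d"])
fig = plot_histograms(demo)
```

Calling fig.savefig("histograms.png") writes the grid to disk; with an interactive backend, plt.show() would display it instead.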
Classification Algorithms:
- Decision Tree Classifier
- Random Forest Classifier
- K Nearest Neighbor Classifier
- Gaussian Naive Bayes
- Voting Ensemble
- Stacking Ensemble
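The six classifiers could be set up in scikit-learn roughly as below. This is a sketch: default hyperparameters, hard voting, using the four single models as ensemble members, and a logistic-regression meta-learner for stacking are all assumptions, not settings confirmed by the project's script. Synthetic data from make_classification stands in for the water quality features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def base_estimators(random_state: int = 42) -> list:
    """Fresh (name, estimator) pairs used as ensemble members (assumed choice)."""
    return [
        ("dt", DecisionTreeClassifier(random_state=random_state)),
        ("rf", RandomForestClassifier(random_state=random_state)),
        ("knn", KNeighborsClassifier()),
        ("gnb", GaussianNB()),
    ]


def build_models(random_state: int = 42) -> dict:
    """Return the six classifiers keyed by display name."""
    return {
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(random_state=random_state),
        "K Nearest Neighbor": KNeighborsClassifier(),
        "Gaussian Naive Bayes": GaussianNB(),
        # Hard voting: each member casts one vote; the majority class wins.
        "Voting Ensemble": VotingClassifier(
            estimators=base_estimators(random_state), voting="hard"
        ),
        # Stacking: a meta-learner is trained on the members' cross-validated predictions.
        "Stacking Ensemble": StackingClassifier(
            estimators=base_estimators(random_state),
            final_estimator=LogisticRegression(max_iter=1000),
        ),
    }


# Synthetic binary data in place of the real dataset.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
models = build_models()
for name, model in models.items():
    model.fit(X, y)
```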
Performance Metrics: Accuracy, precision, recall, and F1-score are computed for each algorithm.
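The four metrics can be collected into a single comparison table with sklearn.metrics. The sketch assumes a binary target and an 80/20 stratified train/test split, neither of which is stated above; the helper name evaluate and the synthetic data are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def evaluate(models: dict, X_train, X_test, y_train, y_test) -> pd.DataFrame:
    """Fit each model, score it on the held-out split, and return one row per model."""
    records = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        records.append(
            {
                "model": name,
                "accuracy": accuracy_score(y_test, y_pred),
                # zero_division=0 avoids a warning if a model never predicts the positive class.
                "precision": precision_score(y_test, y_pred, zero_division=0),
                "recall": recall_score(y_test, y_pred, zero_division=0),
                "f1": f1_score(y_test, y_pred, zero_division=0),
            }
        )
    return pd.DataFrame(records).set_index("model")


X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
results = evaluate(
    {
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Gaussian Naive Bayes": GaussianNB(),
    },
    X_train, X_test, y_train, y_test,
)
# Name of the top-scoring model for each metric column.
best_per_metric = results.idxmax()
```

results.idxmax() returns the best model per metric column, which is the kind of per-metric comparison the observations summarize.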
Based on these performance metrics, the following observations were made:
- Random Forest Classifier performs well in terms of accuracy and precision.
- Decision Tree Classifier performs well in terms of recall and F1-score.
To run the project, follow these steps:
- Download the dataset from Kaggle.
- Place the dataset file in the directory that water_quality_prediction.py reads it from (check the file path used in the script).
- Execute the provided script, water_quality_prediction.py.
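Because the expected dataset location depends on the script, a loader that fails with a clear message can make this step easier to debug. The path data/water_quality.csv and the helper load_dataset are hypothetical placeholders; match them to the path used in water_quality_prediction.py.

```python
from pathlib import Path

import pandas as pd

# Hypothetical default location; the real script may read a different path.
DEFAULT_DATA_PATH = Path("data") / "water_quality.csv"


def load_dataset(path=DEFAULT_DATA_PATH) -> pd.DataFrame:
    """Read the CSV, raising a descriptive error if the file is not where expected."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Dataset not found at {path.resolve()}. "
            "Download it from Kaggle and place it at this path."
        )
    return pd.read_csv(path)
```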
The following Python libraries are required to run the project:
- pandas
- numpy
- matplotlib
- scikit-learn
Make sure to install these dependencies before running the project.
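A quick way to confirm that the dependencies are importable, using only the standard library. Note that scikit-learn is imported under the module name sklearn; the helper name missing_packages is hypothetical.

```python
import importlib.util

# pip package name -> importable module name (they differ for scikit-learn).
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
}


def missing_packages(required: dict = REQUIRED) -> list:
    """Return the pip names of required packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages: " + " ".join(missing))
else:
    print("All required packages are installed.")
```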
For any queries or suggestions, feel free to contact me at ahmadtalha963@gmail.com.
This project does not have a specific license. You are welcome to explore, modify, and distribute the code.