Tanzania_Water_Wells_Model: A Jupyter Notebook repository from PaulMuniu

Tanzania Water Wells Model.

Table of Contents.

Project Overview
Dataset
Exploratory Data Analysis
Data Preprocessing
Model Development
Model Evaluation
Final Model Recommendation
Conclusion
Technologies Used

Project Overview.

Stakeholder: WaterAid

WaterAid is an international non-governmental organization (NGO) based in the United Kingdom. The organization is dedicated exclusively to ensuring equitable access to safe water, sanitation and hygiene education for the world’s poorest communities. WaterAid has a profound impact on improving health, education, and economic opportunities by providing sustainable solutions for water and sanitation.

This project aims to predict the functionality of water points in Tanzania. By leveraging machine learning techniques, we seek to assist WaterAid International in prioritizing and tailoring interventions to improve water access and sustainability.

Dataset.

The dataset, provided by Taarifa and the Tanzanian Ministry of Water, contains extensive information about water points across Tanzania. Its attributes include geographical location, construction year, and water quality. The target variable is status_group, which indicates whether a water point is functional, non-functional, or functional but needs repair. All features in this dataset are found here.

Exploratory Data Analysis.

Extensive exploratory data analysis (EDA) was conducted to understand the distribution and relationships of different features.

Figure: Distribution of functional and non-functional water wells.
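
As a minimal sketch of how such a class distribution can be computed, the snippet below counts the share of each status_group value with pandas. The six-row sample and its label strings are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical toy sample; the real labels come from the project's dataset.
labels = pd.DataFrame({
    "status_group": [
        "functional", "functional", "non functional",
        "functional needs repair", "functional", "non functional",
    ]
})

# Share of each class in the target variable (fractions sum to 1).
class_share = labels["status_group"].value_counts(normalize=True)
print(class_share.round(2))
```

A strongly uneven distribution would be a reason to look beyond plain accuracy when comparing models.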

Data Preprocessing.

Data preprocessing steps included:

Handling missing values
Encoding categorical variables
Feature scaling
Feature selection
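
The four steps above can be sketched as a single scikit-learn pipeline. This is an illustration under assumptions, not the notebook's actual code: the toy column names, the median and most-frequent imputation strategies, one-hot encoding, and SelectKBest with k=3 are all choices made here for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame; column names and values are illustrative only.
X = pd.DataFrame({
    "gps_height": [1390.0, np.nan, 686.0, 263.0, 0.0, 1200.0],
    "construction_year": [1999.0, 2010.0, np.nan, 1986.0, 2009.0, 2000.0],
    "water_quality": ["soft", "soft", "salty", np.nan, "soft", "salty"],
})
y = pd.Series([0, 0, 1, 1, 0, 1])

numeric_cols = ["gps_height", "construction_year"]
categorical_cols = ["water_quality"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill missing values with the mode, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

# Keep the k features with the highest ANOVA F-score (k=3 is arbitrary here).
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(score_func=f_classif, k=3)),
])

X_ready = pipeline.fit_transform(X, y)
print(X_ready.shape)
```

Wrapping the steps in one pipeline means the imputation, scaling, and selection statistics are learned from training data only when the pipeline is fitted inside cross-validation.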

Model Development.

We developed and compared several machine learning models:

Dummy Classifier
Decision Tree
Random Forest
K-Nearest Neighbors (KNN)
XGBoost
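
A comparison like this can be sketched with cross-validation. The example below is a hedged illustration on synthetic three-class data, not the project's results; the hyperparameters and the 5-fold setup are assumptions, and XGBoost is left out so the sketch depends on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the preprocessed features.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=42,
)

# Candidate models; hyperparameters are illustrative, not tuned.
models = {
    "Dummy Classifier": DummyClassifier(strategy="most_frequent"),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print models from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

The Dummy Classifier serves as a baseline: any model worth keeping should clearly beat the score obtained by always predicting the most frequent class.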

Model Evaluation.

Models were evaluated using various metrics, including accuracy, precision, recall, F1 score, and ROC AUC score. The Random Forest classifier emerged as the top performer with the following metrics:

Accuracy: 80%
Precision: 80%
Recall: 80%
F1 Score: 80%
ROC AUC Score: 88%
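
The README does not say how the per-class scores were averaged, so the sketch below makes its own assumptions: three classes, weighted averaging for precision, recall, and F1, and a one-vs-rest ROC AUC. It runs on synthetic data and does not reproduce the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic three-class data; not the project's dataset.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)          # hard class labels
y_proba = model.predict_proba(X_test)   # class probabilities for ROC AUC

# Weighted averaging is an assumption; the source does not state it.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average="weighted", zero_division=0),
    "recall": recall_score(y_test, y_pred, average="weighted", zero_division=0),
    "f1": f1_score(y_test, y_pred, average="weighted", zero_division=0),
    "roc_auc": roc_auc_score(y_test, y_proba, multi_class="ovr", average="weighted"),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that support-weighted recall is equal to accuracy by definition, so under this averaging choice those two numbers always coincide.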

Final Model Recommendation.

Based on the evaluation, we recommend the Random Forest classifier as the final model: it was the top performer among the models compared, with 80% accuracy and a ROC AUC score of 88%.

Conclusion.

In conclusion, the Random Forest classifier is the optimal choice for predicting water point functionality in Tanzania. This model will support WaterAid International in making data-driven decisions to enhance water access and sustainability in Tanzania.

Technologies Used.

Python
Pandas
NumPy
Matplotlib
Seaborn
scikit-learn

“Nothing in life is to be feared; it is only to be understood. Now is the time to understand more, so that we may fear less.” - Marie Curie.
