This project focuses on predicting diabetes using machine learning algorithms, particularly with `sklearn` and `pandas`. The dataset used for this project is sourced from Kaggle. Below is an overview of the tools and libraries used, as well as the model-building process.
The dataset for this project can be found on Kaggle: Diabetes Prediction Dataset.
You can view the notebook walkthrough here: Diabetes Prediction Dataset Notebook.
- Data Preprocessing:
  - Using `pandas` to load and manipulate the data.
  - Cleaning and handling missing values in the dataset.
- Feature Selection:
  - Using `sklearn` to select features that contribute the most to the prediction outcome.
- Model Building:
  - Several models are built using `sklearn`, including decision trees and other classifiers.
  - Performance evaluation is conducted using metrics such as accuracy, precision, recall, and F1 score.
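The three steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy data, the column names (`age`, `bmi`, `glucose`, `noise`, `diabetes`), and the helpers `make_toy_data` and `run_pipeline` are all hypothetical stand-ins. The choice of median imputation, `SelectKBest` with `f_classif`, and a depth-limited decision tree is an assumption made for the example; the project may use different techniques.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_toy_data(n_rows=400, seed=0):
    """Build a synthetic stand-in for the Kaggle CSV (hypothetical columns)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(20, 80, n_rows).astype(float),
        "bmi": rng.normal(27, 5, n_rows),
        "glucose": rng.normal(120, 30, n_rows),
        "noise": rng.normal(0, 1, n_rows),  # deliberately uninformative
    })
    # The label depends on glucose and bmi only, plus some random noise.
    score = (0.04 * (df["glucose"] - 120)
             + 0.1 * (df["bmi"] - 27)
             + rng.normal(0, 0.5, n_rows))
    df["diabetes"] = (score > 0).astype(int)
    # Blank out a few bmi values to mimic missing data.
    df.loc[rng.choice(n_rows, 20, replace=False), "bmi"] = np.nan
    return df


def run_pipeline(df, target="diabetes", k=2, seed=0):
    """Preprocess, select k features, fit a decision tree, return metrics."""
    X, y = df.drop(columns=[target]), df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )

    # 1. Preprocessing: fill missing values with medians from the training split.
    medians = X_train.median()
    X_train, X_test = X_train.fillna(medians), X_test.fillna(medians)

    # 2. Feature selection: keep the k features with the highest ANOVA F-score.
    selector = SelectKBest(score_func=f_classif, k=k).fit(X_train, y_train)
    selected = list(X.columns[selector.get_support()])

    # 3. Model building and evaluation on the held-out split.
    model = DecisionTreeClassifier(max_depth=4, random_state=seed)
    model.fit(selector.transform(X_train), y_train)
    y_pred = model.predict(selector.transform(X_test))
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }
    return selected, metrics


if __name__ == "__main__":
    selected, metrics = run_pipeline(make_toy_data())
    print("Selected features:", selected)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

To try this on the real dataset, `make_toy_data()` would be replaced with a `pd.read_csv(...)` call on the downloaded Kaggle file, and any non-numeric columns would need encoding before the feature selection step.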
Make sure to install the following libraries:
pip install pandas
pip install scikit-learn