Diabetes prediction

Description

This diabetes prediction project was built during a machine learning workshop organized by SHE CODE. The workshop introduces participants to the fundamentals of machine learning and gives hands-on experience with several ML algorithms and techniques. Through it I learned two new tools: YData Profiling, for exploratory data analysis, and AutoGluon, for automated model training and prediction.

Installation
Usage

Installation

To get started with this project, follow these steps:

Clone the repository: git clone https://github.com/Bbrnn/SHE-CODE-ML-WORKSHOP.git
Move into the project folder: cd SHE-CODE-ML-WORKSHOP

Usage

Exploratory Data Analysis with YData Profiling

A key component of this workshop is YData Profiling, used for exploratory data analysis. From a pandas DataFrame, YData Profiling generates a single report that summarizes the dataset's structure (column types, distributions, summary statistics), flags potential issues such as missing values, duplicate rows, and highly correlated or skewed columns, and lists each column's extreme values to help you spot outliers. To use YData Profiling, follow these steps:

Install YData Profiling: pip install ydata_profiling
Import the library in your Python script: from ydata_profiling import ProfileReport
Load your dataset using pandas: import pandas as pd; data = pd.read_csv('path/to/your/dataset.csv')
Generate the profile report: profile = ProfileReport(data); profile.to_file("your_report.html")
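The four steps above can be combined into one small reusable function. This is a sketch, not the workshop's own code: the names build_profile and default_report_path are hypothetical, and the title and minimal arguments are optional ProfileReport settings added for illustration. pandas and ydata_profiling are imported inside the function so the path helper can be used without them installed.

```python
from pathlib import Path


def default_report_path(csv_path):
    """Hypothetical helper: place the report next to the CSV,
    e.g. data/dataset.csv -> data/dataset_report.html."""
    src = Path(csv_path)
    return src.with_name(f"{src.stem}_report.html")


def build_profile(csv_path, report_path=None, title="Dataset profile", minimal=False):
    """Load a CSV with pandas, profile it with YData Profiling, and save an HTML report.

    Returns the path of the written report.
    """
    # Imported here so default_report_path works even if these packages are missing.
    import pandas as pd
    from ydata_profiling import ProfileReport

    data = pd.read_csv(csv_path)

    # minimal=True turns off the more expensive computations, which helps on large datasets.
    profile = ProfileReport(data, title=title, minimal=minimal)

    out = Path(report_path) if report_path else default_report_path(csv_path)
    profile.to_file(out)
    return out
```

For example, build_profile('path/to/your/dataset.csv') writes path/to/your/dataset_report.html, which you can open in a browser.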

AutoML with AutoGluon

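Another important part of this workshop is AutoGluon, a toolkit for automated machine learning (AutoML). Given a table and the name of the column to predict, AutoGluon's TabularPredictor trains a range of models (for example gradient-boosted trees, random forests, and neural networks) and combines them into a weighted ensemble, with little manual configuration. To use AutoGluon, follow these steps: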

Install AutoGluon: pip install autogluon
Import the library in your Python script: from autogluon.tabular import TabularDataset, TabularPredictor
Load your training data: train_data = TabularDataset('path/to/your/train_data.csv')
Define the target variable: label = 'target_column_name'
Train the model: predictor = TabularPredictor(label=label).fit(train_data)
Make predictions: test_data = TabularDataset('path/to/your/test_data.csv'); predictions = predictor.predict(test_data.drop(columns=[label]))
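The AutoGluon steps can be wrapped into one function that also scores the model on the test set. This is a hedged sketch, not the workshop's code: train_and_evaluate and check_label are hypothetical names, the test CSV is assumed to still contain the label column (as a held-out split would), and time_limit and model_dir are optional settings added for illustration. AutoGluon is imported inside the function because it is a large package that is only needed for training.

```python
def check_label(columns, label):
    """Hypothetical helper: fail early with a readable message if the label column is missing."""
    if label not in columns:
        raise ValueError(
            f"Label column {label!r} not found; available columns: {list(columns)}"
        )


def train_and_evaluate(train_csv, test_csv, label, time_limit=None, model_dir=None):
    """Train AutoGluon models on train_csv and score them on test_csv.

    Returns (predictor, predictions, scores).
    """
    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset(train_csv)
    test_data = TabularDataset(test_csv)
    check_label(train_data.columns, label)
    check_label(test_data.columns, label)

    # time_limit is in seconds (None means no limit);
    # model_dir=None lets AutoGluon choose where to save models.
    predictor = TabularPredictor(label=label, path=model_dir).fit(
        train_data, time_limit=time_limit
    )

    # Predict from the features only, then compare against the true labels.
    predictions = predictor.predict(test_data.drop(columns=[label]))
    scores = predictor.evaluate(test_data)  # dict: metric name -> value
    return predictor, predictions, scores
```

For example: predictor, predictions, scores = train_and_evaluate('path/to/your/train_data.csv', 'path/to/your/test_data.csv', label='target_column_name', time_limit=300).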

Replace 'path/to/your/dataset.csv', 'path/to/your/train_data.csv', and 'path/to/your/test_data.csv' with the actual paths to your dataset files, and 'target_column_name' with the name of the column you want to predict.
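The AutoGluon steps expect separate training and test files. If you start from a single CSV, one way to produce them is a random hold-out split. The sketch below uses only the Python standard library; split_csv and its 80/20 default are hypothetical choices, not part of the workshop material. It is a plain random split rather than a stratified one, so with an imbalanced target you may want to check that both files contain enough examples of each class.

```python
import csv
import random


def split_csv(src_path, train_path, test_path, test_fraction=0.2, seed=42):
    """Shuffle the data rows of src_path and write them to a train and a test CSV.

    The header row is copied to both outputs. Returns (n_train, n_test).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")

    with open(src_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    # A fixed seed makes the split reproducible between runs.
    rng = random.Random(seed)
    rng.shuffle(rows)

    n_test = round(len(rows) * test_fraction)
    test_rows, train_rows = rows[:n_test], rows[n_test:]

    for path, subset in ((train_path, train_rows), (test_path, test_rows)):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(subset)

    return len(train_rows), len(test_rows)
```

For example: split_csv('path/to/your/dataset.csv', 'path/to/your/train_data.csv', 'path/to/your/test_data.csv').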