
Ethiopian Medical Business Data Warehouse & Pipeline

The Ethiopian Medical Business Data Warehouse & Analytics Platform is a comprehensive data solution tailored to enhance the efficiency and efficacy of Ethiopia's healthcare and medical sectors. This initiative establishes a resilient and adaptable data warehouse that integrates information from diverse stakeholders such as medical institutions, pharmaceutical firms, insurance entities, and governmental bodies. The primary goal is to provide a unified platform for in-depth data analysis, facilitating informed decision-making and strategic advancements in Ethiopia's healthcare landscape.

Table of Contents

  1. Data Scraping and Collection Pipeline
  2. Data Cleaning and Transformation
  3. Object Detection Using YOLO
  4. Exposing the Collected Data Using FastAPI
  5. Postman Collection
  6. Installation
  7. Usage
  8. Contributing
  9. License

Data Scraping and Collection Pipeline

Telegram Scraping

Utilize the Telegram API or custom scripts to extract data from public Telegram channels related to Ethiopian medical businesses.

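A minimal scraping sketch using Telethon, one common Python client for the Telegram API; the credentials and channel name below are placeholders, not values from this project:

from telethon import TelegramClient

# Placeholder credentials obtained from https://my.telegram.org.
api_id = 12345
api_hash = "your_api_hash"

client = TelegramClient("scraper_session", api_id, api_hash)

async def scrape_channel(channel, limit=500):
    # Collect the id, date, and text of each post in the channel.
    rows = []
    async for message in client.iter_messages(channel, limit=limit):
        rows.append({
            "channel": channel,
            "message_id": message.id,
            "date": message.date,
            "text": message.text,
        })
    return rows

with client:
    data = client.loop.run_until_complete(scrape_channel("example_channel"))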

Image Scraping

Collect images from the specified Telegram channels for object detection.
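A sketch of the image download, reusing the Telethon client from the previous example; the output directory matches the path used by the object-detection step in the Usage section:

import os

async def scrape_images(channel, out_dir="data/telegram_images", limit=500):
    os.makedirs(out_dir, exist_ok=True)
    async for message in client.iter_messages(channel, limit=limit):
        if message.photo:
            # Save each photo into the output directory.
            await message.download_media(file=out_dir)

with client:
    client.loop.run_until_complete(scrape_images("example_channel"))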

For more details, see the data_scraping_and_cleaning.ipynb notebook.

Data Cleaning and Transformation

Data Cleaning

  • Remove duplicates
  • Handle missing values
  • Standardize formats
  • Validate data (see the sketch below)
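A minimal pandas sketch of these steps; the file and column names are illustrative, not taken from the project:

import pandas as pd

df = pd.read_csv("data/telegram_data.csv")

# Remove duplicates keyed on channel and message id.
df = df.drop_duplicates(subset=["channel", "message_id"])

# Handle missing values: drop rows without any text.
df = df.dropna(subset=["text"])

# Standardize formats: parse dates, trim whitespace.
df["date"] = pd.to_datetime(df["date"], errors="coerce")
df["text"] = df["text"].str.strip()

# Validate data: keep rows with a positive message id and a parsed date.
df = df[(df["message_id"] > 0) & df["date"].notna()]

df.to_csv("data/telegram_data_clean.csv", index=False)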

Data Cleaning Models (DBT)

Set up DBT and create models (SQL files) for the data transformation:

pip install dbt-postgres  # installs dbt-core with the Postgres adapter
dbt init dbt_med
dbt run


Storing Cleaned Data

Store the cleaned data in a PostgreSQL database.
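For example, with pandas and SQLAlchemy; the connection string and table name here are placeholders:

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical Postgres credentials and database name.
engine = create_engine("postgresql://user:password@localhost:5432/medical_dw")

df = pd.read_csv("data/telegram_data_clean.csv")
df.to_sql("telegram_messages", engine, if_exists="replace", index=False)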


Fact Table in PostgreSQL Database

The DBT run materializes a fact table in the PostgreSQL database.

For more details, see the data_scraping_and_cleaning.ipynb notebook.

Object Detection Using YOLO

Setting Up the Environment

Ensure the necessary dependencies are installed:

pip install opencv-python
pip install torch torchvision
pip install tensorflow

Downloading the YOLO Model

git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt

Preparing the Data

  • Collect images from the specified Telegram channels.
  • Use the pre-trained YOLO model to detect objects in the images, as sketched below.
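A sketch of running inference with a pre-trained YOLOv5 model loaded through torch.hub; the image directory matches the path used in the Usage section:

import glob
import torch

# Load the small pre-trained YOLOv5 model.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Run detection on the scraped Telegram images.
images = glob.glob("data/telegram_images/*.jpg")
results = model(images)
results.print()  # per-image summary of detections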

Processing the Detection Results

  • Extract data such as bounding box coordinates, confidence scores, and class labels.
  • Store the detection data in a database table, as sketched below.
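Continuing the sketch above, the per-image detections can be flattened into one table and written to PostgreSQL; the table name and connection string are placeholders:

import pandas as pd
from sqlalchemy import create_engine

# results.pandas().xyxy is one DataFrame per image with columns
# xmin, ymin, xmax, ymax, confidence, class, name.
frames = []
for path, det in zip(images, results.pandas().xyxy):
    det = det.copy()
    det["image"] = path
    frames.append(det)

detections = pd.concat(frames, ignore_index=True)

engine = create_engine("postgresql://user:password@localhost:5432/medical_dw")
detections.to_sql("detection_results", engine, if_exists="replace", index=False)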


For more details, see the yolo.ipynb notebook.

Exposing the Collected Data Using FastAPI

Setting Up the Environment

Install FastAPI and Uvicorn:

pip install fastapi uvicorn

Create a FastAPI Application

Set up a basic project structure:

my_project/
├── main.py
├── database.py
├── models.py
├── schemas.py
└── crud.py

Database Configuration

  • In database.py, configure the database connection using SQLAlchemy.
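A minimal database.py sketch; the connection URL is a placeholder:

# database.py
from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

SQLALCHEMY_DATABASE_URL = "postgresql://user:password@localhost:5432/medical_dw"

engine = create_engine(SQLALCHEMY_DATABASE_URL)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base()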

Creating Data Models

  • In models.py, define SQLAlchemy models for the database tables.
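An illustrative models.py; the actual tables and columns in the project may differ:

# models.py
from sqlalchemy import Column, DateTime, Float, Integer, String

from database import Base

class TelegramMessage(Base):
    __tablename__ = "telegram_messages"

    id = Column(Integer, primary_key=True, index=True)
    channel = Column(String)
    message_id = Column(Integer)
    date = Column(DateTime)
    text = Column(String)

class DetectionResult(Base):
    __tablename__ = "detection_results"

    id = Column(Integer, primary_key=True, index=True)
    image = Column(String)
    name = Column(String)  # detected class label
    confidence = Column(Float)
    xmin = Column(Float)
    ymin = Column(Float)
    xmax = Column(Float)
    ymax = Column(Float)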

Creating Pydantic Schemas

  • In schemas.py, define Pydantic schemas for data validation and serialization.
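Illustrative schemas mirroring the models above (orm_mode is the Pydantic v1 spelling; v2 uses from_attributes):

# schemas.py
from datetime import datetime
from typing import Optional

from pydantic import BaseModel

class TelegramMessageSchema(BaseModel):
    id: int
    channel: str
    message_id: int
    date: Optional[datetime] = None
    text: Optional[str] = None

    class Config:
        orm_mode = True

class DetectionResultSchema(BaseModel):
    id: int
    image: str
    name: str
    confidence: float
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    class Config:
        orm_mode = True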

CRUD Operations

  • In crud.py, implement CRUD (Create, Read, Update, Delete) operations for the database.
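A sketch of the read operations backing the endpoints below:

# crud.py
from sqlalchemy.orm import Session

import models

def get_telegram_messages(db: Session, skip: int = 0, limit: int = 100):
    return db.query(models.TelegramMessage).offset(skip).limit(limit).all()

def get_detection_results(db: Session, skip: int = 0, limit: int = 100):
    return db.query(models.DetectionResult).offset(skip).limit(limit).all()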

Creating API Endpoints

  • In main.py, define the API endpoints using FastAPI.
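A minimal main.py wiring the pieces together; the endpoint paths are illustrative:

# main.py
from typing import List

from fastapi import Depends, FastAPI
from sqlalchemy.orm import Session

import crud, models, schemas
from database import SessionLocal, engine

# Create the tables if they do not already exist.
models.Base.metadata.create_all(bind=engine)

app = FastAPI()

def get_db():
    # Provide a database session per request.
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

@app.get("/telegram_data/", response_model=List[schemas.TelegramMessageSchema])
def read_telegram_data(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
    return crud.get_telegram_messages(db, skip=skip, limit=limit)

@app.get("/detection_results/", response_model=List[schemas.DetectionResultSchema])
def read_detection_results(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
    return crud.get_detection_results(db, skip=skip, limit=limit)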


Get All Telegram Data

An endpoint that returns all scraped Telegram records (GET /telegram_data/ in the sketch above).

Get All YOLO Detection Results

An endpoint that returns all stored YOLO detections (GET /detection_results/ in the sketch above).

Postman Collection

You can use the Postman API collection found at the link below:

Postman collection link

Installation

To get started, follow these steps:

  1. Clone the repository:

    git clone https://github.com/Daniel-Andarge/AiML-ethiopian-medical-biz-datawarehouse.git
    cd AiML-ethiopian-medical-biz-datawarehouse
  2. Create a virtual environment and activate it:

    # Using virtualenv
    virtualenv venv
    source venv/bin/activate
    
    # Using conda
    conda create -n your-env python=3.x
    conda activate your-env
  3. Install the required dependencies:

    pip install -r requirements.txt

Usage

  1. Run Data Scraping Scripts:

    python extract_load_pipeline.py
  2. Run DBT Models:

    dbt run
  3. Run Object Detection:

    python detect.py --source data/telegram_images --save-txt --save-conf --project results --name run1
  4. Start FastAPI Application:

    uvicorn main:app --reload

Contributing

Contributions are welcome. Please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and commit them.
  4. Push your branch to your forked repository.
  5. Create a pull request to the main repository.

License

This project is licensed under the MIT License.

Acknowledgments

Special thanks to the contributors and the open-source community for their support and resources.