The Ethiopian Medical Business Data Warehouse & Analytics Platform is a data solution for Ethiopia's healthcare and medical sectors. It builds a resilient, adaptable data warehouse that integrates information from stakeholders such as medical institutions, pharmaceutical firms, insurance entities, and governmental bodies. The goal is a single platform for data analysis that supports informed decision-making and strategic planning across the Ethiopian healthcare landscape.
- Data Scraping and Collection Pipeline
- Data Cleaning and Transformation
- Object Detection Using YOLO
- Exposing the Collected Data Using FastAPI
- Postman Collection
- Installation
- Usage
- Contributing
- License
Utilize the Telegram API or custom scripts to extract data from public Telegram channels related to Ethiopian medical businesses. Key channels include:
Collect images from specified Telegram channels for object detection:
For more details, see the data_scraping_and_cleaning.ipynb notebook.
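The scraping step could be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes the Telethon library (the text above only says "the Telegram API or custom scripts"), and the names `message_to_record`, `scrape_channel`, the session name `scraper_session`, and the record fields are hypothetical. The image directory matches the `data/telegram_images` path used in the Usage section.

```python
import os


def message_to_record(channel, message):
    """Flatten one Telegram message into a plain dict (field names are illustrative)."""
    date = getattr(message, "date", None)
    return {
        "channel": channel,
        "message_id": message.id,
        "date": date.isoformat() if date else None,
        "text": message.text or "",
        "has_image": bool(getattr(message, "photo", None)),
    }


async def scrape_channel(channel, api_id, api_hash, limit=100,
                         image_dir="data/telegram_images"):
    """Collect recent messages from a public channel and download any photos."""
    # Assumption: Telethon is installed (pip install telethon); imported lazily
    # so that message_to_record stays usable without it.
    from telethon import TelegramClient

    os.makedirs(image_dir, exist_ok=True)
    records = []
    async with TelegramClient("scraper_session", api_id, api_hash) as client:
        async for message in client.iter_messages(channel, limit=limit):
            records.append(message_to_record(channel, message))
            if message.photo:
                # Passing an existing directory lets Telethon pick the file name.
                await message.download_media(file=image_dir)
    return records
```

To run it, pass the coroutine to `asyncio.run` together with a channel username and your own API credentials from my.telegram.org.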
- Remove duplicates
- Handle missing values
- Standardize formats
- Validate data
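The four cleaning steps above might look like this in pandas. It is a sketch under assumptions: the column names (`channel`, `message_id`, `date`, `text`) and the function name `clean_messages` are hypothetical, and the notebook may clean the data differently. Formats are standardized before de-duplication so that rows differing only in case or whitespace count as duplicates.

```python
import pandas as pd


def clean_messages(df: pd.DataFrame) -> pd.DataFrame:
    """Clean scraped messages; column names are assumed, not taken from the project."""
    df = df.copy()

    # Handle missing values: treat absent message text as an empty string.
    df["text"] = df["text"].fillna("").str.strip()

    # Standardize formats: lower-case channel names, parse dates as UTC timestamps.
    df["channel"] = df["channel"].str.strip().str.lower()
    df["date"] = pd.to_datetime(df["date"], errors="coerce", utc=True)

    # Remove duplicates: one row per (channel, message_id) pair.
    df = df.drop_duplicates(subset=["channel", "message_id"])

    # Validate data: drop rows without an id or with an unparseable date.
    df = df.dropna(subset=["message_id", "date"])
    df["message_id"] = df["message_id"].astype("int64")

    return df.reset_index(drop=True)
```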
Set up dbt and create models (SQL files) for data transformation:
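pip install dbt-core dbt-postgres  # dbt-postgres assumes a PostgreSQL warehouse; use the adapter that matches your database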
dbt init dbt_med
dbt run
Store cleaned data in a database.
For more details, see the data_scraping_and_cleaning.ipynb notebook.
Ensure necessary dependencies are installed:
pip install opencv-python
pip install torch torchvision
pip install tensorflow
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt
- Collect images from the specified Telegram channels.
- Use the pre-trained YOLO model to detect objects in the images.
- Extract data such as bounding box coordinates, confidence scores, and class labels.
- Store detection data in a database table.
For more details, see the yolo.ipynb notebook.
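One way to extract and store the detections is to read the label files that YOLOv5's `detect.py` writes when run with `--save-txt --save-conf` (the flags used in the Usage section). Each line then holds a class index, a normalized box (`x_center y_center width height`), and a confidence score. The sketch below loads those files into a SQLite table using only the standard library. SQLite, the `detections` table layout, and the function names are assumptions for illustration; the project may store detections elsewhere.

```python
import sqlite3
from pathlib import Path


def parse_label_line(line):
    """Parse 'class x_center y_center width height [confidence]' into a tuple."""
    parts = line.split()
    class_id = int(float(parts[0]))
    x_center, y_center, width, height = (float(p) for p in parts[1:5])
    # The confidence column is only present when --save-conf was passed.
    confidence = float(parts[5]) if len(parts) > 5 else None
    return class_id, x_center, y_center, width, height, confidence


def load_detections(labels_dir, conn):
    """Insert every detection found under labels_dir; returns the row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS detections (
               image_name TEXT, class_id INTEGER,
               x_center REAL, y_center REAL, width REAL, height REAL,
               confidence REAL)"""
    )
    rows = []
    for label_file in sorted(Path(labels_dir).glob("*.txt")):
        for line in label_file.read_text().splitlines():
            if line.strip():
                # The label file shares its stem with the source image.
                rows.append((label_file.stem, *parse_label_line(line)))
    conn.executemany("INSERT INTO detections VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

With `--project results --name run1`, the label files are expected under `results/run1/labels`, so a call such as `load_detections("results/run1/labels", sqlite3.connect("detections.db"))` would populate the table. The class index can be mapped to a label through the model's class-name list.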
Install FastAPI and Uvicorn:
pip install fastapi uvicorn
Set up a basic project structure:
my_project/
├── main.py
├── database.py
├── models.py
├── schemas.py
└── crud.py
- In database.py, configure the database connection using SQLAlchemy.
- In models.py, define SQLAlchemy models for the database tables.
- In schemas.py, define Pydantic schemas for data validation and serialization.
- In crud.py, implement CRUD (Create, Read, Update, Delete) operations for the database.
- In main.py, define the API endpoints using FastAPI.
You can use the Postman API collection found in the link below:
To get started, follow these steps:
- Clone the repository:
git clone https://github.com/Daniel-Andarge/AiML-ethiopian-medical-biz-datawarehouse.git
cd AiML-ethiopian-medical-biz-datawarehouse
- Create a virtual environment and activate it:
# Using virtualenv
virtualenv venv
source venv/bin/activate
# Using conda
conda create -n your-env python=3.x
conda activate your-env
- Install the required dependencies:
pip install -r requirements.txt
- Run the data scraping scripts:
python extract_load_pipeline.py
- Run the dbt models:
dbt run
- Run object detection:
python detect.py --source data/telegram_images --save-txt --save-conf --project results --name run1
- Start the FastAPI application:
uvicorn main:app --reload
Contributions are welcome. Please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them.
- Push your branch to your forked repository.
- Create a pull request to the main repository.
This project is licensed under the MIT License.
Special thanks to the contributors and the open-source community for their support and resources.