- Fork this repo 🍴
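After forking, clone your fork locally so you can follow the remaining steps; the URL below is a placeholder for your GitHub username and the fork's name:

```bash
git clone https://github.com/<your-username>/<your-fork>.git
cd <your-fork>
```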
- Sign in to your AWS account and create an S3 bucket (in N. Virginia, i.e. `us-east-1`) along with some folders.
  Follow this guide if you don't know how: How do I create an S3 Bucket?
  It should look exactly like this 👇
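If you prefer the command line over the console, here is a minimal sketch using the AWS CLI (installed in a later step). The bucket and folder names are placeholders rather than values required by this repo, so substitute your own:

```bash
# Create the bucket in N. Virginia (us-east-1); bucket names must be globally unique
aws s3 mb s3://<your-bucket-name> --region us-east-1

# S3 has no real folders -- zero-byte keys ending in "/" show up as folders in the console
aws s3api put-object --bucket <your-bucket-name> --key data/
aws s3api put-object --bucket <your-bucket-name> --key trained_model_artifacts/
```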
- Go to the IAM Console, create AWS access keys, and store them in a safe place.
  Helpful guides:
  - How do I set up an IAM user and sign in to the AWS Management Console using IAM credentials?
  - How do I create an access key for an existing IAM user?

  Some tips:
  - For beginners: create an Admin user with full access.
  - For advanced users: create a user with access to only that bucket. Follow How To Grant Access To Only One S3 Bucket Using AWS IAM Policy (a rough policy sketch follows below).
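As a rough sketch of what single-bucket access can look like, you can attach an inline policy with the AWS CLI. The user name, policy name, and bucket name below are placeholders, not values taken from this repo:

```bash
# Inline policy that limits an IAM user to one S3 bucket
aws iam put-user-policy \
  --user-name <iam-user> \
  --policy-name titanic-bucket-only \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["s3:ListBucket"],
        "Resource": "arn:aws:s3:::<your-bucket-name>"
      },
      {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
        "Resource": "arn:aws:s3:::<your-bucket-name>/*"
      }
    ]
  }'
```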
- Install the AWS CLI. I'm using WSL2 on Windows, so I ran
  `python -m pip install --user awscli`
  to install it as a user-level package.
  For more detailed instructions, see https://github.com/aws/aws-cli
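To confirm the CLI is on your PATH (the exact version string will differ on your machine), you can run:

```bash
aws --version
```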
- Configure AWS credentials
$ aws configure
AWS Access Key ID: MYACCESSKEY
AWS Secret Access Key: MYSECRETKEY
Default region name [us-east-1]: us-east-1
Default output format [None]: json
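`aws configure` stores these values in `~/.aws/credentials` and `~/.aws/config`; on WSL2 (or any Unix-like shell) you can double-check them with:

```bash
cat ~/.aws/credentials
cat ~/.aws/config
```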
- Download the raw dataset
  a. Dataset: https://titanic-model.s3.amazonaws.com/raw_titanic.csv
  b. Create a folder called `data` inside `titanic_model` (a command-line sketch follows the tree below). The project structure should then look like this:
.
├── notebooks
└── titanic_model
    ├── data
    ├── config
    ├── processing
    └── trained_model_artifacts

5 directories
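A minimal command-line sketch of this step, assuming the raw CSV should be saved into `titanic_model/data` (the destination filename is my choice, not prescribed by the repo):

```bash
# Create the data folder and download the raw Titanic dataset into it
mkdir -p titanic_model/data
curl -o titanic_model/data/raw_titanic.csv https://titanic-model.s3.amazonaws.com/raw_titanic.csv
```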
- Install package dependencies and run the project locally to verify that it works.
  Prerequisites:
  - Python 3
  - Conda [optional, but recommended]

  a. If you have conda, install pipenv via `conda install pipenv`; if you don't, just do `pip install pipenv`.
  b. To install the dependencies, run `pipenv install`.
  c. To activate the virtual environment, run `pipenv shell`.
  d. `cd` into `titanic_model` and run `dvc remote add data`, which adds the `data` folder so it can be tracked by DVC (see the sketch after this list).
     NOTE: If you have any issues, visit https://dvc.org/doc/user-guide/external-dependencies
  e. Run `tox` to train the ML model and generate reports; the pickled model is saved in `titanic_model/trained_model_artifacts`.
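For reference, steps b–e boil down to the commands below. The remote name `data` comes from this guide, but the `-d` flag and the S3 URL are my assumptions for illustration; point the remote at the bucket/folder you actually created (or keep whatever remote configuration the repo ships with):

```bash
pipenv install      # install dependencies from the Pipfile
pipenv shell        # activate the virtual environment
cd titanic_model

# Placeholder remote URL -- substitute your own bucket and path
dvc remote add -d data s3://<your-bucket-name>/data

tox                 # trains the model, generates reports, and writes the pickled
                    # model to trained_model_artifacts/
```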
- Check out a branch and test out a different ML model via `git checkout -b random_forest`
- Add an ML classifier of your choice to `titanic_model/pipeline.py`
- Add your AWS `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` to GitHub Secrets (a CLI sketch is shown below).
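You can add these secrets in your fork's Settings page on GitHub, or, if you have the GitHub CLI installed, with something like the sketch below (the repo slug is a placeholder for your fork; `gh` will prompt you for each secret's value):

```bash
# Store the AWS credentials as repository secrets for the GitHub Actions workflow
gh secret set AWS_ACCESS_KEY_ID --repo <your-username>/<your-fork>
gh secret set AWS_SECRET_ACCESS_KEY --repo <your-username>/<your-fork>
```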
- Go get a sip of ☕ while your model trains. Once training is completed, it should look like this