- Fork this repo 🍴
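After forking, clone your fork locally so you can follow the remaining steps; the URL below is a placeholder for your GitHub username and the fork's name:

```bash
git clone https://github.com/<your-username>/<your-fork>.git
cd <your-fork>
```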
- Sign in to your AWS account and create an S3 bucket (in N. Virginia, i.e. `us-east-1`) along with some folders.
  Follow this guide if you don't know how: How do I create an S3 Bucket?
  It should look exactly like this 👇
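If you prefer the command line over the console, here is a minimal sketch using the AWS CLI (installed in a later step). The bucket and folder names are placeholders rather than values required by this repo, so substitute your own:

```bash
# Create the bucket in N. Virginia (us-east-1); bucket names must be globally unique
aws s3 mb s3://<your-bucket-name> --region us-east-1

# S3 has no real folders -- zero-byte keys ending in "/" show up as folders in the console
aws s3api put-object --bucket <your-bucket-name> --key data/
aws s3api put-object --bucket <your-bucket-name> --key trained_model_artifacts/
```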
- Go to the IAM Console, create AWS access keys, and store them in a safe place.
  Helpful guides:
  - How do I set up an IAM user and sign in to the AWS Management Console using IAM credentials?
  - How do I create an access key for an existing IAM user?

  Some tips:
  - For beginners: create an Admin user with full access.
  - For advanced users: create a user with access to only that bucket. Follow How To Grant Access To Only One S3 Bucket Using AWS IAM Policy (a rough policy sketch follows below).
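As a rough sketch of what single-bucket access can look like, you can attach an inline policy with the AWS CLI. The user name, policy name, and bucket name below are placeholders, not values taken from this repo:

```bash
# Inline policy that limits an IAM user to one S3 bucket
aws iam put-user-policy \
  --user-name <iam-user> \
  --policy-name titanic-bucket-only \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["s3:ListBucket"],
        "Resource": "arn:aws:s3:::<your-bucket-name>"
      },
      {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
        "Resource": "arn:aws:s3:::<your-bucket-name>/*"
      }
    ]
  }'
```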
- Install the AWS CLI. I'm using WSL2 on Windows, so I ran
  `python -m pip install --user awscli`
  to install it as a user-level package.
  For more detailed instructions, see https://github.com/aws/aws-cli
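To confirm the CLI is on your PATH (the exact version string will differ on your machine), you can run:

```bash
aws --version
```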
- Configure AWS credentials
$ aws configure
AWS Access Key ID: MYACCESSKEY
AWS Secret Access Key: MYSECRETKEY
Default region name [us-east-1]: us-east-1
Default output format [None]: json
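`aws configure` stores these values in `~/.aws/credentials` and `~/.aws/config`; on WSL2 (or any Unix-like shell) you can double-check them with:

```bash
cat ~/.aws/credentials
cat ~/.aws/config
```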
- Download the raw dataset
  a. Dataset: https://titanic-model.s3.amazonaws.com/raw_titanic.csv
  b. Create a folder called `data` inside `titanic_model` (a command-line sketch follows the tree below). The project structure should then look like this:
.
├── notebooks
└── titanic_model
    ├── data
    ├── config
    ├── processing
    └── trained_model_artifacts

5 directories
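A minimal command-line sketch of this step, assuming the raw CSV should be saved into `titanic_model/data` (the destination filename is my choice, not prescribed by the repo):

```bash
# Create the data folder and download the raw Titanic dataset into it
mkdir -p titanic_model/data
curl -o titanic_model/data/raw_titanic.csv https://titanic-model.s3.amazonaws.com/raw_titanic.csv
```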
- Install package dependencies and run the project locally to verify that it works.
  Prerequisites:
  - Python 3
  - Conda [optional, but recommended]

  a. If you have conda, install pipenv via `conda install pipenv`; if you don't, just do `pip install pipenv`.
  b. To install the dependencies, run `pipenv install`.
  c. To activate the virtual environment, run `pipenv shell`.
  d. `cd` into `titanic_model` and run `dvc remote add data`, which adds the `data` folder so it can be tracked by DVC (see the sketch after this list).
     NOTE: If you have any issues, visit https://dvc.org/doc/user-guide/external-dependencies
  e. Run `tox` to train the ML model and generate reports; the pickled model is saved in `titanic_model/trained_model_artifacts`.
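For reference, steps b–e boil down to the commands below. The remote name `data` comes from this guide, but the `-d` flag and the S3 URL are my assumptions for illustration; point the remote at the bucket/folder you actually created (or keep whatever remote configuration the repo ships with):

```bash
pipenv install      # install dependencies from the Pipfile
pipenv shell        # activate the virtual environment
cd titanic_model

# Placeholder remote URL -- substitute your own bucket and path
dvc remote add -d data s3://<your-bucket-name>/data

tox                 # trains the model, generates reports, and writes the pickled
                    # model to trained_model_artifacts/
```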
- Check out a branch and test out a different ML model via `git checkout -b random_forest`
- Add an ML classifier of your choice to `titanic_model/pipeline.py`
- Add your AWS `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` to GitHub Secrets (a CLI sketch is shown below).
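You can add these secrets in your fork's Settings page on GitHub, or, if you have the GitHub CLI installed, with something like the sketch below (the repo slug is a placeholder for your fork; `gh` will prompt you for each secret's value):

```bash
# Store the AWS credentials as repository secrets for the GitHub Actions workflow
gh secret set AWS_ACCESS_KEY_ID --repo <your-username>/<your-fork>
gh secret set AWS_SECRET_ACCESS_KEY --repo <your-username>/<your-fork>
```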
- Go get a sip of ☕ while your model trains. Once training is completed, it should look like this