This is a project for the MLOps ZoomCamp course, sponsored by DataTalks.Club.
This is a simple end-to-end MLOps project that takes data from Capital Bikeshare and runs it through machine learning pipelines: training, model tracking and experimentation with MLflow, orchestration with Prefect as the workflow tool, and deployment of the model as a web service.
The project runs locally and uses an AWS S3 bucket to store model artifacts during model tracking and experimentation with MLflow.
The dataset chosen for this project is the Capital Bikeshare data.
In the future I hope to improve the project by moving the entire infrastructure to AWS (managed with IaC tools such as Terraform), deploying the model in batch or streaming mode with AWS Lambda and Kinesis streams, and adding comprehensive model monitoring.
Clone the project from the repository
git clone https://github.com/PatrickCmd/mlops-project.git
Change to the mlops-project directory
cd mlops-project
Set up and install the project dependencies
make setup
Add your current directory to the Python path
export PYTHONPATH="${PYTHONPATH}:${PWD}"
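To confirm that the export above took effect, a small check like the following can help. It is only a sketch: the helper name on_pythonpath is hypothetical and is not part of the project.

```python
import os
from pathlib import Path


def on_pythonpath(directory: str, pythonpath: str) -> bool:
    """Return True if `directory` is one of the entries of a PYTHONPATH-style string."""
    target = Path(directory).resolve()
    # Skip empty entries, which appear when PYTHONPATH was empty before the export.
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return any(Path(entry).resolve() == target for entry in entries)


if __name__ == "__main__":
    current = os.getcwd()
    value = os.environ.get("PYTHONPATH", "")
    print(f"{current} on PYTHONPATH: {on_pythonpath(current, value)}")
```

Run it from the mlops-project directory in the same terminal where the variable was exported, since the export only applies to that shell session.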
In a new terminal window or tab, run the command below to start the Prefect Orion server
prefect orion start
MLflow points to an S3 bucket for storing model artifacts and uses a SQLite database as the backend store
Create an S3 bucket and export the bucket name as an environment variable as shown below
In a new terminal window or tab, run the following commands
export S3_BUCKET_NAME=your-bucket-name
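S3 bucket names must be 3 to 63 characters long, use only lowercase letters, digits, dots and hyphens, and start and end with a letter or digit, so names containing underscores or uppercase letters are rejected by AWS. The sketch below checks only these basic rules before building the s3:// URI that the MLflow server is given as its artifact location. The function name artifact_root is hypothetical and not part of the project.

```python
import re

# Basic S3 naming rules only: 3-63 characters, lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit.
# Further rules (for example, no IP-address-like names) are not checked here.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def artifact_root(bucket_name: str) -> str:
    """Return the s3:// URI for a bucket, or raise ValueError for an invalid name."""
    if not _BUCKET_NAME.fullmatch(bucket_name):
        raise ValueError(f"not a valid S3 bucket name: {bucket_name!r}")
    return f"s3://{bucket_name}"
```

For example, artifact_root(os.environ["S3_BUCKET_NAME"]) (after import os) would fail early on a bad name instead of failing later when MLflow first tries to upload an artifact.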
Start the mlflow server
mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root s3://${S3_BUCKET_NAME} --artifacts-destination s3://${S3_BUCKET_NAME}
python main.py --train_file 202204-capitalbikeshare-tripdata.zip --valid_file 202205-capitalbikeshare-tripdata.zip
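The training command above takes two monthly trip-data archives named with a YYYYMM prefix. As a minimal sketch, the helpers below build such file names for a given month and the month after it. Using the following month as the validation set is an assumption read off the example command, and both function names are hypothetical.

```python
from typing import Tuple


def tripdata_filename(year: int, month: int) -> str:
    """Build an archive name in the same form as the example command uses."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    return f"{year:04d}{month:02d}-capitalbikeshare-tripdata.zip"


def train_valid_files(year: int, month: int) -> Tuple[str, str]:
    """Return (train, valid) archive names for a month and the month that follows it."""
    # Roll over to January of the next year after December.
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    return tripdata_filename(year, month), tripdata_filename(next_year, next_month)
```

Calling train_valid_files(2022, 4) gives the two file names used in the command above.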
python stage.py --tracking_uri http://127.0.0.1:5000 --experiment_name valid_experiment_name
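The staging command above passes the MLflow tracking server address and an experiment name as flags. The sketch below shows one way a script could parse those two flags with argparse. It is not the project's actual stage.py; the default tracking URI and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for the two flags shown in the staging command (illustrative only)."""
    parser = argparse.ArgumentParser(description="Stage a model from an MLflow experiment")
    parser.add_argument(
        "--tracking_uri",
        default="http://127.0.0.1:5000",  # assumed default: a local MLflow server
        help="URI of the MLflow tracking server",
    )
    parser.add_argument(
        "--experiment_name",
        required=True,
        help="Name of an existing MLflow experiment",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--experiment_name", "valid_experiment_name"])
    print(args.tracking_uri, args.experiment_name)
```

Replace valid_experiment_name with the name of an experiment that already exists on the tracking server.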
prefect deployment create deployments.py
Create work queues
prefect work-queue create -t "ml-training" ml-training-queue
prefect work-queue create -t "ml-staging" ml-staging-queue
Run deployments locally to schedule pipeline flows
prefect deployment run mlflow-training/deploy-mlflow-training
prefect deployment run mlflow-staging/deploy-mlflow-staging
Change to the webservice directory and follow the instructions there.