/mining-process-mlops-project

Capstone project of MLOps Zoomcamp

Primary language: Jupyter Notebook

MLOps Zoomcamp Capstone Project

Capstone project for MLOps Zoomcamp.

Problem definition: Silica concentrate in mining process

The main goal is to use mining data to predict how much silica is in the ore concentrate after flotation. The data comes from one of the most important parts of a mining process: the flotation plant. The iron and silica content of the ore is measured right before the ore is fed into the flotation plant, and this data is sampled every hour. Other variables are sampled every 20 seconds, but there is a problem with the timestamps, so these measurements could be treated as hourly samples too. The iron concentrate after flotation cannot be used as a feature and should be dropped, because it is measured in the lab after flotation. Exploratory data analysis is in notebooks. Dataset source: Quality Prediction in a Mining Process (Kaggle)
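The two preparation steps described above (bringing the 20-second readings down to one value per hour, and dropping the post-flotation iron measurement) can be sketched with the standard library alone. This is only an illustration, not the code from the notebooks: the column names, the timestamp format, the choice of averaging, and the toy rows are all assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Assumed column names; check them against the actual dataset header.
TIMESTAMP = "date"
LEAKY_COLUMN = "% Iron Concentrate"  # lab value measured after flotation


def to_hourly(rows):
    """Average every remaining column per hour and drop the leaky column."""
    buckets = defaultdict(list)
    for row in rows:
        stamp = datetime.strptime(row[TIMESTAMP], "%Y-%m-%d %H:%M:%S")
        # Truncate the timestamp to the start of its hour.
        buckets[stamp.replace(minute=0, second=0)].append(row)

    hourly = []
    for hour in sorted(buckets):
        group = buckets[hour]
        columns = [c for c in group[0] if c not in (TIMESTAMP, LEAKY_COLUMN)]
        record = {TIMESTAMP: hour}
        for column in columns:
            record[column] = mean(float(r[column]) for r in group)
        hourly.append(record)
    return hourly


# Made-up toy rows, not real measurements.
rows = [
    {"date": "2017-03-10 01:00:00", "Ore Pulp pH": "10.0",
     "% Iron Concentrate": "66.9", "% Silica Concentrate": "1.0"},
    {"date": "2017-03-10 01:00:20", "Ore Pulp pH": "11.0",
     "% Iron Concentrate": "66.9", "% Silica Concentrate": "1.0"},
    {"date": "2017-03-10 02:00:00", "Ore Pulp pH": "9.5",
     "% Iron Concentrate": "65.0", "% Silica Concentrate": "2.0"},
]
hourly = to_hourly(rows)
```

Averaging is one possible aggregation; taking the last reading of each hour would be an equally simple alternative.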

Repository structure

  • notebooks: Jupyter notebooks with EDA and preparation of data for upload to S3
  • train: Automated scripts for model training, model registration, and orchestration
  • web-service: Deployment of the prediction service as a Flask web service
  • monitoring-service: Grafana to monitor the Evidently service
  • pyproject.toml: Configuration for code quality tools
  • .pre-commit-config.yaml: pre-commit hooks configuration

Every folder has its own description in its README.md.

Train, choose and register best model

Full instructions are in the train directory. MLflow was used for experiment tracking and as the model registry. After many experiments with various models, feature engineering, and hyperparameters, Ridge Regression was chosen as the model, with hyperparameters alpha=50, tol=1e-09. Models were compared in MLflow on the metrics MAE, RMSE, MAPE, and R2. The orchestration script was developed with Prefect in prefect_flow.py. MLflow and Prefect are deployed in Compute Cloud of Yandex Cloud. Artifact storage: S3 Object Storage in Yandex Cloud.
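For reference, the four comparison metrics can be written out in a few lines of plain Python. This is a minimal sketch of the standard definitions, not the project's training code; the function name and the toy values are illustrative only, and MAPE is returned as a fraction rather than a percentage.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (as a fraction) and R2 for two sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when a true value is zero; this sketch assumes
    # strictly non-zero targets.
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R2 is undefined for a constant target (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot

    return {"mae": mae, "rmse": rmse, "mape": mape, "r2": r2}


# Made-up toy values, not results from the project.
metrics = regression_metrics([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```

Lower is better for MAE, RMSE, and MAPE, while R2 is better the closer it gets to 1.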

Web service

Full instructions are in the web-service directory. The model can be deployed with a couple of commands: the script runs all the checks and only then deploys the service.

Set the environment variables in the .env file.

Run the Makefile commands to deploy the app:

make setup
make deploy

By default, the service takes the latest Production-ready model from the model registry and runs a Docker container published in the Container Registry of Yandex Cloud.
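The "latest Production-ready model" rule can be illustrated without a running registry. The sketch below is hypothetical: the ModelVersion class, the registry name, and the version list are made up for illustration, and the real service presumably queries MLflow instead. Only the models:/<name>/<version> URI form is taken from MLflow.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical stand-in for one entry in a model registry."""
    name: str
    version: int
    stage: str


def latest_production(versions: Iterable[ModelVersion]) -> Optional[ModelVersion]:
    """Return the highest-numbered version in the Production stage, if any."""
    candidates = [v for v in versions if v.stage == "Production"]
    return max(candidates, key=lambda v: v.version, default=None)


def model_uri(model: ModelVersion) -> str:
    """Build an MLflow-style model URI of the form models:/<name>/<version>."""
    return f"models:/{model.name}/{model.version}"


# Made-up registry contents; "silica-ridge" is an illustrative name.
registry = [
    ModelVersion("silica-ridge", 1, "Archived"),
    ModelVersion("silica-ridge", 2, "Production"),
    ModelVersion("silica-ridge", 3, "Production"),
    ModelVersion("silica-ridge", 4, "Staging"),
]
chosen = latest_production(registry)
uri = model_uri(chosen) if chosen else None
```

Returning None when no Production version exists lets a deployment script fail early instead of serving a Staging or Archived model.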