This is my final project submission for the DataTalksClub MLOps ZoomCamp 2024.
Given the information from the different sensors found in a smoke detector, can we build a machine learning model that monitors the sensor data and predicts when the smoke alarm should detect smoke?
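As a minimal sketch of the underlying task, this is a binary classification problem over numeric sensor readings. The snippet below uses synthetic data for illustration only (the column order mirrors the API payload used later: Humidity[%], Temperature[C], eCO2[ppm]); the real project trains on the actual smoke-detector dataset.

```python
# Sketch of the classification task: predict smoke (1) / no smoke (0)
# from numeric sensor readings. Synthetic, well-separated data for
# illustration only -- not the project's real dataset or model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Columns: Humidity[%], Temperature[C], eCO2[ppm]
X_no_smoke = rng.normal([40, 20, 400], [5, 3, 50], size=(100, 3))
X_smoke = rng.normal([70, 45, 1200], [5, 3, 50], size=(100, 3))
X = np.vstack([X_no_smoke, X_smoke])
y = np.array([0] * 100 + [1] * 100)  # 1 = smoke detected

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict([[40, 20, 400], [70, 45, 1200]]))  # → [0 1]
```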
- initial experiment notebook
- set up the environment (using pipenv)
- make scripts for training
- set up pre-commit hooks
- add Makefile
- use terraform to provision the cloud infrastructure (s3 bucket) used in this project
- add persistence for the localstack and postgresql containers
- use mlflow to track the training experiments and as a Model Registry
- use a workflow orchestrator (Prefect) to manage the training pipeline
- model deployment
- model monitoring
- write unit tests
- write integration tests
- add a GitHub Action to run the unit tests
- add a GitHub Action to run the integration tests
- It is recommended that Windows users run every command specified below in Git Bash.
- Always run
pipenv shell
to activate the pipenv environment in a new terminal before running any of the commands below.
- install Python 3.11.4 if not installed (or install pyenv)
- install docker if not installed
- install terraform if not installed
- install make if not installed
make install
pre-commit install
I am using tflocal, a wrapper for terraform that lets us run terraform locally against localstack.
Note:- the official localstack docker image provides persistence only in the Pro version, so instead of using it I am using a different localstack image which supports persistence for free: https://hub.docker.com/r/gresau/localstack-persist
This command starts a localstack container and initializes an s3 bucket as specified in the terraform configuration.
make create-infra
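The terraform configuration behind this step boils down to something like the following sketch. The bucket name here is hypothetical and the real configuration lives in the project's terraform files; tflocal points the AWS provider at localstack automatically, so a plain provider block is usually enough.

```hcl
# Illustrative sketch only -- see the project's terraform directory
# for the actual configuration. tflocal redirects the AWS provider
# to localstack, so no custom endpoint configuration is needed here.
provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "artifacts" {
  bucket = "smoke-detector-artifacts" # hypothetical bucket name
}
```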
Note:- this command also runs the make create-infra command internally, so you can run this directly.
make start-services
It starts the postgresql, mlflow, prefect and evidently services.
- postgresql: localhost:5432
- mlflow: http://localhost:5000/
- prefect: http://localhost:4200/
- evidently: http://localhost:3000/
I have used Prefect instead of Mage as it seems more reliable to me. In Prefect we use tasks and flows in place of Mage's blocks and pipelines, and we need to deploy flows so that we can run them on a schedule.
prefect deploy --all
We have created a local process work pool to run our deployments on, but before running a deployment we need to start a worker for our local-pool work pool in a new terminal (always run pipenv shell before anything else).
make local-work-pool-worker
Open http://localhost:4200/deployments and run the deployments. You can also see the progress of each run on the Prefect dashboard.
The list of deployed flows :-
- simple_model_training: used to train a single model with the given numeric_cols.
- simple-model-feature-selection-search: used to train multiple models using different sets of features to compare which set of features gives us the best model; all the experiments are tracked and logged with mlflow, so you can check their corresponding runs on mlflow.
- model-evaluation: used to evaluate a model registered with mlflow, so before running this you need to register the models using the src/mlflow_register_model.py script.
This script registers all the best models that have a validation accuracy above 77%. It also doesn't register the same run's model twice, even if you run the script multiple times.
python -m src.mlflow_register_model
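The selection logic is roughly: keep runs whose validation accuracy exceeds 0.77 and skip runs that were already registered, so re-running is idempotent. A simplified, mlflow-free sketch of that filter (the run data and field names below are illustrative, not the script's actual structures):

```python
def select_runs_to_register(runs, already_registered, threshold=0.77):
    """Return run ids whose val_accuracy beats the threshold and that
    have not been registered yet (re-running is therefore idempotent)."""
    return [
        run["run_id"]
        for run in runs
        if run["val_accuracy"] > threshold
        and run["run_id"] not in already_registered
    ]


runs = [
    {"run_id": "a1", "val_accuracy": 0.81},
    {"run_id": "b2", "val_accuracy": 0.70},  # below threshold: skipped
    {"run_id": "c3", "val_accuracy": 0.79},  # already registered: skipped
]
print(select_runs_to_register(runs, already_registered={"c3"}))  # → ['a1']
```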
I am using flask to serve the model, which is downloaded from the mlflow Model Registry. The model version is configurable via the MLFLOW_MODEL_VERSION environment variable.
You can set the model version before running the deployment scripts, for example :-
export MLFLOW_MODEL_VERSION="1"
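Inside the serving code, reading that variable might look like the sketch below. The default version and the registered model name are assumptions for illustration, not the project's actual values.

```python
import os


def model_uri(model_name: str = "smoke-detector") -> str:
    """Build an mlflow model-registry URI such as 'models:/smoke-detector/1'
    from the MLFLOW_MODEL_VERSION environment variable."""
    version = os.environ.get("MLFLOW_MODEL_VERSION", "1")
    return f"models:/{model_name}/{version}"


os.environ["MLFLOW_MODEL_VERSION"] = "3"
print(model_uri())  # → models:/smoke-detector/3
```

This `models:/<name>/<version>` URI is what mlflow's model-loading helpers accept when pulling a registered model version.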
make dev-deploy
Note:- this expects the Pipfile.lock file to be present in the root dir of the project, so either run pipenv lock to generate it, or run make install to install the dependencies and generate the Pipfile.lock file.
make deploy
You can access the deployed website at http://localhost:8080/ in both cases.
Open http://localhost:8080/ and you can play with the model there, or access it via a POST request to the endpoint http://localhost:8080/predict/, which expects JSON data as a list of feature objects (one per example) and returns a list of 0/1 values, where 1 means smoke is detected and 0 means no smoke is detected.
POST request to:
http://localhost:8080/predict/
with json payload :-
[
{
"Humidity[%]": 30,
"Temperature[C]": 20,
"eCO2[ppm]": 12
},
{
"Temperature[C]": 40,
"Humidity[%]": 100,
"eCO2[ppm]": 60
}
]
Json response :-
[0, 1]
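From Python, calling the endpoint could look like the following sketch. Only the payload construction runs standalone; the actual HTTP call (using the `requests` library) is left commented out since it needs the service running on localhost:8080.

```python
import json

# Feature payload: one dict per example, matching the /predict/ contract.
payload = [
    {"Humidity[%]": 30, "Temperature[C]": 20, "eCO2[ppm]": 12},
    {"Temperature[C]": 40, "Humidity[%]": 100, "eCO2[ppm]": 60},
]
body = json.dumps(payload)
print(body)

# With the service running (and `requests` installed), send it like so:
# import requests
# resp = requests.post("http://localhost:8080/predict/", json=payload)
# print(resp.json())  # e.g. [0, 1]: 1 = smoke detected, 0 = no smoke
```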
make test
Note:- this expects the Pipfile.lock file to be present in the root dir of the project, so either run pipenv lock to generate it, or run make install to install the dependencies and generate the Pipfile.lock file.
make integration-test
docker compose down
docker compose down {service names, space separated}
example, to remove the localstack and postgres containers :-
docker compose down localstack db
I have also pushed a container to Docker Hub; you can check it out.
docker run --name smoke-detector -p 8080:8080 -e LOG_TO_DB_FLAG=false --rm -it anujpanthri/smoke-detector
docker run --name smoke-detector -p 8080:8080 -e LOG_TO_DB_FLAG=false -d anujpanthri/smoke-detector
docker stop smoke-detector
docker rm smoke-detector
docker rm smoke-detector --force