My final project for MLOps Zoomcamp
The goal of this project is to build an MLOps pipeline that trains and deploys a predictive model to determine the edibility of mushrooms based on their characteristics.
The dataset used in this project was downloaded from Kaggle. It includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota families, drawn from The Audubon Society Field Guide to North American Mushrooms (1981). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This last class was combined with the poisonous one.
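To make the target concrete: the Kaggle CSV marks each row with a `class` column whose value is `e` (edible) or `p` (poisonous, which already includes the "not recommended" species). The sketch below assumes that layout and uses two made-up rows with only a few columns, not real samples. The helper name `load_targets` is hypothetical and is not taken from this project's code.

```python
import csv
import io

# Two illustrative rows in the assumed layout of data/mushrooms.csv.
# Only a few of the columns are shown, and the rows are not real samples.
SAMPLE = "class,cap-shape,odor\ne,x,a\np,x,f\n"


def load_targets(handle):
    """Return 1 for poisonous ('p') rows and 0 for edible ('e') rows."""
    return [1 if row["class"] == "p" else 0 for row in csv.DictReader(handle)]


targets = load_targets(io.StringIO(SAMPLE))
```

With a real file, the same function could be called on `open("data/mushrooms.csv", newline="")` instead of the in-memory sample.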
- Poetry — Python dependency manager
- Pyenv — Python version manager
- Prefect — Workflow orchestrator
- MLflow — Experiment tracker and model registry
- FastAPI — Web API
- dotenv — environment variable loader
- pre-commit — pre-commit hooks
- AWS — Cloud service
- Docker — Containerization
- htmx - Better HTML interactivity
To change the default behaviour or use a cloud server, copy .env.example to .env with
cp .env.example .env
Then change the default values to suit your needs.
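As an illustration of how such settings are typically consumed: once dotenv has loaded .env into the process environment, a value can be read with a fallback default. This is a minimal sketch, not the project's actual configuration code. The helper `get_setting` is hypothetical, and `PREFECT_API_URL` is used only because it appears in the Prefect command further down; whether .env.example defines it is an assumption.

```python
import os

# Default taken from the Prefect command later in this README; whether
# .env.example actually defines this variable is an assumption.
DEFAULT_PREFECT_API_URL = "http://127.0.0.1:4200/api"


def get_setting(name, default, environ=os.environ):
    """Return the value of `name`, falling back to `default` when unset or empty."""
    value = environ.get(name, "")
    return value if value else default


prefect_api_url = get_setting("PREFECT_API_URL", DEFAULT_PREFECT_API_URL)
```

Passing the environment as a parameter keeps the helper easy to check without touching the real process environment.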
The image can be built with either docker compose or docker build.
To build and run the image with docker compose, run
docker compose up
To build only the Docker image, run
make build
To launch the application from that image, run
docker run -it --rm -p 8000:8000 mushroom-classification
The application accepts POST requests. To send one with curl:
curl -X 'POST' \
'http://127.0.0.1:8000/api/predict' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"cap_shape": "x",
"cap_surface": "s",
"cap_color": "n",
"bruises": "t",
"odor": "a",
"gill_attachment": "f",
"gill_spacing": "c",
"gill_size": "n",
"gill_color": "b",
"stalk_shape": "e",
"stalk_root": "e",
"stalk_surface_above_ring": "f",
"stalk_surface_below_ring": "f",
"stalk_color_above_ring": "b",
"stalk_color_below_ring": "b",
"veil_type": "p",
"veil_color": "n",
"ring_number": "n",
"ring_type": "e",
"spore_print_color": "k",
"population": "a",
"habitat": "g"
}
'
The features accepted by the API and their possible values are listed in docs/data.md.
The response is a JSON object with the probability that the mushroom is poisonous. The response for the request above is
{"poisonous-probability":0.0}
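The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above (same URL, headers, and feature values). The function names `build_request` and `predict` are hypothetical and are not part of this project.

```python
import json
import urllib.request

# Endpoint from the curl example above.
API_URL = "http://127.0.0.1:8000/api/predict"

# Feature values copied from the curl example above.
FEATURES = {
    "cap_shape": "x", "cap_surface": "s", "cap_color": "n",
    "bruises": "t", "odor": "a",
    "gill_attachment": "f", "gill_spacing": "c",
    "gill_size": "n", "gill_color": "b",
    "stalk_shape": "e", "stalk_root": "e",
    "stalk_surface_above_ring": "f", "stalk_surface_below_ring": "f",
    "stalk_color_above_ring": "b", "stalk_color_below_ring": "b",
    "veil_type": "p", "veil_color": "n",
    "ring_number": "n", "ring_type": "e",
    "spore_print_color": "k", "population": "a", "habitat": "g",
}


def build_request(features, url=API_URL):
    """Build the JSON POST request without sending it."""
    body = json.dumps(features).encode("utf-8")
    headers = {"accept": "application/json", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def predict(features, url=API_URL):
    """Send the features and return the reported poisonous probability."""
    with urllib.request.urlopen(build_request(features, url)) as response:
        return json.load(response)["poisonous-probability"]


# With the web service running locally:
# probability = predict(FEATURES)
```

Building the request separately from sending it makes it possible to inspect the payload before the service is up.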
You can also navigate to http://127.0.0.1:8000 and select the mushroom characteristics there.
The page has a submit button, which returns the probability that a mushroom with the given characteristics is poisonous.
Activate the environment:
# if using poetry
poetry shell
# if using venv
source venv/bin/activate
- Install with poetry:
poetry install
- Install with pip, with the environment activated:
pip install .
Point the Prefect API at the local server:
prefect config set PREFECT_API_URL="http://127.0.0.1:4200/api"
Start the Prefect server:
prefect server start
Start the MLflow server in another terminal window (activate the Python environment there as well):
mlflow server --backend-store-uri sqlite:///mlflow.db
Train the model:
python src/train.py --input-path data/mushrooms.csv
Start the web service:
uvicorn src.api:app --reload
- Add a monitoring service
- Create a Frontend for the API
- Implement IaC
- Use CI/CD
- Create tests
The prediction model was created solely for the purpose of building an MLOps pipeline. Using the deployed model to judge unknown mushrooms is not advisable.