This repository provides code for serving models through an API built with FastAPI and MLflow. A distinctive feature is that the API can start even if a model is not yet available: the system will then periodically check for the availability of a new production model and load it once it is ready.
The code continuously monitors a status flag indicating the availability of a new production model. A status change, signaling that the model is ready to be loaded, is triggered by a GET request to a specific URL (see below).
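At a high level, the startup side of this pattern can be sketched as follows. This is a minimal illustration, not the repository's code: the app logger name, the module-level model variable, the try_load_model helper, and the my_model registry name are all assumptions.

    import logging
    from contextlib import asynccontextmanager

    import mlflow.pyfunc
    from fastapi import FastAPI

    logger = logging.getLogger("app")
    model = None  # served model; stays None until MLflow can provide one


    def try_load_model():
        """Try to fetch the production model; tolerate failure."""
        global model
        try:
            # "my_model" is an assumed registry name, not the repository's.
            model = mlflow.pyfunc.load_model("models:/my_model/Production")
        except Exception:
            logger.warning("model is not available!")


    @asynccontextmanager
    async def lifespan(app: FastAPI):
        try_load_model()  # a failed load does not prevent startup
        yield


    app = FastAPI(lifespan=lifespan)

The key design choice is that a failed load at startup only logs a warning instead of raising, so the API process stays up and can serve the model later.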
To prepare your environment, you can use conda (from either Anaconda or Miniconda):
> conda create -n <env_name> python
> conda activate <env_name>
> pip install -r requirements.txt
Next, start the MLflow server:
> mlflow server -p 5001 --host 0.0.0.0
Then launch the API with gunicorn:
> gunicorn app:app -k uvicorn.workers.UvicornWorker -b 0.0.0.0:5002 --timeout 120
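The API process also has to know where the MLflow server is listening. A one-line sketch, assuming the tracking URI is set in code (it could equally come from the MLFLOW_TRACKING_URI environment variable):

    import mlflow

    # Point the MLflow client at the server started on port 5001.
    mlflow.set_tracking_uri("http://localhost:5001")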
Upon initialization, you may notice a warning about a missing model:
[2023-10-20 07:52:24 +0200] [9776] [INFO] Waiting for application startup.
(WARNING): app: model is not available!
[2023-10-20 07:52:24 +0200] [9776] [INFO] Application startup complete.
At startup, the code attempts to load the model once. A background loop then checks the model's status every 10 seconds. The status is initially set to True, so while in this monitoring mode the system does not attempt to load a model until it is signaled that the MLflow server can actually serve one.
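Continuing the sketch above, such a loop might look like this; the 10-second interval and the True-by-default flag mirror the description, but the exact names are assumptions:

    import asyncio

    model_available = True  # True means: do not poll until a reload is requested


    async def monitor_model():
        """Background task: once a reload is requested, retry every 10 seconds."""
        global model_available
        while True:
            if not model_available:
                try_load_model()  # from the startup sketch above
                model_available = model is not None  # stop polling once loaded
            await asyncio.sleep(10)

In the startup sketch, this task would be launched from lifespan with asyncio.create_task(monitor_model()) and cancelled on shutdown.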
Sending a GET request to localhost:5002/reload changes the status, prompting the system to actively query the MLflow server until a model becomes available.
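In terms of the same sketch, the endpoint only needs to flip the flag; the repository's actual handler may differ:

    @app.get("/reload")
    async def reload_model():
        # Clearing the flag makes the background loop start querying MLflow.
        global model_available
        model_available = False
        return {"status": "reload requested"}

From a shell:
> curl localhost:5002/reload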
Until a model becomes available, this will result in logs repeatedly showing:
(WARNING): app: model is not available!
(WARNING): app: model is not available!
...