
MLOps

A continuous integration and deployment framework for healthcare AI projects
Explore the docs »

View repo · Report Bug · Request Feature

Table of Contents

  1. About The Project
  2. Getting Started On Local Hardware
  3. Overview
  4. Bringing it all together: hyper-parameter tuning
  5. Roadmap
  6. Contributing
  7. Contact
  8. Acknowledgements

About The Project

This project aims to build an effective MLOps framework for the development of AI models in a healthcare setting.

If you want to get straight to it with an end to end example, see the hyper-parameter tuning tutorial.

Open source components

  • DVC Data version control
  • MLflow Open source platform to manage the ML lifecycle
  • MONAI PyTorch-based framework for deep learning in healthcare imaging
  • MINIO High performance object storage suite
  • NGINX Reverse proxy server

It's not essential to have a complete understanding of all of these, but a high-level understanding of MLflow and DVC in particular will be useful!

Getting Started On Local Hardware

The production version of this project is intended to run on a dedicated remote machine on an isolated network. However, it is simple to set up a local copy to get an understanding of the framework.

Prerequisites

First, follow the official instructions to install Docker and docker-compose. A basic understanding of how Docker and docker-compose work is recommended.

Check that docker and docker-compose are working by passing the help argument on the command line. If the help information is not returned, or an error is raised, revisit the Docker installation docs.

docker --help
docker-compose --help

Installation

  1. Clone the repository
    git clone https://github.com/GSTT-CSC/MLOps.git
    cd MLOps

The server can be configured by modifying the environment file found at /mlflow_server/.env. The environment variables provided are given as examples and should not be used for a production deployment.
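As a sketch, a local-test .env might look like the following. All variable names and values here are illustrative placeholders (the MYSQL_* variables follow the mysql image's conventions and MLFLOW_S3_ENDPOINT_URL is MLflow's standard S3 override); check the repository's own .env for the actual set:

```
# Illustrative values only -- never reuse these in a real deployment
MYSQL_DATABASE=mlflow
MYSQL_USER=mlflow_user
MYSQL_PASSWORD=change_me
AWS_ACCESS_KEY_ID=minio_access_key
AWS_SECRET_ACCESS_KEY=minio_secret_key
MLFLOW_S3_ENDPOINT_URL=http://localhost:8002
```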

  2. Navigate to the cloned code repository and start the server. Any docker images that are not present on your local system will be pulled from Docker Hub (which might take a while).
    cd mlflow_server
    docker-compose up -d --build

The server should now be up and running locally. By default, the MLflow user interface can be accessed at http://localhost:80 and MINIO can be accessed at https://localhost:8002.

Overview

Components overview

Opening a terminal and running docker ps lists the running containers; we should see something like this:

CONTAINER ID   IMAGE                                      COMMAND                  CREATED             STATUS                       PORTS                                        NAMES
3d51a7580b6f   mlflow_nginx                               "nginx -g 'daemon of…"   About an hour ago   Up About an hour             0.0.0.0:80->80/tcp, 0.0.0.0:8002->8002/tcp   mlflow_nginx
1baa8ff12814   mlflow_app                                 "mlflow server --bac…"   About an hour ago   Up About an hour             5000/tcp                                     mlflow_server
a397b4149c5f   minio/minio:RELEASE.2021-03-17T02-33-02Z   "/usr/bin/docker-ent…"   About an hour ago   Up About an hour (healthy)   9000/tcp, 9002/tcp                           mlflow_server_s3_1
65374369fe4d   mysql/mysql-server:5.7.28                  "/entrypoint.sh mysq…"   About an hour ago   Up About an hour (healthy)   3306/tcp, 33060/tcp                          mlflow_db

When we ran docker-compose up, we started four networked containers, each of which serves a purpose within the MLOps framework.

  1. NGINX: The nginx container acts as a reverse proxy to control network traffic.
  2. MLflow: The MLflow container hosts our MLflow server instance. This server is responsible for tracking and logging the MLOps events sent to it.
  3. MINIO: The MINIO container hosts our MINIO server. Here we are using MINIO as a self-hosted S3 storage location. The MLflow container interfaces well with S3 storage locations for logging artifacts (models, images, plots, etc.).
  4. mysql: The mysql server container is visible only to the MLflow container, which logs MLflow entities to the mysql database hosted on this container. MLflow entities should not be confused with artifacts (stored on MINIO); they are simple values such as metrics, parameters, and configuration options which can be efficiently stored in a database.

There are two bridge networks which connect these containers, named 'frontend' and 'backend'. The backend is used for communication between containers and is not accessible from the host (or remote); the frontend is accessible from the host (or remote) through the NGINX reverse proxy. NGINX acts as our gatekeeper: all requests pass through it, which enables us to take advantage of NGINX load balancing and authentication in production versions.

Data versioning with DVC

AI projects differ from conventional software projects in that their results depend not only on the code used at runtime, but also on the data used to train the model. We can easily track code versions using git, and we can use DVC in a similar way to track data. In brief: git is not intended for large files, so with DVC we track the version of the data used at a specific git commit and log its location to the git repository. Only the data version and location are stored in git; the data itself is stored elsewhere.
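The pointer idea can be sketched in plain Python: DVC commits a small file containing a content hash and a path, while the data itself is pushed to remote storage (such as the MINIO S3 bucket). This is an illustration of the mechanism only, not DVC's actual on-disk format:

```python
import hashlib
import json
import os
import tempfile

def pointer_for(path: str) -> dict:
    """Build a DVC-style pointer: a content hash plus a path.

    The pointer is tiny, so it can live in git; the data itself
    is stored elsewhere (e.g. a DVC remote backed by S3/MINIO).
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            md5.update(chunk)
    return {"md5": md5.hexdigest(), "path": os.path.basename(path)}

# Demo with a throwaway 'dataset'.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as f:
    f.write(b"subject,label\n001,1\n")
    data_path = f.name

pointer = pointer_for(data_path)
print(json.dumps(pointer))
```

Because the hash changes whenever the data changes, committing the pointer pins each git commit to an exact data version.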

Experiment tracking with MLflow

MLflow is a framework for managing the full lifecycle of AI models. It contains tools to cover each stage of that lifecycle through four major components: Tracking, Projects, Models, and the Model Registry. The endpoint for these tools is an MLflow server that can run on local or remote hardware and handles all aspects of the lifecycle.

Currently, we will focus primarily on the tracking and projects components.

  • Tracking refers to tools used to track experiments to record and compare parameters and results. This is done by adding logging snippets to the ML code to record things like hyper-parameters, metrics, and artifacts. These entities are then associated with a particular run and a specific git commit. This git commit points to a specific version of the project files, which in turn points to a specific data version through DVC. This means that by using MLflow tracking we are able to identify the code and data versions used to train an AI model and make comparisons following changes to either.

  • MLflow uses projects to encapsulate AI tools in a reusable and reproducible way, based primarily on conventions. It also enables us to chain together project workflows, meaning we are able to automate a great deal of the model development process.
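A project is defined by an MLproject file in the repository root. The sketch below follows MLflow's documented MLproject format, but the project name, parameters, and training script are hypothetical, not this repository's actual entry points:

```yaml
name: example-project          # hypothetical project name
conda_env: conda.yaml          # environment spec checked into the repo
entry_points:
  main:
    parameters:
      learning_rate: {type: float, default: 0.01}
      max_epochs: {type: int, default: 10}
    command: "python train.py --lr {learning_rate} --epochs {max_epochs}"
```

A run such as mlflow run . -P learning_rate=0.001 would then execute the main entry point reproducibly in the declared environment.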

Bringing it all together: hyper-parameter tuning

Seeing each of these components independently is useful, but the best way to learn how they all work together is with an example. Almost all AI models benefit from hyper-parameter tuning, a process which is difficult without a robust MLOps service. This example demonstrates how the experiment tracking described above facilitates that process.

For a detailed tutorial describing the end-to-end process of AI development using this framework please see the following blog post.

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Contact

Laurence Jackson (GSTT-CSC)

Project Link: https://github.com/GSTT-CSC/MLOps

Acknowledgements