mlflow-experiment

A template for an experiment orchestrated with MLflow.

Primary language: Shell. License: Apache-2.0.

MLflow Deployment and Usage with docker-compose

Easily deploy an MLflow tracking server with 1 command.

Architecture

The MLflow tracking setup is composed of 4 long-running Docker containers, plus a fifth temporary one:

  • MLflow client (runs experiments)
  • MLflow server / web interface at localhost:5555 (receives data from experiments)
  • MinIO object storage server minio (holds artifacts from experiments)
  • A database to track tabular experimental results, either:
    • PostgreSQL database server postgres, or
    • MySQL database server mysql
  • (and a fifth temporary) MinIO client mc (to create initial s3://mlflow/ bucket upon startup)

Quickstart

  1. Install Docker and ensure you have docker-compose installed. Make sure you have make installed as well (and awk, grep, curl, head, and tail for the serving example).

  2. Clone (download) this repository

    git clone https://github.com/ml-starter-packs/mlflow-experiment.git
  3. cd into the mlflow-experiment directory

  4. Build and run the containers (this runs docker-compose up -d --build):

    make
  5. Access the MLflow UI at http://localhost:5555

  6. Watch as runs begin to populate in the demo experiment as the script ./examples/main.py executes. (NOTE: most of the HuggingFace models seem to be unsupported on arm64 architectures, so this demo is best run on a machine with an amd64 processor.)

  7. (optional) Access the MinIO UI at http://localhost:9000 to see how MLflow artifacts are organized in the S3-compatible object storage (default credentials are minio / minio123).
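
Before running step 4, it can help to confirm that the tools named in step 1 are actually on your PATH. The following is a minimal sketch, not part of this repository; the function name missing_tools is a hypothetical helper introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every tool from the
# argument list that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools named in step 1 of the Quickstart.
missing=$(missing_tools docker docker-compose make awk grep curl head tail)

if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All prerequisites found."
fi
```

If anything is reported as missing, install it before running make.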

Cleanup

To stop all containers and remove all volumes (i.e., purge all stored data), run

make clean

To stop all running containers without removing volumes (i.e., you want the state of the application to persist), run

make stop

Training and Serving Example

A complete example that would resemble local usage can be found at ./examples/train-and-serve.sh and run with

make serve

This demo trains a model using mlflow/mlflow-example (under the Default experiment) and then serves it as an API endpoint.

Send it a set of samples to predict on (via curl) with

make post

You can stop serving your model (perhaps if you want to try running the serving demo a second time) with

make stop

Note: you can run ./examples/train-and-serve.sh locally if you prefer (it is designed as a complete example), but you need to change the URLs to point to your local IP address and to reflect that MLflow is exposed on port 5555. The service runs on port 5000 within its container, but because 5000 is a commonly used port, it is mapped to 5555 to avoid conflicts with existing services on your machine. You may also want to omit the --no-conda flags if you want the default behavior of mlflow models serve, which leverages Anaconda.
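
For local runs, one way to point MLflow clients at this stack is through MLflow's standard environment variables. This is a sketch under assumptions: the helper tracking_uri is hypothetical, it is assumed that MinIO's S3 API is reachable on the same port 9000 as its UI, and the script itself may hard-code URLs, in which case you need to edit them directly instead.

```shell
#!/bin/sh
# Hypothetical helper: build the tracking URI from a host and port.
# Defaults match this README: localhost and the exposed port 5555
# (the server listens on 5000 inside its container).
tracking_uri() {
  printf 'http://%s:%s\n' "${1:-localhost}" "${2:-5555}"
}

# Standard MLflow client setting for the tracking server.
export MLFLOW_TRACKING_URI="$(tracking_uri)"

# Assumption: MinIO's S3 API shares port 9000 with the UI noted above.
export MLFLOW_S3_ENDPOINT_URL="http://localhost:9000"

# Default MinIO credentials from the Quickstart.
export AWS_ACCESS_KEY_ID="minio"
export AWS_SECRET_ACCESS_KEY="minio123"

echo "Tracking URI: ${MLFLOW_TRACKING_URI}"
```

To target another machine, pass its address, for example tracking_uri 192.168.1.10 (an illustrative address only).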

Running New Experiments

Edit ./examples/main.py and re-run the experiment service, which uses docker-compose run nlp (if you commit your code, the latest git hash will be reflected in MLflow):

make run

When it completes after a few minutes, you will find new results populated in the existing demo experiment, and a stopped container associated with the run will be visible when running docker ps -a.

The container associated with the example runs can be removed with

make rm

Note: This instruction is also run by make clean.

A Note on Docker Setup

This may be of more relevance to some than others, depending on which container-orchestration client you are using. If you get credential errors when trying to pull the images, it is likely because your client cannot tell which registry to use (some private registry or Docker's default?).

You can make explicit where you want images that are not prepended with a domain name to come from by setting your docker config file:

cat ~/.docker/config.json 
{
  "experimental" : "disabled",
  "credStore" : "desktop",
  "auths" : {
    "https://index.docker.io/v1/" : {

    }
  }
}

Be aware that it may be credStore or credsStore depending on your setup.
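
To see which spelling your own config uses, you can search the file for either key. A minimal sketch, assuming grep supports the -o option (GNU, BSD, and BusyBox grep do); the function name cred_store_key is a hypothetical helper, not something Docker provides.

```shell
#!/bin/sh
# Hypothetical helper: print credStore and/or credsStore if the given
# Docker config file contains that key.
cred_store_key() {
  grep -oE '"creds?Store"' "$1" | tr -d '"'
}

# DOCKER_CONFIG, when set, names the directory holding config.json.
config="${DOCKER_CONFIG:-$HOME/.docker}/config.json"

if [ -f "$config" ]; then
  key=$(cred_store_key "$config") || true
  echo "${key:-no credential store key found in $config}"
else
  echo "No Docker config found at $config"
fi
```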

A Note on Clearing Your Database (and Serverless PostgreSQL)

When using the docker-compose setup here, make clean will wipe your whole database, which is convenient for testing. However, you may eventually move to a "real" database (perhaps a managed service) and notice that runs you delete in the MLflow UI are NOT removed from your tables.

To remove runs from your tables, the command resembles the one used to launch the mlflow server:

docker exec -ti mlflow_server bash

DB_HOST=<hosted db>
DB_USER=<username>
DB_PASS=<password>
DB_TYPE=<postgresql or mysql+pymysql>
DB_NAME=<name>

mlflow gc --backend-store-uri "${DB_TYPE}://${DB_USER}:${DB_PASS}@${DB_HOST}/${DB_NAME}"
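
Because the connection string embeds a password, it can be useful to assemble it in one place and print only a masked version when debugging. The following is a sketch under assumptions: backend_store_uri and mask_uri are hypothetical helpers, and every value shown is a placeholder rather than a real credential or host.

```shell
#!/bin/sh
# Hypothetical helper: assemble the URI from its parts.
# Arguments: DB_TYPE DB_USER DB_PASS DB_HOST DB_NAME
backend_store_uri() {
  printf '%s://%s:%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical helper: replace the password with **** for safe logging.
mask_uri() {
  printf '%s\n' "$1" | sed 's#://\([^:]*\):[^@]*@#://\1:****@#'
}

# Placeholder values for illustration only.
uri=$(backend_store_uri postgresql mlflow_user s3cret db.example.com mlflow)

# Log the masked form; pass the unmasked "$uri" to
#   mlflow gc --backend-store-uri "$uri"
mask_uri "$uri"
```

Quoting "$uri" when passing it to mlflow gc keeps the shell from interpreting characters such as ? or & that may appear in the database name.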

For neon.tech, note that you need to pass extra arguments in your DB_NAME (the project-id is not the same thing as the "project name"):

DB_NAME="<db-name>?sslmode=require&options=project%3D<project-id>"

(alternatively, leave options off if your project-id is used as your subdomain when specifying DB_HOST)
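
The %3D in that value is the URL-encoded form of =, so the options parameter carries project=<project-id>. A small sketch of building the value without mistyping the encoding; neon_db_name is a hypothetical helper, and the database name and project-id below are made-up placeholders.

```shell
#!/bin/sh
# Hypothetical helper: build a DB_NAME value with the extra query
# parameters described above.
# Arguments: database name, project-id
neon_db_name() {
  # %% is a literal percent sign in printf, so %%3D prints %3D.
  printf '%s?sslmode=require&options=project%%3D%s\n' "$1" "$2"
}

# Placeholder values for illustration only.
DB_NAME=$(neon_db_name neondb my-project-id-123)
echo "$DB_NAME"
```

Keep the result quoted wherever it is used, since an unquoted & would be treated by the shell as a background operator.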