Set up an MLflow tracking server using MLflow and MinIO

Doc site: https://mlflow-minio-config-doc.streamlit.app/

Step 1: Set up the MinIO server

  • Check the official MinIO site for the available packages

  • Download the appropriate version; in my case, I used an Ubuntu ARM machine

    • I downloaded the file into my software directory with superuser privileges
      cd ~/software
      wget https://dl.min.io/server/minio/release/linux-arm64/minio_20230930070229.0.0_arm64.deb
    • Then I installed it on my machine
      sudo dpkg -i minio_20230930070229.0.0_arm64.deb
  • Test the MinIO storage

    • Check whether it runs properly, serving the /mnt/data directory
      MINIO_ROOT_USER=admin MINIO_ROOT_PASSWORD=password minio server /mnt/data --console-address ":9001"
    • Configure the nginx server for my site minio.mlhub.in on port 80
      server {
          server_name minio.mlhub.in;
          location / {
              proxy_pass http://localhost:9001/;
              proxy_set_header Host $host;
              proxy_set_header Upgrade $http_upgrade;
              proxy_set_header Connection upgrade;
              proxy_set_header Accept-Encoding gzip;
              # e.g. client_max_body_size 1024M; 0 disables the size limit
              client_max_body_size 0;
          }
          listen 80;
          listen [::]:80;
      }
    • Run MinIO and check the console in the browser
      MINIO_ROOT_USER=admin MINIO_ROOT_PASSWORD=password minio server /mnt/data --console-address ":9001"
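Besides opening the console in a browser, the running server can be probed from a script. The following is a minimal sketch, assuming the S3 API listens on MinIO's default port 9000 on the same machine; it calls MinIO's liveness endpoint /minio/health/live. The helper names health_url and is_minio_live are illustrative and not part of any MinIO tooling.

```python
import urllib.error
import urllib.request


def health_url(base_url):
    """Build the liveness-probe URL for a MinIO S3 API endpoint."""
    return base_url.rstrip("/") + "/minio/health/live"


def is_minio_live(base_url, timeout=3.0):
    """Return True if the liveness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False
```

Calling is_minio_live("http://localhost:9000") on the server itself should return True while the command above is running.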
  • Once installed, the MinIO server can be managed with the systemctl/service command

    • Check the status before starting the MinIO server
      sudo systemctl status minio
      # OR
      sudo service minio status
  • Create the MinIO user and group

    • Create the group minio-user and a system user of the same name
      sudo groupadd -r minio-user
      sudo useradd -M -r -g minio-user minio-user
    • Create the directory if it does not exist and give minio-user ownership
      sudo mkdir -p /mnt/data
      sudo chown minio-user:minio-user /mnt/data
    • Set up the MinIO environment: provide the SSL certificate directory, username, and password (the packaged systemd unit typically reads these from /etc/default/minio)
      • The certs directory is not populated yet; we will do that in a later step
      MINIO_OPTS="--certs-dir /home/ubuntu/.minio/certs --console-address :9001"
  • Set up the firewall (optional; required only on some VMs)

    • Install firewalld and open ports 80, 9000, and 9001 in the public zone
      sudo apt install firewalld
      sudo firewall-cmd --zone=public --permanent --add-port=80/tcp
      sudo firewall-cmd --zone=public --permanent --add-port=9000/tcp
      sudo firewall-cmd --zone=public --permanent --add-port=9001/tcp
      sudo firewall-cmd --reload
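After reloading the firewall, it can be useful to confirm from another machine that the ports actually accept connections. Here is a minimal sketch using only the Python standard library; the function names are illustrative, and a port will only show as open once a service is listening on it.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered by a firewall, timed out, or host not resolvable.
        return False


def check_ports(host, ports=(80, 9000, 9001)):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port) for port in ports}
```

For example, check_ports("minio.mlhub.in") probes the three ports opened above; replace the host with your own domain or IP address.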
  • Install certgen from the official MinIO GitHub repository

  • Generate the certificate in the /home/ubuntu/.minio/certs directory

    • I have many subdomains, so I generate a wildcard certificate that applies to all of them. To access the server by IP address instead, pass the IP address as the host
      sudo certgen -host "*.mlhub.in"
    • It generates two files, private.key and public.crt; MinIO looks for exactly these file names
      • Give ownership of these two files to minio-user
      sudo chown minio-user:minio-user /home/ubuntu/.minio/certs/private.key
      sudo chown minio-user:minio-user /home/ubuntu/.minio/certs/public.crt
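Because MinIO expects those exact file names, a quick check before starting the service can save a failed start. This is a minimal sketch, assuming the certs directory is the one passed to --certs-dir; the helper name missing_cert_files is illustrative.

```python
from pathlib import Path

# File names MinIO looks for inside the certs directory.
REQUIRED_CERT_FILES = ("private.key", "public.crt")


def missing_cert_files(certs_dir):
    """Return the required file names that are absent from certs_dir."""
    base = Path(certs_dir)
    return [name for name in REQUIRED_CERT_FILES if not (base / name).is_file()]
```

An empty list from missing_cert_files("/home/ubuntu/.minio/certs") means both files are in place.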
  • Start the MinIO server

    • Enable and start the MinIO server using the systemctl/service command
      sudo systemctl enable minio # start at boot; the service command has no equivalent
      sudo systemctl start minio # OR sudo service minio start
      sudo systemctl status minio # OR sudo service minio status
  • Expose the site using nginx

    • The server listens on two ports: the console on port 9001 and the S3 API on port 9000
    • Update the nginx config
      • I added one file, minio-server, in sites-available, i.e. /etc/nginx/sites-available/minio-server
      server {
          server_name minio.mlhub.in;
          location / {
              # MinIO console (port taken from the step above; use https:// if MinIO itself serves TLS)
              proxy_pass http://localhost:9001/;
              proxy_set_header Host $host;
              proxy_set_header Upgrade $http_upgrade;
              proxy_set_header Connection upgrade;
              proxy_set_header Accept-Encoding gzip;
              client_max_body_size 0;
          }
          listen 80;
          listen [::]:80;
      }
      server {
          server_name s3.mlhub.in;
          location / {
              # S3 API (port taken from the step above; use https:// if MinIO itself serves TLS)
              proxy_pass http://localhost:9000/;
              proxy_set_header Host $host;
              proxy_set_header Upgrade $http_upgrade;
              proxy_set_header Connection upgrade;
              proxy_set_header Accept-Encoding gzip;
              client_max_body_size 0;
          }
          listen 80;
          listen [::]:80;
      }
      • The first server block above is for the MinIO console and the second is for the S3 API
    • Create a symbolic link to our configuration file in the sites-enabled directory
      sudo ln -s /etc/nginx/sites-available/minio-server /etc/nginx/sites-enabled/
    • Restart the nginx server using systemctl/service
      sudo service nginx restart
      # OR
      sudo systemctl restart nginx
  • Get the SSL certificate using certbot/Let's Encrypt

    • Run certbot
    • It will list the configured subdomains and ask which one to use
    • Choose one, then repeat for the others
  • Test whether the S3 API is working

    • Open the MinIO site (in my case https://minio.mlhub.in)
    • Generate a new access key (ID and secret)
    • Create a bucket and upload some files
    • Open a notebook and check for the files through boto3
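The notebook check in the last step could look like the sketch below. It assumes boto3 is installed, that the S3 API is reachable at https://s3.mlhub.in (my domain), and that the bucket already exists; the bucket name "private" is the one used in the MLflow artifact root in Step 2, and the key placeholders must be replaced with the access key generated in the console. The helper names are illustrative.

```python
def make_s3_client(endpoint_url, access_key, secret_key):
    """Create a boto3 S3 client that talks to MinIO instead of AWS."""
    import boto3  # imported here so list_keys works with any compatible client

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def list_keys(client, bucket, prefix=""):
    """Return every object key in bucket that starts with prefix."""
    keys = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty bucket or prefix carry no "Contents" entry.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


# Usage (placeholders, not real credentials):
# client = make_s3_client("https://s3.mlhub.in", "ACCESS_KEY_ID", "SECRET_ACCESS_KEY")
# print(list_keys(client, "private"))
```

If the uploaded files appear in the printed list, the S3 API, the access key, and the nginx proxy are all working together.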

Step 2: Set up the MLflow server

  • Create a virtual environment; in my case I created one named mlflow in /home/ubuntu/.virtualenvs/

  • Install the essential packages in the virtual environment

    pip install mlflow boto3
  • Run MLflow locally with the MinIO credentials

    export MLFLOW_S3_ENDPOINT_URL=https://s3.mlhub.in
    export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxxxxxxxxxxxx
    export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    # Then run 
    mlflow server --backend-store-uri sqlite:////home/ubuntu/ml-tracking-server/mlhub.mlflow.sqlite \
                  --default-artifact-root s3://private/ml-tracking-server/mlhub-artifacts -p 5000
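Two details of the command above are easy to get wrong: the SQLite URI needs four slashes for an absolute path (sqlite:/// followed by /home/...), and the first component of the S3 URI is the bucket name, so a bucket called private should already exist in MinIO. The sketch below, with illustrative helper names, builds the first and splits the second.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def sqlite_backend_uri(db_path):
    """Build a sqlite:/// URI for an absolute database path."""
    path = PurePosixPath(db_path)
    if not path.is_absolute():
        raise ValueError("expected an absolute path, got %r" % db_path)
    # "sqlite:///" plus the leading "/" of the path gives four slashes.
    return "sqlite:///" + str(path)


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected an s3://bucket/prefix URI, got %r" % uri)
    return parsed.netloc, parsed.path.lstrip("/")
```

With the values from the command above, split_s3_uri yields the bucket private and the prefix ml-tracking-server/mlhub-artifacts.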
  • Add basic authentication for your MLflow site

    • Install apache2-utils
    sudo apt install apache2-utils
    • Add a user and store it in the file /etc/nginx/.htpasswd; here my username is my email ID
    sudo htpasswd -c /etc/nginx/.htpasswd amanthe001@gmail.com
    • It will prompt for "New password" and "Re-type new password"; after you fill in both, the new user is added
  • Expose the port using nginx

    • Create a new config file; I created /etc/nginx/sites-available/mlflow-server
    • In my mlflow-server config file I added auth_basic_user_file /etc/nginx/.htpasswd; to enable Basic auth
    server {
        server_name mlflow.mlhub.in;
        location / {
            proxy_pass http://localhost:5000/;
            auth_basic "Administrator's Area";
            auth_basic_user_file /etc/nginx/.htpasswd;
            proxy_set_header Host $host;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection upgrade;
            proxy_set_header Accept-Encoding gzip;
        }
        listen 80;
        listen [::]:80;
    }
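With auth_basic enabled, nginx answers 401 to requests that lack credentials and forwards requests whose Authorization header matches an entry in .htpasswd. The sketch below shows how a client builds that header using only the standard library; the helper names are illustrative and the credentials in the usage note are placeholders.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(url, username, password):
    """Create a urllib request that carries Basic auth credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

Passing build_request("http://mlflow.mlhub.in/", "amanthe001@gmail.com", "your-password") to urllib.request.urlopen should then return the MLflow UI instead of a 401 error.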
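  • Open the website and perform some basic MLflow tracking operations, e.g. upload a dataset or a text file

    import os
    # Set every environment variable before importing mlflow
    os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://s3.mlhub.in"
    os.environ["MLFLOW_TRACKING_USERNAME"] = "amanthe001@gmail.com"
    os.environ["MLFLOW_TRACKING_PASSWORD"] = "xxxxxxxxxxx"
    # Assumption: the tracking server is the nginx site configured above; adjust scheme and host to yours
    os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow.mlhub.in"
    # AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must also be set so the client can upload artifacts to MinIO
    import mlflow
    import pandas as pd
    df = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep=";")
    # Start a new run in the default experiment (ID "0")
    with mlflow.start_run(experiment_id="0") as run:
        run_id = run.info.run_id
        mlflow.log_text("Checking artifacts", artifact_file="sample.txt")
    • All environment variables must be set before importing the package
    • Verify the run in the MLflow UI
    • Verify that the logged metrics and artifacts appear in the MLflow run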
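The UI checks above can also be scripted. This is a minimal sketch, assuming mlflow is installed and the same environment variables are set; it relies on MlflowClient.search_runs and MlflowClient.list_artifacts to fetch the artifacts of the most recent run in the default experiment. The helper name latest_run_artifacts is illustrative.

```python
def latest_run_artifacts(client, experiment_id="0"):
    """Return (run_id, artifact_paths) for the newest run, or (None, [])."""
    runs = client.search_runs(
        experiment_ids=[experiment_id],
        order_by=["attributes.start_time DESC"],
        max_results=1,
    )
    if not runs:
        return None, []
    run_id = runs[0].info.run_id
    # list_artifacts returns entries for the root of the run's artifact directory.
    return run_id, [entry.path for entry in client.list_artifacts(run_id)]


# Usage (requires a reachable tracking server):
# from mlflow.tracking import MlflowClient
# print(latest_run_artifacts(MlflowClient()))
```

After the tracking snippet above has run, the returned artifact list should include sample.txt.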
  • Write a custom service file for MLflow

    • Create the file /lib/systemd/system/mlflow.service
    • Write the required environment variables and the run command
[Unit]
Description=MLFlow Server

[Service]
ExecStart=/bin/bash -c 'PATH=/home/ubuntu/.virtualenvs/mlflow/bin/:$PATH exec mlflow server --backend-store-uri sqlite:////home/ubuntu/ml-tracking-server/mlhub.mlflow.sqlite --default-artifact-root s3://private/ml-tracking-server/mlhub-artifacts -p 5000'
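A complete unit file usually has [Unit], [Service], and [Install] sections, and the MinIO credentials from the local run can be passed through Environment= lines. The sketch below renders such a file as text. The After=, Restart=, and WantedBy= values are common defaults that I am assuming here, not settings taken from my original file, and render_unit is an illustrative helper name.

```python
def render_unit(description, exec_start, env):
    """Render a simple systemd service unit as a string."""
    lines = [
        "[Unit]",
        f"Description={description}",
        "After=network.target",  # assumed: start once networking is up
        "",
        "[Service]",
    ]
    # One Environment= line per variable, quoted so values may contain spaces.
    lines += [f'Environment="{key}={value}"' for key, value in env.items()]
    lines += [
        f"ExecStart={exec_start}",
        "Restart=on-failure",  # assumed restart policy
        "",
        "[Install]",
        "WantedBy=multi-user.target",  # assumed: enable for normal boot
    ]
    return "\n".join(lines) + "\n"


# Placeholder credentials; replace them with the access key created in MinIO.
example_env = {
    "MLFLOW_S3_ENDPOINT_URL": "https://s3.mlhub.in",
    "AWS_ACCESS_KEY_ID": "xxxxxxxx",
    "AWS_SECRET_ACCESS_KEY": "xxxxxxxx",
}
```

The rendered text can be written to /lib/systemd/system/mlflow.service, followed by sudo systemctl daemon-reload and the usual enable and start commands.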

