Kubernetes-for-Machine-Learning

MLOps with Kubernetes: Harnessing GPU Power. Hands-on concepts including NVIDIA GPUs, MicroK8s, and Kubeflow.

Primary language: Python. License: MIT.

Setting up Machine Learning Infrastructure on Kubernetes

This tutorial is for Kubernetes and DevOps engineers who want to deepen their understanding of machine learning operations (MLOps), with a focus on Kubernetes and containers.

To put this in perspective, consider three personas:

  • Application developers who build applications on top of APIs (e.g. the OpenAI API). This tutorial is not aimed at them.
  • Data scientists who build and fine-tune models as a product. This tutorial is not aimed at them.
  • Engineers who want to build, operate, or learn about ML infrastructure and processes. This tutorial is for you.

We'll use MicroK8s on Ubuntu 22.04, hosted on Paperspace by DigitalOcean, so you get hands-on experience managing ML workflows in Kubernetes.

Why MicroK8s?

  • Small footprint and low operational overhead, so it runs on a single VM, which makes it well suited to learning.
  • Highly available and production-ready.

Table of Contents

1. Basics of GPU

Learn the fundamentals of GPU technology, with a focus on NVIDIA GPUs. We'll cover hardware checks, driver installation, and dive into GPU compute, memory, and scheduling, including hands-on command-line examples.
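As a rough illustration of the kind of hardware and driver check this chapter covers, here is a minimal Python sketch that shells out to `nvidia-smi` and parses its CSV output. It is not taken from the tutorial itself: the helper names `parse_gpu_csv` and `list_gpus` are hypothetical, and the sketch assumes that the NVIDIA driver (which provides `nvidia-smi`) is installed. On a machine without it, the script simply reports that no GPU was found.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; the selection is an assumption for illustration.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output (no header row) into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        # Skip any line that does not have one value per requested field.
        if len(values) == len(QUERY_FIELDS):
            gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def list_gpus():
    """Return one dict per detected NVIDIA GPU, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []  # Driver tools are not installed on this machine.
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No NVIDIA GPU detected (or nvidia-smi is unavailable).")
    for gpu in gpus:
        print(gpu)
```

Keeping the parsing separate from the subprocess call means the parser can be exercised on a machine without a GPU.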

2. MLOps

Compare DevOps and MLOps for machine learning. The MLOps tooling landscape is very broad, so start from the developer's point of view and add a tool only when there is a need for it.

3. GPU + Container

Set up container environments and deploy GPU-based applications.

4. GPU + Kubernetes

Explore how GPUs integrate with Kubernetes, using MicroK8s. We self-host MicroK8s on Ubuntu on Paperspace.
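To make the integration concrete, here is a hedged Python sketch that generates a Pod manifest requesting one GPU. It assumes the NVIDIA device plugin is running in the cluster, since that is what advertises the `nvidia.com/gpu` resource. The function name `gpu_pod_manifest`, the pod name, and the container image tag are illustrative choices, not taken from the tutorial.

```python
import json


def gpu_pod_manifest(name, image, gpu_count=1):
    """Build a Pod manifest (as a dict) whose container requests NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Run once and stop; this is a smoke test, not a long-lived service.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "command": ["nvidia-smi"],
                    # GPUs are requested as an extended resource under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Image tag is an assumption; substitute any CUDA base image you have access to.
    manifest = gpu_pod_manifest("gpu-smoke-test", "nvidia/cuda:12.2.0-base-ubuntu22.04")
    print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the output can be piped straight into `microk8s kubectl apply -f -`. If the pod stays in `Pending`, the scheduler most likely found no node advertising `nvidia.com/gpu`.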

5. Multi-instance GPU

Configure Kubernetes for NVIDIA Multi-Instance GPU (MIG), which partitions a single physical GPU into isolated instances, to improve resource utilization and ML workload performance.

6. Ansible Setup

Use Ansible, with opinionated settings, to set up a GPU-ready, multi-node MicroK8s cluster with shared storage on Paperspace.