MLServe

Scaling Machine Learning Workloads on GPU Clusters

  • Configured an NVIDIA GPU cluster with CUDA, cuDNN, and jaxlib (a sanity-check sketch follows this list).
  • Used Alpa on the Ray framework to apply model parallelism and statistical multiplexing, scaling inference workloads across the cluster's GPUs (see the parallelization sketch below).
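
A minimal sanity check for the cluster setup described in the first bullet, assuming CUDA, cuDNN, and jaxlib are already installed. It lists the GPUs JAX can see and runs a small matmul on each one to confirm the CUDA/cuDNN stack works; the matrix size is an illustrative choice, not part of the original project.

    # Sanity check: verify JAX sees the node's GPUs and can run a kernel on each.
    import jax
    import jax.numpy as jnp

    def check_gpus():
        devices = jax.devices()
        print(f"JAX {jax.__version__} found {len(devices)} device(s)")
        for d in devices:
            # Small matmul on each device to exercise the CUDA/cuDNN stack.
            x = jax.device_put(jnp.ones((1024, 1024)), d)
            y = (x @ x).block_until_ready()
            print(f"  {d.platform}:{d.id} ok, result sum = {float(y.sum()):.1f}")

    if __name__ == "__main__":
        check_gpus()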
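
A minimal sketch of running a model-parallel forward pass with Alpa on a Ray cluster, as in the second bullet. The MLP model, batch shape, and use of Flax are illustrative assumptions, not the project's actual serving configuration; alpa.init(cluster="ray") and the alpa.parallelize decorator are Alpa's standard entry points, and Ray must already be running (e.g. via `ray start --head`).

    # Sketch: shard a JAX/Flax forward pass across Ray-managed GPUs with Alpa.
    import alpa
    import jax
    import jax.numpy as jnp
    import flax.linen as nn

    class MLP(nn.Module):
        hidden: int = 4096

        @nn.compact
        def __call__(self, x):
            x = nn.Dense(self.hidden)(x)
            x = nn.relu(x)
            return nn.Dense(self.hidden)(x)

    def main():
        # Connect Alpa to the running Ray cluster.
        alpa.init(cluster="ray")

        model = MLP()
        x = jnp.ones((64, 4096))
        params = model.init(jax.random.PRNGKey(0), x)

        # alpa.parallelize partitions the computation across the cluster's GPUs.
        @alpa.parallelize
        def forward(params, x):
            return model.apply(params, x)

        out = forward(params, x)
        print(out.shape)
        alpa.shutdown()

    if __name__ == "__main__":
        main()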