
Paella: Low-latency Model Serving with Virtualized GPU Scheduling

Primary language: C++. License: MIT.

Paella / LLIS

This project was originally called LLIS, so that name is still used throughout the codebase.

SOSP 2023 Artifact Evaluation

Please refer to the instructions in the sosp23_artifact/ directory.

Dependencies

  1. Linux (tested on Ubuntu 22.04)
  2. NVIDIA driver (tested on 535.54.03)
  3. CUDA runtime (tested on 12.2.0)
  4. GCC (tested on 11.3.0)
  5. CMake (tested on 3.22.1)
  6. Boost (tested on 1.82.0)
  7. LLVM / Clang (tested on 14)
  8. spdlog (tested on 1.11.0; 1.12.0 is known not to work)
  9. tvm-llis (a custom version of TVM modified to work with Paella)
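Because spdlog 1.12.0 is known not to work, it can help to check which version is installed before building. The helpers below are a minimal, hypothetical sketch (not part of Paella/LLIS). They assume spdlog's headers live under an include prefix such as /usr/include and that spdlog/version.h defines SPDLOG_VER_MAJOR, SPDLOG_VER_MINOR and SPDLOG_VER_PATCH, as spdlog releases do. The function names spdlog_header_version and spdlog_status are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Paella/LLIS): check the installed spdlog
# version against the versions listed in the dependency list.

# Print the spdlog version found in <include_dir>/spdlog/version.h
# (default include dir: /usr/include). Returns 1 if the header is missing.
spdlog_header_version() {
  local header="${1:-/usr/include}/spdlog/version.h"
  [ -f "$header" ] || return 1
  awk '$1 == "#define" && $2 == "SPDLOG_VER_MAJOR" { maj = $3 }
       $1 == "#define" && $2 == "SPDLOG_VER_MINOR" { min = $3 }
       $1 == "#define" && $2 == "SPDLOG_VER_PATCH" { pat = $3 }
       END { if (maj != "") print maj "." min "." pat }' "$header"
}

# Classify a version string: 1.11.0 is the tested version, 1.12.0 is known
# not to work, and anything else has not been tested.
spdlog_status() {
  case "$1" in
    1.11.0) echo "tested" ;;
    1.12.0) echo "known-bad" ;;
    *)      echo "untested" ;;
  esac
}

# Usage (pass a different include prefix if spdlog is installed elsewhere):
#   spdlog_status "$(spdlog_header_version /usr/local/include)"
```

The check only distinguishes the one tested version and the one known-bad version from everything else; other versions are reported as "untested" rather than rejected, since the list above says nothing about them.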

Installation

Paella/LLIS server and libraries

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=<release|debug> -DCMAKE_CUDA_ARCHITECTURES=<cuda_arch> .. # cuda_arch is the GPU compute capability without the dot, e.g. 60 for 6.0, 75 for 7.5
make -j$(nproc) install
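The value of <cuda_arch> can be derived from the GPU's compute capability. The hypothetical helper below is a sketch of that conversion: it reads capabilities such as "7.5", one per line, and prints the matching CMAKE_CUDA_ARCHITECTURES value. It assumes the capabilities come from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, which requires a driver that supports the compute_cap query field; the function name cuda_arch_from_caps is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn compute capabilities such as "7.5" (one per line
# on stdin) into a value for -DCMAKE_CUDA_ARCHITECTURES, e.g. "75".
# Several distinct GPUs yield a semicolon-separated list, e.g. "75;86".
cuda_arch_from_caps() {
  # Drop dots, spaces and CRs, skip blank lines, de-duplicate, join with ';'.
  tr -d ' .\r' | grep -v '^$' | sort -un | paste -sd ';' -
}

# Usage on a machine with an NVIDIA GPU (assumes nvidia-smi supports the
# compute_cap query field):
#   arch=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | cuda_arch_from_caps)
#   cmake -DCMAKE_BUILD_TYPE=release -DCMAKE_CUDA_ARCHITECTURES="$arch" ..
```

Quoting "$arch" matters when more than one architecture is detected, because the shell would otherwise treat the semicolons as command separators.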

Custom TVM (tvm-llis)

The custom TVM depends on the Paella/LLIS libraries, so it can only be built after completing the previous step.

For build instructions, see README-llis.md in tvm-llis.

Paella/LLIS applications (e.g., client) and job adapters

Applications and job adapters depend on the custom TVM, so they can only be built after completing the previous step.

cmake .. -Utvm_FOUND # Run in build/; clearing the cached tvm_FOUND makes CMake search again for the newly installed TVM
make -j$(nproc) install
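After re-running CMake, one way to confirm that TVM was actually found is to read the entry back from the CMake cache. The helper below is a hypothetical sketch: it parses CMakeCache.txt lines of the form NAME:TYPE=VALUE and prints the value for one key. It assumes tvm_FOUND is stored as a cache entry, which the use of -Utvm_FOUND implies; the function name cmake_cache_get is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one entry in a CMake cache file.
# Cache lines have the form NAME:TYPE=VALUE (the :TYPE part may be absent).
cmake_cache_get() {
  local key="$1" cache="${2:-CMakeCache.txt}"
  [ -f "$cache" ] || return 1
  awk -v k="$key" '
    index($0, k ":") == 1 || index($0, k "=") == 1 {
      sub(/^[^=]*=/, "")   # drop everything up to the first "="
      print
      exit
    }' "$cache"
}

# Usage, from the build/ directory after re-running cmake:
#   cmake_cache_get tvm_FOUND CMakeCache.txt
```

Matching on the key followed by ":" or "=" avoids picking up related entries such as tvm_FOUND-ADVANCED. An empty result means the key is not in the cache at all.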