Triton TensorRT-LLM Model Preparation and Deployment

Overview

This repository provides scripts and instructions for preparing and deploying Large Language Models (LLMs) using Triton Inference Server with TensorRT-LLM. The streamlined workflow lets you deploy LLMs efficiently in production environments.

Getting Started

To get started, follow these steps:

  1. Change into the main directory:

    cd DEPLOY_T5
    
  2. Build a Docker image from the Dockerfile:

    docker build -t triton-llm-deployment .
    
  3. Start a container from the image (TensorRT-LLM needs GPU access, so pass --gpus all; this requires the NVIDIA Container Toolkit on the host):

    docker run -it --gpus all --name triton-llm-container triton-llm-deployment
    
  4. Within the container, execute the bash scripts in order to prepare and deploy your LLM model (the kind of request the final script sends is sketched after this list):

    bash scripts/1.install_git_and_lfs.sh
    bash scripts/2.install_tensorrt_llm.sh
    bash scripts/3.trendyol_llm_tensorrt_engine_build_and_test.sh
    bash scripts/4.create_triton_model_repository.sh
    bash scripts/5.run_triton.sh
    bash scripts/6.call_triton_model_curl_example.sh
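
If the server comes up, that final script issues an HTTP request against Triton's generate endpoint. As a rough sketch of such a call (the ensemble model name, port 8000, and the JSON fields are assumptions based on the default tensorrtllm_backend setup; the script's actual request may differ):

    # Hypothetical request; assumes the default "ensemble" model on port 8000.
    curl -X POST localhost:8000/v2/models/ensemble/generate \
      -H "Content-Type: application/json" \
      -d '{"text_input": "What is Triton Inference Server?", "max_tokens": 64, "bad_words": "", "stop_words": ""}'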
    

Repository Structure

  • 1.install_git_and_lfs.sh: Installs Git and Git LFS if not already installed.
  • 2.install_tensorrt_llm.sh: Fetches the tensorrtllm_backend project and its submodules at a version compatible with Triton.
  • 3.trendyol_llm_tensorrt_engine_build_and_test.sh: Downloads the Trendyol/Trendyol-LLM-7b-chat-v1.0 model from Hugging Face, converts it into a TensorRT-LLM checkpoint, then builds a TensorRT-LLM engine from that checkpoint and tests it (see the sketch after this list).
  • 4.create_triton_model_repository.sh: Creates the Triton model repository and copies model definition files into it.
  • 5.run_triton.sh: Starts the Triton server to serve the deployed LLM model (see the second sketch after this list).
  • 6.call_triton_model_curl_example.sh: Provides an example of how to make requests to the deployed LLM model using cURL.
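
For orientation, the conversion and build in 3.trendyol_llm_tensorrt_engine_build_and_test.sh typically follow TensorRT-LLM's standard two-step flow, sketched below. The paths, dtype, and the use of the Llama example converter are assumptions (Trendyol-LLM-7b-chat is Llama-based); the script's actual flags may differ.

    # Hypothetical sketch of the two-step TensorRT-LLM flow; paths are placeholders.
    # 1) Convert the Hugging Face weights into a TensorRT-LLM checkpoint.
    python3 tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py \
        --model_dir ./Trendyol-LLM-7b-chat-v1.0 \
        --output_dir ./trt_ckpt \
        --dtype float16

    # 2) Build a serving engine from that checkpoint.
    trtllm-build --checkpoint_dir ./trt_ckpt \
        --output_dir ./trt_engine \
        --gemm_plugin float16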

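Likewise, 5.run_triton.sh amounts to launching Triton against the model repository created in step 4. A minimal sketch, assuming the launcher shipped with tensorrtllm_backend and a repository at ./triton_model_repo (both paths are assumptions):

    # Hypothetical invocation; world_size 1 assumes a single-GPU engine.
    python3 tensorrtllm_backend/scripts/launch_triton_server.py \
        --world_size 1 \
        --model_repo ./triton_model_repo
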
Resources