fastertransformer
There are 5 repositories under the fastertransformer topic.
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
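As a quick orientation, here is a minimal sketch of text generation with LMDeploy's high-level pipeline API; the model name below is an illustrative assumption, not something this listing specifies.

```python
# A minimal LMDeploy sketch, assuming the high-level pipeline API.
# The model name is an illustrative assumption; any supported chat model works.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")
responses = pipe(["What does FasterTransformer accelerate?"])
print(responses[0].text)
```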
Curt-Park/serving-codegen-gptj-triton
Serving example of CodeGen-350M-Mono-GPTJ on Triton Inference Server with Docker and Kubernetes
detail-novelist/novelist-triton-server
Deploy KoGPT with Triton Inference Server
clam004/triton-ft-api
Tutorial on how to deploy a scalable autoregressive causal language model transformer using NVIDIA Triton Inference Server
RajeshThallam/fastertransformer-converter
A code sample for serving large language models (LLMs) on a Google Kubernetes Engine (GKE) cluster with GPUs, running NVIDIA Triton Inference Server with the FasterTransformer backend.
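Several of the repositories above serve models through Triton with the FasterTransformer backend. As a rough illustration of the client side, here is a sketch of a generation request using the tritonclient package; the model name "fastertransformer" and the tensor names (input_ids, input_lengths, request_output_len, output_ids) follow a common FasterTransformer backend convention but are assumptions that depend on each repository's model configuration.

```python
# A client-side sketch for a Triton server running the FasterTransformer
# backend. Tensor names and dtypes are assumptions; check the model's
# config.pbtxt in the repository you deploy.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Token IDs for the prompt; in practice these come from the model's tokenizer.
input_ids = np.array([[818, 262, 3726]], dtype=np.uint32)
input_lengths = np.array([[input_ids.shape[1]]], dtype=np.uint32)
output_len = np.array([[32]], dtype=np.uint32)  # max new tokens to generate

inputs = []
for name, data in [
    ("input_ids", input_ids),
    ("input_lengths", input_lengths),
    ("request_output_len", output_len),
]:
    tensor = httpclient.InferInput(name, list(data.shape), "UINT32")
    tensor.set_data_from_numpy(data)
    inputs.append(tensor)

result = client.infer("fastertransformer", inputs)
print(result.as_numpy("output_ids"))
```

In practice the prompt would be tokenized with the served model's tokenizer, and the returned output_ids decoded back to text.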