fastertransformer
There are 5 repositories under the fastertransformer topic.
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
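As a quick orientation, here is a minimal sketch of text generation with LMDeploy's high-level pipeline API; the model name below is an illustrative assumption, not something this listing specifies.

```python
# A minimal LMDeploy sketch, assuming the high-level pipeline API.
# The model name is an illustrative assumption; any supported chat model works.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")
responses = pipe(["What does FasterTransformer accelerate?"])
print(responses[0].text)
```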
Curt-Park/serving-codegen-gptj-triton
Serving example of CodeGen-350M-Mono-GPTJ on Triton Inference Server with Docker and Kubernetes
detail-novelist/novelist-triton-server
Deploy KoGPT with Triton Inference Server
clam004/triton-ft-api
Tutorial on how to deploy a scalable autoregressive causal language model transformer using NVIDIA Triton Inference Server
RajeshThallam/fastertransformer-converter
A code sample for serving large language models (LLMs) on a Google Kubernetes Engine (GKE) cluster with GPUs, running NVIDIA Triton Inference Server with the FasterTransformer backend.
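Several of the repositories above serve models through Triton with the FasterTransformer backend. As a rough illustration of the client side, here is a sketch of a generation request using the tritonclient package; the model name "fastertransformer" and the tensor names (input_ids, input_lengths, request_output_len, output_ids) follow a common FasterTransformer backend convention but are assumptions that depend on each repository's model configuration.

```python
# A client-side sketch for a Triton server running the FasterTransformer
# backend. Tensor names and dtypes are assumptions; check the model's
# config.pbtxt in the repository you deploy.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Token IDs for the prompt; in practice these come from the model's tokenizer.
input_ids = np.array([[818, 262, 3726]], dtype=np.uint32)
input_lengths = np.array([[input_ids.shape[1]]], dtype=np.uint32)
output_len = np.array([[32]], dtype=np.uint32)  # max new tokens to generate

inputs = []
for name, data in [
    ("input_ids", input_ids),
    ("input_lengths", input_lengths),
    ("request_output_len", output_len),
]:
    tensor = httpclient.InferInput(name, list(data.shape), "UINT32")
    tensor.set_data_from_numpy(data)
    inputs.append(tensor)

result = client.infer("fastertransformer", inputs)
print(result.as_numpy("output_ids"))
```

In practice the prompt would be tokenized with the served model's tokenizer, and the returned output_ids decoded back to text.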