Pinned Repositories
transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DeepSpeedL
FasterTransformer
Transformer-related optimization, including BERT and GPT
TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
llama.onnx
LLaMA/RWKV ONNX models, quantization, and test cases
jxcomeon's Repositories
jxcomeon/DeepSpeedL
jxcomeon/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.