Pinned Repositories
ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
chatbot-ui
An open source ChatGPT UI.
discussions
FasterTransformer
Transformer-related optimization, including BERT and GPT
flash-attention
Fast and memory-efficient exact attention
flashinfer
FlashInfer: Kernel Library for LLM Serving
LLMBench
A library for validating and benchmarking LLM inference.
ScaleLLM
A high-performance inference system for large language models, designed for production environments.
tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
vcpkg
C++ Library Manager for Windows, Linux, and macOS
Vectorch's Repositories
vectorch-ai/ScaleLLM
A high-performance inference system for large language models, designed for production environments.
vectorch-ai/LLMBench
A library for validating and benchmarking LLM inference.
vectorch-ai/ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
vectorch-ai/chatbot-ui
An open source ChatGPT UI.
vectorch-ai/discussions
vectorch-ai/FasterTransformer
Transformer-related optimization, including BERT and GPT
vectorch-ai/flash-attention
Fast and memory-efficient exact attention
vectorch-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
vectorch-ai/tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
vectorch-ai/vcpkg
C++ Library Manager for Windows, Linux, and macOS
vectorch-ai/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
vectorch-ai/whl
Repository for hosting Python wheel (whl) packages.