Pinned Repositories
.github
mlperf_inference_results_v4.0
owlite
OwLite is a low-code AI model compression toolkit.
owlite-examples
The OwLite Examples repository provides example code for compressing PyTorch deep learning models and converting them into TensorRT engines.
QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
vllm-fork
A high-throughput and memory-efficient inference and serving engine for LLMs
vllm-quick