Pinned Repositories
.github
mlperf_inference_results_v4.0
owlite
OwLite is a low-code AI model compression toolkit.
owlite-examples
The OwLite Examples repository provides example code for compressing PyTorch deep learning models and converting them into TensorRT engines.
QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
vllm-fork
A high-throughput and memory-efficient inference and serving engine for LLMs
vllm-quick