addvin's Stars
- vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
- neuralmagic/deepsparse: Sparsity-aware deep learning inference runtime for CPUs
- neuralmagic/sparseml: Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
- vllm-project/llm-compressor: Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
- neuralmagic/sparsezoo: Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
- neuralmagic/sparsify: ML model optimization product to accelerate inference
- neuralmagic/nm-vllm: A high-throughput and memory-efficient inference and serving engine for LLMs