Neural Magic
Neural Magic empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration techniques enable top performance with vLLM.
Boston
Pinned Repositories
AutoFP8
compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
deepsparse
Sparsity-aware deep learning inference runtime for CPUs
docs
Top-level directory for documentation and general content
guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
nm-vllm-certs
General Information, model certifications, and benchmarks for nm-vllm enterprise distributions
sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
sparsify
ML model optimization product to accelerate inference