Neural Magic
Neural Magic empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration techniques enable top performance with vLLM.
Boston
Pinned Repositories
AutoFP8
compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
deepsparse
Sparsity-aware deep learning inference runtime for CPUs
docs
Top-level directory for documentation and general content
guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
nm-vllm-certs
General Information, model certifications, and benchmarks for nm-vllm enterprise distributions
sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
sparsify
ML model optimization product to accelerate inference