Neural Magic
Neural Magic (acquired by Red Hat) empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration techniques enable top performance with vLLM.
Boston
Pinned Repositories
AutoFP8
deepsparse
Sparsity-aware deep learning inference runtime for CPUs
docs
Top-level directory for documentation and general content
examples
Notebooks using the Neural Magic libraries 📓
nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
sparsify
ML model optimization product to accelerate inference.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
yolov5
YOLOv5 in PyTorch > ONNX > CoreML > TFLite
Neural Magic's Repositories
neuralmagic/deepsparse
Sparsity-aware deep learning inference runtime for CPUs
neuralmagic/sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
neuralmagic/sparsify
ML model optimization product to accelerate inference.
neuralmagic/yolov5
YOLOv5 in PyTorch > ONNX > CoreML > TFLite
neuralmagic/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
neuralmagic/transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
neuralmagic/vllm-flash-attention
Fast and memory-efficient exact attention
neuralmagic/lm-evaluation-harness
A framework for few-shot evaluation of language models.
neuralmagic/research
Repository to enable research flows
neuralmagic/yolov3
YOLOv3 in PyTorch > ONNX > CoreML > TFLite
neuralmagic/model-validation-configs
neuralmagic/LMCache
Redis for LLMs
neuralmagic/upstream-transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
neuralmagic/arena-hard-auto
Arena-Hard-Auto: An automatic LLM benchmark.
neuralmagic/axolotl
Go ahead and axolotl questions
neuralmagic/collective_op_benchmarks
neuralmagic/DeepEP
DeepEP: an efficient expert-parallel communication library
neuralmagic/DeepEP-test
DeepEP: an efficient expert-parallel communication library
neuralmagic/DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
neuralmagic/flashinfer
FlashInfer: Kernel Library for LLM Serving
neuralmagic/github-jira-sandbox
proving grounds for GitHub to JIRA ... yay!
neuralmagic/lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
neuralmagic/llm-d
llm-d is a Kubernetes-native high-performance distributed LLM inference framework
neuralmagic/lmms-eval
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
neuralmagic/nm-actions
Neural Magic GitHub Actions (GHA)
neuralmagic/pplx-kernels
Perplexity GPU Kernels
neuralmagic/pydantic-regmix
Common mixins, registries, and utilities with native Pydantic support, used across popular repos such as GuideLLM and Speculators
neuralmagic/pytest-nm-releng
Pytest plugin used by the Release Engineering team
neuralmagic/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
neuralmagic/sglang
SGLang is a fast serving framework for large language models and vision language models.