Neural Magic
Neural Magic helps developers accelerate machine learning performance with automated model sparsification techniques and inference technologies.
Boston
Pinned Repositories
AutoFP8
deepsparse
Sparsity-aware deep learning inference runtime for CPUs
docs
Top-level directory for documentation and general content
examples
Notebooks using the Neural Magic libraries 📓
guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
nm-vllm-certs
General Information, model certifications, and benchmarks for nm-vllm enterprise distributions
sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
sparsify
ML model optimization product to accelerate inference.
Neural Magic's Repositories
neuralmagic/deepsparse
Sparsity-aware deep learning inference runtime for CPUs
neuralmagic/sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
neuralmagic/sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
neuralmagic/nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
neuralmagic/AutoFP8
neuralmagic/guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
neuralmagic/examples
Notebooks using the Neural Magic libraries 📓
neuralmagic/compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
neuralmagic/yolov5
YOLOv5 in PyTorch > ONNX > CoreML > TFLite
neuralmagic/transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
neuralmagic/helm-charts
Helm charts for deploying NM VLLM
neuralmagic/nm-vllm-certs
General Information, model certifications, and benchmarks for nm-vllm enterprise distributions
neuralmagic/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
neuralmagic/inference
Reference implementations of MLPerf™ inference benchmarks
neuralmagic/lm-evaluation-harness
A framework for few-shot evaluation of language models.
neuralmagic/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
neuralmagic/causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
neuralmagic/cutlass
CUDA Templates for Linear Algebra Subroutines
neuralmagic/evalplus
NeuralMagic fork of EvalPlus (Rigorous evaluation of LLM-synthesized code - NeurIPS 2023)
neuralmagic/flash-attention
Fast and memory-efficient exact attention
neuralmagic/llm-foundry
NM fork of LLM foundry for compatibility with SparseAutoModel.
neuralmagic/mamba
Mamba SSM architecture
neuralmagic/MixEval
NM fork of MixEval compatible with SparseAutoModel.
neuralmagic/nm-actions
Neural Magic GitHub Actions (GHA) workflows
neuralmagic/nm-vllm-utils
Various utilities for use with nm-vllm
neuralmagic/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
neuralmagic/triton_vllm_backend
Triton vLLM Backend
neuralmagic/upstream-composer
Supercharge Your Model Training
neuralmagic/upstream-llm-foundry
LLM training code for MosaicML foundation models
neuralmagic/upstream-transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.