Neural Magic
Neural Magic empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration techniques enable top performance with vLLM.
Boston
Pinned Repositories
AutoFP8
compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
deepsparse
Sparsity-aware deep learning inference runtime for CPUs
docs
Top-level directory for documentation and general content
guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs (a usage sketch follows this list)
nm-vllm-certs
General Information, model certifications, and benchmarks for nm-vllm enterprise distributions
sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
sparsify
ML model optimization product to accelerate inference.
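For context on the pinned nm-vllm engine, below is a minimal, hedged sketch of the vLLM-style offline inference API. It assumes the upstream vllm package is installed and that a small open model such as facebook/opt-125m is available; nm-vllm is described as a vLLM distribution, so it is assumed to expose the same entry points. This is illustrative only, not an official nm-vllm recipe.

```python
# Minimal offline-inference sketch using the vLLM-style Python API.
# Assumptions: the `vllm` package is installed and `facebook/opt-125m`
# can be downloaded; nm-vllm is assumed to mirror this upstream API.
from vllm import LLM, SamplingParams

# Load the model into the engine.
llm = LLM(model="facebook/opt-125m")

# Low-temperature sampling, capped at 64 new tokens per prompt.
params = SamplingParams(temperature=0.2, max_tokens=64)

# Batched generation over a couple of prompts.
outputs = llm.generate(
    ["What is sparsity?", "Explain FP8 quantization."],
    params,
)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)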
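```

The same engine can also be served over HTTP via vLLM's OpenAI-compatible server; the offline API above is simply the shortest self-contained way to exercise it.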
Neural Magic's Repositories
neuralmagic/nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
neuralmagic/guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
neuralmagic/AutoFP8
neuralmagic/docs
Top-level directory for documentation and general content
neuralmagic/compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
neuralmagic/yolov5
YOLOv5 in PyTorch > ONNX > CoreML > TFLite
neuralmagic/nm-vllm-certs
General Information, model certifications, and benchmarks for nm-vllm enterprise distributions
neuralmagic/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
neuralmagic/transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
neuralmagic/quant_kernel_benchmarks
Benchmarking code for running quantized kernels from vLLM and other libraries
neuralmagic/lm-evaluation-harness
A framework for few-shot evaluation of language models.
neuralmagic/upstream-transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
neuralmagic/vllm-flash-attention
Fast and memory-efficient exact attention
neuralmagic/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
neuralmagic/axolotl
Go ahead and axolotl questions
neuralmagic/causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
neuralmagic/evalplus
Neural Magic fork of EvalPlus (rigorous evaluation of LLM-synthesized code - NeurIPS 2023)
neuralmagic/flash-attention
Fast and memory-efficient exact attention
neuralmagic/graphs
neuralmagic/mamba
Mamba SSM architecture
neuralmagic/mistral-evals
neuralmagic/MixEval
NM fork of MixEval compatible with SparseAutoModel.
neuralmagic/mteb
MTEB: Massive Text Embedding Benchmark
neuralmagic/nm-actions
Neural Magic GitHub Actions (GHA)
neuralmagic/OmniQuant
[ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
neuralmagic/pytest-nm-releng
Pytest plugin used by the Release Engineering team
neuralmagic/research
Repository to enable research flows
neuralmagic/temp-AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
neuralmagic/upstream-composer
Supercharge Your Model Training
neuralmagic/upstream-llm-foundry
LLM training code for MosaicML foundation models