Neural Magic
Neural Magic empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration techniques enable top performance with vLLM.
Boston
Pinned Repositories
AutoFP8
compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
deepsparse
Sparsity-aware deep learning inference runtime for CPUs
docs
Top-level directory for documentation and general content
guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs (a usage sketch follows this list)
nm-vllm-certs
General Information, model certifications, and benchmarks for nm-vllm enterprise distributions
sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
sparsify
ML model optimization product to accelerate inference.
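For context on the pinned nm-vllm engine, below is a minimal, hedged sketch of the vLLM-style offline inference API. It assumes the upstream vllm package is installed and that a small open model such as facebook/opt-125m is available; nm-vllm is described as a vLLM distribution, so it is assumed to expose the same entry points. This is illustrative only, not an official nm-vllm recipe.

```python
# Minimal offline-inference sketch using the vLLM-style Python API.
# Assumptions: the `vllm` package is installed and `facebook/opt-125m`
# can be downloaded; nm-vllm is assumed to mirror this upstream API.
from vllm import LLM, SamplingParams

# Load the model into the engine.
llm = LLM(model="facebook/opt-125m")

# Low-temperature sampling, capped at 64 new tokens per prompt.
params = SamplingParams(temperature=0.2, max_tokens=64)

# Batched generation over a couple of prompts.
outputs = llm.generate(
    ["What is sparsity?", "Explain FP8 quantization."],
    params,
)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)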
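```

The same engine can also be served over HTTP via vLLM's OpenAI-compatible server; the offline API above is simply the shortest self-contained way to exercise it.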
Neural Magic's Repositories
neuralmagic/nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
neuralmagic/guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
neuralmagic/AutoFP8
neuralmagic/docs
Top-level directory for documentation and general content
neuralmagic/compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
neuralmagic/yolov5
YOLOv5 in PyTorch > ONNX > CoreML > TFLite
neuralmagic/nm-vllm-certs
General Information, model certifications, and benchmarks for nm-vllm enterprise distributions
neuralmagic/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
neuralmagic/transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
neuralmagic/quant_kernel_benchmarks
Benchmarking code for running quantized kernels from vLLM and other libraries
neuralmagic/lm-evaluation-harness
A framework for few-shot evaluation of language models.
neuralmagic/upstream-transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
neuralmagic/vllm-flash-attention
Fast and memory-efficient exact attention
neuralmagic/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
neuralmagic/axolotl
Go ahead and axolotl questions
neuralmagic/causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
neuralmagic/evalplus
Neural Magic fork of EvalPlus (rigorous evaluation of LLM-synthesized code - NeurIPS 2023)
neuralmagic/flash-attention
Fast and memory-efficient exact attention
neuralmagic/graphs
neuralmagic/mamba
Mamba SSM architecture
neuralmagic/mistral-evals
neuralmagic/MixEval
NM fork of MixEval compatible with SparseAutoModel.
neuralmagic/mteb
MTEB: Massive Text Embedding Benchmark
neuralmagic/nm-actions
Neural Magic GitHub Actions (GHA)
neuralmagic/OmniQuant
[ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
neuralmagic/pytest-nm-releng
Pytest plugin used by the Release Engineering team
neuralmagic/research
Repository to enable research flows
neuralmagic/temp-AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
neuralmagic/upstream-composer
Supercharge Your Model Training
neuralmagic/upstream-llm-foundry
LLM training code for MosaicML foundation models