Pinned Repositories
deepsparse-continuous-batching
DeepSparse Continuous Batching
gpu-profiling
GPU Profiling
langchain-gpt
Code generation for the LangChain framework
llm-compressor-example
Example using llm-compressor
marlin-example
Example of quantizing and saving a model with Marlin
mistral-self-rag
Training Mistral on the Self-RAG task
vllm-benchmarking
Benchmarking vLLM
vllm-k8s
Example deploying vLLM on GKE
llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
robertgshaw2-neuralmagic's Repositories
robertgshaw2-neuralmagic/vllm-k8s
Example deploying vLLM on GKE
robertgshaw2-neuralmagic/deepsparse-continuous-batching
DeepSparse Continuous Batching
robertgshaw2-neuralmagic/llm-compressor-example
Example using llm-compressor
robertgshaw2-neuralmagic/marlin-example
Example of quantizing and saving a model with Marlin
robertgshaw2-neuralmagic/mistral-self-rag
Training Mistral on the Self-RAG task
robertgshaw2-neuralmagic/vllm-benchmarking
Benchmarking vLLM
robertgshaw2-neuralmagic/auto-fp8
Making FP8 Checkpoints
robertgshaw2-neuralmagic/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
robertgshaw2-neuralmagic/bert-benchmarking
Repo for benchmarking BERT performance under various scenarios
robertgshaw2-neuralmagic/bert-server-example
DeepSparse Server Running BERT
robertgshaw2-neuralmagic/zephyr-training
Recreating and experimenting with Zephyr
robertgshaw2-neuralmagic/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
robertgshaw2-neuralmagic/buildkite-ci
robertgshaw2-neuralmagic/chat-example
Example of calling the chat API
robertgshaw2-neuralmagic/deepsparse-llm-server-example
Example of running a DeepSparse LLM in a basic server
robertgshaw2-neuralmagic/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
robertgshaw2-neuralmagic/gptq-benchmarking
Benchmarking GPTQ performance and exploring how the kernels work
robertgshaw2-neuralmagic/gptq-experiments
Experiments running GPTQ
robertgshaw2-neuralmagic/gptq-serialization-example
Example of GPTQ serialization
robertgshaw2-neuralmagic/lm-evaluation-harness
A framework for few-shot evaluation of language models.
robertgshaw2-neuralmagic/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
robertgshaw2-neuralmagic/nm-vllm-example
Example running nm-vllm
robertgshaw2-neuralmagic/one-shot-mpt-gsm-8k
Experiments applying one-shot compression
robertgshaw2-neuralmagic/sparse-finetuning
Repository for sparse fine-tuning of LLMs via a modified version of the MosaicML llmfoundry
robertgshaw2-neuralmagic/tgi-benchmarking
Benchmarking LLMs on GPUs
robertgshaw2-neuralmagic/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
robertgshaw2-neuralmagic/viggo-finetuning
Example of fine-tuning an LLM on the ViGGO dataset
robertgshaw2-neuralmagic/vllm-client
Client for benchmarking vLLM
robertgshaw2-neuralmagic/vllm-examples
Example benchmarking vLLM
robertgshaw2-neuralmagic/vllm-qa-basic-correctness
Repo for basic correctness testing of vLLM