isaac-vidas

Pinned Repositories

llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Language:Python0 0 00
ray
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
Language:Python00
sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable.
Language:Python0 0 00
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Language:C++0 0 00

isaac-vidas's Repositories

isaac-vidas/llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Language:Python0 0 00
isaac-vidas/ray
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
Language:Python00
isaac-vidas/sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable.
Language:Python0 0 00
isaac-vidas/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Language:C++0 0 00
isaac-vidas/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Language:Python0 0 00