Pinned Repositories
ao
Custom data types and layouts for training and inference
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
inference
Reference implementations of MLPerf™ inference benchmarks
inference_results_v1.1
intel-extension-for-deepspeed
Intel® Extension for DeepSpeed* is an extension that brings Intel GPU (XPU) support to DeepSpeed.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
training
Reference implementations of MLPerf™ training benchmarks
pai
Resource scheduling and cluster management for AI
dbyoung18's Repositories
dbyoung18/ao
Custom data types and layouts for training and inference
dbyoung18/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
dbyoung18/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
dbyoung18/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
dbyoung18/inference
Reference implementations of MLPerf™ inference benchmarks
dbyoung18/inference_results_v1.1
dbyoung18/intel-extension-for-deepspeed
Intel® Extension for DeepSpeed* is an extension that brings Intel GPU (XPU) support to DeepSpeed.
dbyoung18/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
dbyoung18/training
Reference implementations of MLPerf™ training benchmarks
dbyoung18/KE-complex_modifications
Karabiner-Elements complex_modifications rules
dbyoung18/mlx
MLX: An array framework for Apple silicon
dbyoung18/safetensors
Simple, safe way to store and distribute tensors
dbyoung18/TensorRT
TensorRT is a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators.
dbyoung18/training_results_v2.1
dbyoung18/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
dbyoung18/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
dbyoung18/warp-transducer
A fast parallel implementation of RNN Transducer.
dbyoung18/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.