zifeitong's Stars
google/llvm-propeller
PROPELLER: Profile Guided Optimizing Large Scale LLVM-based Relinker
felafax/felafax
Felafax is building AI infra for non-NVIDIA GPUs
brentyi/jaxls
Sparse nonlinear least squares in JAX
zml/zml
High performance AI inference stack. Built for production. @ziglang / @openxla / MLIR / @bazelbuild
LaurentMazare/xla-rs
Experimentation using the XLA compiler from Rust
AlibabaPAI/llumnix
Efficient and easy multi-instance LLM serving
google-deepmind/penzai
A JAX research toolkit for building, editing, and visualizing neural networks.
NVIDIA/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
project-oak/oak
Meaningful control of data in distributed systems.
google/riegeli
Riegeli/records is a file format for storing a sequence of string records, typically serialized protocol buffers.
EleutherAI/cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
HPMLL/BurstGPT
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
google-research/dex-lang
Research language for array processing in the Haskell/ML family
bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
pytorch/torchtune
PyTorch native finetuning library
vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
scylladb/seastar
High performance server-side application framework
facebookexperimental/libunifex
Unified Executors
unum-cloud/ucall
Web Serving and Remote Procedure Calls at 50x lower latency and 70x higher bandwidth than FastAPI, implementing JSON-RPC & REST over io_uring ☎️
NVIDIA/stdexec
`std::execution`, the proposed C++ framework for asynchronous and parallel programming.
bytedance/monoio
Rust async runtime based on io_uring.
google/aqt
google/airio
google-deepmind/dm_pix
PIX is an image processing library in JAX, for JAX.
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
pytorch/serve
Serve, optimize and scale PyTorch models in production
skyplane-project/skyplane
🔥 Blazing fast bulk data transfers between any cloud 🔥
cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference