zifeitong

zifeitong's Stars

google/llvm-propeller
PROPELLER: Profile Guided Optimizing Large Scale LLVM-based Relinker
Language: Shell · 35834
felafax/felafax
Felafax is building AI infra for non-NVIDIA GPUs
Language: Jupyter Notebook · 49425
brentyi/jaxls
Sparse nonlinear least squares in JAX
Language: Python · 17511
zml/zml
High performance AI inference stack. Built for production. @ziglang / @openxla / MLIR / @bazelbuild
Language: Zig · 1.6k · 56
LaurentMazare/xla-rs
Experimentation using the xla compiler from rust
Language: Rust · 8813
AlibabaPAI/llumnix
Efficient and easy multi-instance LLM serving
Language: Python · 16012
google-deepmind/penzai
A JAX research toolkit for building, editing, and visualizing neural networks.
Language: Python · 1.7k · 50
NVIDIA/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Language: Cuda · 547 · 110
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Language: Python · 37230
project-oak/oak
Meaningful control of data in distributed systems.
Language: Rust · 1.3k · 113
google/riegeli
Riegeli/records is a file format for storing a sequence of string records, typically serialized protocol buffers.
Language: C++ · 41653
EleutherAI/cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
Language: Python · 69235
HPMLL/BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
Language: Python · 1177
google-research/dex-lang
Research language for array processing in the Haskell/ML family
Language: Haskell · 1.6k · 107
bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
Language: C++ · 20215
pytorch/torchtune
PyTorch native finetuning library
Language: Python · 4.2k · 399
vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Language: Python · 58849
scylladb/seastar
High performance server-side application framework
Language: C++ · 8.3k · 1.5k
facebookexperimental/libunifex
Unified Executors
Language: C++ · 1.5k · 188
unum-cloud/ucall
Web Serving and Remote Procedure Calls at 50x lower latency and 70x higher bandwidth than FastAPI, implementing JSON-RPC & REST over io_uring ☎️
Language: C · 1.1k · 41
NVIDIA/stdexec
`std::execution`, the proposed C++ framework for asynchronous and parallel programming.
Language: C++ · 1.6k · 158
bytedance/monoio
Rust async runtime based on io-uring.
Language: Rust · 3.9k · 223
google/aqt
Language: Python · 25826
google/airio
Language: Python · 147
google-deepmind/dm_pix
PIX is an image processing library in JAX, for JAX.
Language: Python · 38622
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Language: Python · 1.9k · 314
pytorch-labs/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Language: Python · 5.6k · 509
pytorch/serve
Serve, optimize and scale PyTorch models in production
Language: Java · 4.2k · 855
skyplane-project/skyplane
🔥 Blazing fast bulk data transfers between any cloud 🔥
Language: Python · 1.1k · 62
cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
Language: Python · 34440