Pinned Repositories
inference
Reference implementations of MLPerf™ inference benchmarks
jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
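A minimal sketch of the three transformations the tagline names (differentiate, vectorize, JIT); the toy loss function and shapes are illustrative, not from the repo:

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    # Toy least-squares objective; purely illustrative.
    return jnp.sum((x @ w) ** 2)

grad_fn = jax.jit(jax.grad(loss))                 # differentiate, then JIT-compile for GPU/TPU
batched_loss = jax.vmap(loss, in_axes=(None, 0))  # vectorize over a leading batch axis of x

w = jnp.ones((3,))
xs = jnp.ones((8, 4, 3))                          # batch of 8 inputs
print(grad_fn(w, xs[0]))                          # gradient w.r.t. w, shape (3,)
print(batched_loss(w, xs))                        # per-example losses, shape (8,)
```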
JetStream
A throughput- and memory-optimized engine for LLM inference on TPU and GPU!
jetstream-pytorch
A PyTorch implementation of the JetStream inference engine
llama
Inference code for Llama models
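A hedged sketch of offline generation in the style of the repo's example scripts; `Llama.build` and `text_completion` follow the Llama 2 release and may differ in other versions, and the checkpoint and tokenizer paths are placeholders:

```python
from llama import Llama  # API as in the Llama 2 release; assumed, may vary by version

# Placeholder paths; point these at a downloaded checkpoint and tokenizer.
generator = Llama.build(
    ckpt_dir="llama-2-7b/",
    tokenizer_path="tokenizer.model",
    max_seq_len=512,
    max_batch_size=4,
)
results = generator.text_completion(
    ["The capital of France is"],
    max_gen_len=32,
)
print(results[0]["generation"])
```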
maxtext
A simple, performant, and scalable JAX LLM!
torch_xla
Enabling PyTorch on XLA Devices (e.g. Google TPU)
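A minimal sketch of moving computation onto an XLA device with torch_xla's lazy-tensor API; the tensor shapes are arbitrary:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()          # e.g. a Google TPU core
x = torch.randn(4, 4).to(device)  # tensors move to XLA like any other device
y = (x @ x).sum()                 # ops are recorded lazily into an XLA graph
xm.mark_step()                    # materialize: compile and run the graph on the device
print(y.item())
```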
triton-flash-attention
FlashAttention kernels written in Triton
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
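A minimal sketch of vLLM's offline batched-inference entry point; the model name is a placeholder and any Hugging Face causal LM id would do:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model id
params = SamplingParams(temperature=0.8, max_tokens=32)

outputs = llm.generate(["The future of LLM serving is"], params)
for out in outputs:
    print(out.outputs[0].text)
```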
ray
Ray is an AI compute engine: a core distributed runtime plus a set of AI libraries for accelerating ML workloads.
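A minimal sketch of Ray's core remote-task primitive, which those libraries build on:

```python
import ray

ray.init()  # start a local Ray runtime; connects to a cluster if one is configured

@ray.remote
def square(x):
    # Runs as a task on any available worker.
    return x * x

futures = [square.remote(i) for i in range(4)]  # schedule four tasks in parallel
print(ray.get(futures))                         # [0, 1, 4, 9]
```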