Pinned Repositories
inference
Reference implementations of MLPerf™ inference benchmarks
jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
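A minimal sketch of the three transformations the tagline names (differentiate, vectorize, JIT); the toy loss function and shapes are illustrative, not from the repo:

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    # Toy least-squares objective; purely illustrative.
    return jnp.sum((x @ w) ** 2)

grad_fn = jax.jit(jax.grad(loss))                 # differentiate, then JIT-compile for GPU/TPU
batched_loss = jax.vmap(loss, in_axes=(None, 0))  # vectorize over a leading batch axis of x

w = jnp.ones((3,))
xs = jnp.ones((8, 4, 3))                          # batch of 8 inputs
print(grad_fn(w, xs[0]))                          # gradient w.r.t. w, shape (3,)
print(batched_loss(w, xs))                        # per-example losses, shape (8,)
```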
JetStream
A throughput- and memory-optimized engine for LLM inference on TPU and GPU!
jetstream-pytorch
A PyTorch implementation of the JetStream inference engine
llama
Inference code for Llama models
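A hedged sketch of offline generation in the style of the repo's example scripts; `Llama.build` and `text_completion` follow the Llama 2 release and may differ in other versions, and the checkpoint and tokenizer paths are placeholders:

```python
from llama import Llama  # API as in the Llama 2 release; assumed, may vary by version

# Placeholder paths; point these at a downloaded checkpoint and tokenizer.
generator = Llama.build(
    ckpt_dir="llama-2-7b/",
    tokenizer_path="tokenizer.model",
    max_seq_len=512,
    max_batch_size=4,
)
results = generator.text_completion(
    ["The capital of France is"],
    max_gen_len=32,
)
print(results[0]["generation"])
```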
maxtext
A simple, performant, and scalable JAX LLM!
torch_xla
Enabling PyTorch on XLA Devices (e.g. Google TPU)
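A minimal sketch of moving computation onto an XLA device with torch_xla's lazy-tensor API; the tensor shapes are arbitrary:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()          # e.g. a Google TPU core
x = torch.randn(4, 4).to(device)  # tensors move to XLA like any other device
y = (x @ x).sum()                 # ops are recorded lazily into an XLA graph
xm.mark_step()                    # materialize: compile and run the graph on the device
print(y.item())
```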
triton-flash-attention
FlashAttention kernels written in Triton
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
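A minimal sketch of vLLM's offline batched-inference entry point; the model name is a placeholder and any Hugging Face causal LM id would do:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model id
params = SamplingParams(temperature=0.8, max_tokens=32)

outputs = llm.generate(["The future of LLM serving is"], params)
for out in outputs:
    print(out.outputs[0].text)
```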
ray
Ray is an AI compute engine: a core distributed runtime plus a set of AI libraries for accelerating ML workloads.
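A minimal sketch of Ray's core remote-task primitive, which those libraries build on:

```python
import ray

ray.init()  # start a local Ray runtime; connects to a cluster if one is configured

@ray.remote
def square(x):
    # Runs as a task on any available worker.
    return x * x

futures = [square.remote(i) for i in range(4)]  # schedule four tasks in parallel
print(ray.get(futures))                         # [0, 1, 4, 9]
```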