Pinned Repositories
inference
Reference implementations of MLPerf™ inference benchmarks
JetStream
A throughput and memory optimized engine for LLM inference on TPU and GPU!
jetstream-pytorch
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
llama
Inference code for Llama models
maxtext
A simple, performant and scalable Jax LLM!
torch_xla
Enabling PyTorch on XLA Devices (e.g. Google TPU)
JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).
jetstream-pytorch
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
xla
Enabling PyTorch on XLA Devices (e.g. Google TPU)
ray
Ray is an AI compute engine consisting of a core distributed runtime and a set of AI libraries for accelerating ML workloads.
FanhaiLu1's Repositories