JetStream

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome).

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Issues