inferentia
There are 10 repositories under the inferentia topic.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
aphrodite-engine/aphrodite-engine
Large-scale LLM inference engine
aws-samples/foundation-model-benchmarking-tool
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark performance across instance types and serving stack options.
aws-solutions-library-samples/guidance-for-machine-learning-inference-on-aws
This Guidance demonstrates how to deploy a machine learning inference architecture on Amazon Elastic Kubernetes Service (Amazon EKS). It addresses the basic implementation requirements, as well as ways to pack thousands of unique PyTorch deep learning (DL) models into a scalable architecture and evaluate their performance.
aws-samples/aws-inferentia-huggingface-workshop
CMP314: Optimizing NLP models with Amazon EC2 Inf1 instances in Amazon SageMaker
aws-samples/awsome-fmops
Collection of best practices, reference architectures, examples, and utilities for foundation model development and deployment on AWS.
daekeun-ml/aws-inferentia
This repository provides an easy, hands-on way to get started with AWS Inferentia. A demonstration of the hands-on labs is available in the AWS Innovate 2023 - AIML Edition session.
DarkSector/inf1-sentence-transformers
Sentence Transformers on EC2 Inf1
windson/inferentia-deployments
Deploy Large Models on AWS Inferentia (Inf2) instances.
yahavb/coldstart-recs-on-aws-trainium
End-to-end solution for cold-start recommendations using vLLM, DeepSeek Llama (8B & 70B), and FAISS on AWS Trainium (Trn1) with the Neuron SDK and NeuronX Distributed. Includes LLM-based interest expansion, embedding comparisons (T5 & SentenceTransformers), and scalable retrieval workflows.
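The retrieval step in a workflow like this can be sketched without the full stack. The example below is a minimal, hypothetical illustration that swaps FAISS for a brute-force NumPy cosine-similarity search; the function names and toy vectors are illustrative and not taken from the repository, and in the real pipeline the embeddings would come from T5 or SentenceTransformers encoders.

```python
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    """Normalize embeddings so a dot product equals cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)

def search(index: np.ndarray, query: np.ndarray, k: int = 3) -> list:
    """Return indices of the k items most similar to the query."""
    q = query / max(np.linalg.norm(query), 1e-12)
    scores = index @ q
    return np.argsort(-scores)[:k].tolist()

# Toy catalog embeddings (stand-ins for encoder outputs).
items = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
index = build_index(items)

# Query embedding for an expanded user interest.
hits = search(index, np.array([1.0, 0.05, 0.0]), k=2)
print(hits)  # indices of the two closest items
```

A library like FAISS replaces the brute-force `index @ q` scan with approximate nearest-neighbor structures, which is what makes the retrieval scalable to large catalogs.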