Productionizing AI and LLM Apps with Ray Serve

ODSC West 2023

© 2023, Anyscale Inc. All Rights Reserved

Overview

Once our AI/ML models are ready for deployment, that's when the fun really starts. We need our AI-powered services to be resilient and efficient, to scale with demand, and to adapt to heterogeneous environments (for example, by using GPUs or TPUs as effectively as possible). Moreover, when we build applications around online inference, we often need to integrate multiple components: several models, data sources, business logic, and more.

Ray Serve was built so that we can easily overcome all of those challenges.

In this class we'll learn to use Ray Serve to compose online inference applications that meet all of these requirements and more. We'll build services that integrate with each other while autoscaling individually, each with its own hardware and software requirements -- all in regular Python, often with just one new line of code.
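
To give a feel for what that looks like, here is a minimal sketch of a Ray Serve deployment (the `Greeter` class is a hypothetical placeholder for a real model): a single decorator turns an ordinary Python class into a scalable deployment served over HTTP.

```python
from ray import serve
from starlette.requests import Request


@serve.deployment  # the "one new line of code"
class Greeter:
    async def __call__(self, request: Request) -> str:
        name = request.query_params.get("name", "world")
        return f"Hello, {name}!"


# Bind the deployment into an application and start serving it over HTTP.
serve.run(Greeter.bind())
```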

Motivating Scenario: Multilingual LLM Chat

For our example use case, we'll see how to use Ray Serve to host an LLM chat model and how to enhance it with additional services that support multilingual interaction.
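
As a preview of the composition pattern (a rough sketch, not the workshop's exact code), one deployment can hold a handle to another and call it like a regular async Python object. Here the chat and translation logic are stubbed out with placeholders, and the call assumes the `DeploymentHandle` API:

```python
from ray import serve


@serve.deployment
class Chat:
    def reply(self, prompt: str) -> str:
        # Placeholder for a real LLM call.
        return f"(LLM answer to: {prompt})"


@serve.deployment
class MultilingualChat:
    def __init__(self, chat):
        # `chat` arrives as a handle to the downstream Chat deployment.
        self._chat = chat

    async def __call__(self, request) -> str:
        text = request.query_params.get("text", "")
        # In the real app: translate `text` to English here...
        answer = await self._chat.reply.remote(text)
        # ...and translate `answer` back to the user's language.
        return answer


# Wire the two deployments together and serve the composed application.
serve.run(MultilingualChat.bind(Chat.bind()))
```

Each deployment in the graph scales independently, so the translation layer and the chat model can each have their own replica counts and resource requirements.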

Learning Outcomes

  • Develop an understanding of the various architectural components of Ray Serve.
  • Use the deployment and deployment graph APIs to serve machine learning models in production environments for online inference.
  • Combine multiple models to build complex logic, allowing for a more sophisticated machine learning pipeline.

Topics discussed

  • Context of Ray Serve
  • Deployments
  • Service resources (e.g., CPU/GPU/...)
  • Runtime environments and dependencies
  • Composing deployments to build more complex applications
  • Architecture / Under-the-hood
  • Scaling, performance, batching, and more production patterns (several of these are sketched in code below)
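
Several of these topics show up as configuration on the same deployment decorator. A hedged sketch follows; the `Embedder` class and all values are illustrative placeholders, not recommendations:

```python
from typing import List

from ray import serve


@serve.deployment(
    ray_actor_options={"num_gpus": 1},  # service resources (a runtime_env can also go here)
    autoscaling_config={"min_replicas": 1, "max_replicas": 4},  # scale with demand
)
class Embedder:
    @serve.batch(max_batch_size=8)  # transparently batch concurrent requests
    async def embed(self, texts: List[str]) -> List[str]:
        # Placeholder: a real model would process the whole batch in one pass.
        return [f"embedding({t})" for t in texts]

    async def __call__(self, request) -> str:
        text = request.query_params.get("text", "")
        # Callers send one item at a time; Ray Serve groups them into batches.
        return await self.embed(text)
```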

Connect with the Ray community

You can learn more and get involved with the Ray community of developers and researchers: