GoogleCloudPlatform/ai-on-gke

RAG frontend is getting OOMKilled

Closed this issue · 1 comments

Steps to reproduce:

  1. Scale down RAG frontend to 1 replica
  2. Send multiple prompts
  3. kubectl get pods to see that frontend pod is restarted due to OOM kill.

This is also possible to replicate when running multiple frontend replicas, but it just takes more prompts due to prompts being load balanced across the replicas.

I haven't seen this since merging #551 -- Should we close this @imreddy13 ?