RAG frontend is getting OOMKilled
Closed this issue · 1 comments
andrewsykim commented
Steps to reproduce:
- Scale down RAG frontend to 1 replica
- Send multiple prompts
kubectl get pods
to see that frontend pod is restarted due to OOM kill.
This is also possible to replicate when running multiple frontend replicas, but it just takes more prompts due to prompts being load balanced across the replicas.
andrewsykim commented
I haven't seen this since merging #551 -- Should we close this @imreddy13 ?