RAG frontend is getting OOMKilled

Question

RAG frontend is getting OOMKilled

Steps to reproduce:

1. Scale the RAG frontend down to 1 replica.
2. Send multiple prompts.
3. Run `kubectl get pods` and check that the frontend pod has restarted after being OOMKilled.

This also reproduces with multiple frontend replicas; it just takes more prompts, because the prompts are load balanced across the replicas.
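
For anyone repeating the reproduction steps above, the restart check can be scripted instead of read off the `kubectl get pods` table. The following is only a rough sketch, not something taken from the repository: the helper names `find_oom_killed` and `check_cluster` are made up for illustration, and it assumes `kubectl` is on the PATH and pointed at the namespace where the frontend runs. It relies on the standard pod status fields `containerStatuses[].lastState.terminated.reason` and `restartCount`.

```python
import json
import subprocess


def find_oom_killed(pod_list):
    """Return (pod, container, restart_count) for every container whose
    most recent termination reason was OOMKilled."""
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # lastState is empty ({}) if the container has never been restarted
            terminated = cs.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((pod_name, cs["name"], cs.get("restartCount", 0)))
    return hits


def check_cluster():
    """Hypothetical helper: query the current namespace and print OOMKilled containers."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for pod_name, container, restarts in find_oom_killed(json.loads(out)):
        print(f"{pod_name}/{container}: OOMKilled, restarts={restarts}")
```

Calling `check_cluster()` after sending the prompts should list the frontend pod if it was OOMKilled. Note that `lastState` only reflects the most recent termination, so a later restart for a different reason would hide an earlier OOM kill.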

Answer 1 · 2024-04-05T03:21:20.000Z

I haven't seen this since #551 was merged. Should we close this, @imreddy13?