wundergraph/cosmo

High memory utilization after router upgrade to 0.91.0

Closed this issue · 11 comments

Component(s)

router

Component version

0.91.0

wgc version

latest

controlplane version

latest

router version

0.91.0

What happened?

After upgrading from router version 0.84.4 we observe much higher memory utilization on the router instances. Immediately after the upgrade, all instances were OOMKilled because the pods hit their memory limits. We are running the router in a Kubernetes cluster; here are the resources before the upgrade, which were fine for the older router version:

resources:
  requests:
    cpu: 400m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 600Mi

Now memory utilization has increased to around 1000Mi. Do you have an explanation for this?
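For completeness, a possible stop-gap while this is investigated, with hypothetical values and assuming the chart/deployment lets us pass extra environment variables to the container: raise the limit toward the observed usage and set GOMEMLIMIT, the standard Go 1.19+ soft memory limit that any Go binary honors, slightly below it so the runtime garbage-collects more aggressively before the pod is OOMKilled:

resources:
  requests:
    cpu: 400m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1536Mi
# hypothetical: only applies if the deployment forwards extra env vars to the router container
env:
  - name: GOMEMLIMIT
    value: "1280MiB"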

Environment information

Environment

Kubernetes 1.25

Router configuration

No response

Router execution config

No response

Log output

No response

Additional context

No response

WunderGraph commits fully to Open Source and we want to make sure that we can help you as fast as possible.
The roadmap is driven by our customers and we have to prioritize issues that are important to them.
You can influence the priority by becoming a customer. Please contact us here.

We're taking a look

Hi @PeteMac88
Could you check with the 0.90.0 version?

Hey @devsergiy, it's definitely higher than on 0.84.4, but not as high as on 0.91.0. Here is a table with the average utilization under the same load:

0.84.4 ~ 350Mi
0.90.0 ~ 700Mi
0.91.0 ~ 1200Mi

Thanks a lot

Could you also check 0.88.0?

We already know the reasons why it changed in 0.90.1, but it is a bit of a surprise that it also spikes before that, on 0.90.0.

@devsergiy Any update here? I needed to downgrade the router yesterday because of the increasing memory consumption.

@PeteMac88 We are aware of the issue, and we will address it in due course.

Thank you for your patience!

Moreover, were you able to check 0.88.0 as requested @PeteMac88?

Are you able to build the router from source like so:

go build -tags=pprof main.go

Then run the router with your regular traffic and run the following command:

go tool pprof http://localhost:6060/debug/pprof/heap

You need to have the Go toolchain installed for this.

Once you've got a heap profile, can you upload it here for investigation? Thanks
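If it is simpler, the heap profile can also be captured to a file and attached directly; a sketch, assuming the pprof server is listening on the same localhost:6060 address as above:

# save the raw heap profile so it can be uploaded to this issue
curl -o heap.pprof http://localhost:6060/debug/pprof/heap

# optional: inspect the top allocators locally before uploading
go tool pprof -top heap.pprof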

I also ran into this problem when upgrading to 0.95.6 this week. I observed unbounded memory growth when load testing the router with our batch requests (> 100 list items and ~150 fields across 2 subgraphs, no deep nesting, max 2 levels).

I did not run into this problem with the same queries that only returned a single item.

Reverting to 0.88.0 definitely helped and the containers use less memory now.

@PeteMac88 @jfroundjian please reopen the issue if the problem persists with the latest router release, router@0.105.2. Thank you!