ulixee/platform

Chrome instances not closing when running ulixee-cloud via docker

ven0ms99 opened this issue · 2 comments

Here are my findings regarding the problem of Chrome instances not closing when using ulixee-cloud via docker:

First of all, I completely reset my Ubuntu server used for ulixee-cloud exclusively (Ubuntu 22.04.2 LTS GNU/Linux 5.15.0-72-generic x86_64). I then precisely followed your instructions for running docker. The only differences here were:

• I had to install jq manually:

if ! which jq >/dev/null; then
  sudo apt-get -y install jq
fi

• I had to create missing folders on the server:

# -p creates missing parent directories and does nothing if the directory already exists.
mkdir -p "/tmp/.ulixee"
mkdir -p "/root/.cache/ulixee/datastores"

Afterwards, I just pasted your script into a run.sh file and ran sh run.sh (as root).

I then proceeded to use my own script with this server. The fresh server setup did not help.

~ 10 minutes into scraping there were already 342 Chrome instances, and scraping had already slowed down a lot:
[Screenshot 2024-04-15 at 10 36 26]

~ 23 minutes into scraping there were 627 instances - at this point the server was barely usable:
[Screenshot 2024-04-15 at 10 49 52]
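
For anyone who wants to track the instance count over time instead of reading it off a process viewer, here is a minimal sketch. It assumes the browser processes show up with "chrome" in their command name, which may differ on your setup; the count_procs helper is a hypothetical name, not part of ulixee.

```shell
#!/bin/sh
# Count input lines that contain a process-name pattern (default: "chrome").
# Reads process names from stdin, so it can be fed by `ps` or by sample data.
# ASSUMPTION: the browser processes have "chrome" in their command name.
count_procs() {
  pattern="${1:-chrome}"
  # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
  grep -c -- "$pattern" || true
}

# Example (not run here): log the count once a minute.
#   while true; do
#     printf '%s %s\n' "$(date +%T)" "$(ps -eo comm= | count_procs chrome)"
#     sleep 60
#   done
```

A count that keeps climbing between samples while the number of active sessions stays flat would match the behaviour described above.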

Next up, I proceeded to develop a minimal example that reproduces it, as requested by @blakebyrnes on Discord.

You can find it here. It behaves pretty much the same on the previously set up ulixee-cloud docker server: same problem of Chrome instances not closing.

I found out that it's enough to give hero.goto() an invalid url, so that it fails immediately. This way hundreds of Chrome instances can be spun up in mere seconds.

Running @ulixee/cloud via node directly on the Ubuntu server works perfectly. Chrome instances are closing just fine. Switching to the dockerized version on the same server instantly leads to Chrome not closing. In both cases I was using the same code from here.

I’ve run into the same issue, and after a little digging, here’s what I’ve found:

  • Regardless of how v2.0.0-alpha.27 is started, be it by CLI or a custom node script, it spins up new Chrome instances indefinitely until it consumes all 256GB of RAM in your homelab.
  • The issue persists with --max-concurrent-heroes-per-browser=1 --max-concurrent-heroes=1 --disable-chrome-alive flags.
  • No errors or notable messages appear in the DEBUG logs, and all components seem to close normally: Browser, Page, Tab, BrowserContext, Socket close, MitmSocket, ProxyIpcHandler, MitmRequestSession, MitmProxy, Agent, Session
  • The good news: the issue isn't present in v2.0.0-alpha.25 and was introduced in v2.0.0-alpha.26.

After diffing the logs, I did find one potentially significant difference: in v2.0.0-alpha.25, the following lines appear:

agentkeepalive sock[0#registry.npmjs.org:443::::::::true:::::::::::::](requests: 1, finished: 1) timeout after 15000ms, listeners 2, defaultTimeoutListenerCount 3, hasHttpRequest false, HttpRequest timeoutListenerCount 0 +15s
agentkeepalive timeout listeners: onTimeout, onTimeout +1ms
agentkeepalive sock[0#registry.npmjs.org:443::::::::true:::::::::::::] is free, destroy quietly +0ms
agentkeepalive sock[0#registry.npmjs.org:443::::::::true:::::::::::::](requests: 1, finished: 1) close, isError: false +1ms

However, there's no agentkeepalive in the v2.0.0-alpha.26 or v2.0.0-alpha.27 logs. The agentkeepalive package is a dependency, but I stopped digging here to report since agentkeepalive might be superfluous.
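
A rough sketch of how that check can be repeated across versions, assuming the DEBUG output of each run was saved to its own file. The file names in the example and the marker_count / compare_marker helpers are hypothetical.

```shell
#!/bin/sh
# Print how many lines of a log file contain a fixed marker string.
marker_count() {
  file="$1"
  marker="$2"
  # -F treats the marker as a literal string; grep -c exits non-zero on 0 matches.
  grep -c -F -- "$marker" "$file" || true
}

# Print "<file>: <count>" for every log file given after the marker.
compare_marker() {
  marker="$1"
  shift
  for file in "$@"; do
    printf '%s: %s\n' "$file" "$(marker_count "$file" "$marker")"
  done
}

# Example (hypothetical file names, one saved DEBUG log per version):
#   compare_marker agentkeepalive alpha25.log alpha26.log alpha27.log
```

A non-zero count for alpha.25 and zero for the later versions would reproduce the difference described above; it does not by itself show that agentkeepalive is the cause.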

Version 28, which had the fix for this, did not make it to docker. Sorry!