apify/crawlee

SessionPool throws memory leak warning and hangs Playwright crawler

harm-matthias-harms opened this issue · 2 comments

Which package is this bug report for? If unsure which one to select, leave blank

None

Issue description

When I use the session pool's maxAgeSecs with a pool size over 20, the following warning is thrown. Afterward, my crawler hangs, which is possibly an issue in my own implementation. The problem can be prevented by either setting the session pool size to 20 or less or by removing maxAgeSecs.

I use the official docker image.

Error:
MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 21 sessionRetired listeners added to [SessionPool]. MaxListeners is 20. Use emitter.setMaxListeners() to increase limit
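
For what it's worth, the warning itself points at a mitigation. A minimal sketch, assuming SessionPool behaves like a regular Node.js EventEmitter (the limit of 100 is illustrative and matches the maxPoolSize below); note this only silences the warning and would not explain the hang:

// Raise Node's global EventEmitter listener cap so that one listener per
// pooled session no longer trips the warning.
// Assumption: SessionPool is a plain EventEmitter, so the global default
// applies to it as well.
import { EventEmitter } from 'node:events';

EventEmitter.defaultMaxListeners = 100;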

Code sample

// throws error
sessionPoolOptions: {
  maxPoolSize: 100,
  sessionOptions: {
    maxErrorScore: 1,
    maxUsageCount: 5,
    maxAgeSecs: 180,
  },
},

// throws error
sessionPoolOptions: {
  maxPoolSize: 75,
  sessionOptions: {
    maxErrorScore: 1,
    maxUsageCount: 5,
  },
},

// throws error but doesn't hang up
sessionPoolOptions: {
  maxPoolSize: 50,
  sessionOptions: {
    maxErrorScore: 1,
    maxUsageCount: 5,
  },
},

// is fine
sessionPoolOptions: {
  maxPoolSize: 20,
  sessionOptions: {
    maxErrorScore: 1,
    maxUsageCount: 5,
    maxAgeSecs: 180,
  },
},

// other options I use
maxRequestsPerCrawl: 3000,
maxConcurrency: 15,
minConcurrency: 1,
launchContext: {
  useChrome: true,
  browserPerProxy: true,
  useIncognitoPages: true,
},
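
For context, this is roughly how the fragments above fit together; a minimal sketch, assuming a PlaywrightCrawler as in my setup (the requestHandler and start URL are placeholders, not my actual code):

// Assembled sketch of the configuration above (placeholders marked).
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
  maxRequestsPerCrawl: 3000,
  maxConcurrency: 15,
  minConcurrency: 1,
  sessionPoolOptions: {
    maxPoolSize: 100, // anything over 20 triggers the warning for me
    sessionOptions: {
      maxErrorScore: 1,
      maxUsageCount: 5,
      maxAgeSecs: 180,
    },
  },
  launchContext: {
    useChrome: true,
    browserPerProxy: true,
    useIncognitoPages: true,
  },
  requestHandler: async ({ page }) => {
    // placeholder: real scraping logic omitted
  },
});

await crawler.run(['https://example.com']); // placeholder start URL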

Package version

3.10.5

Node.js version

apify/actor-node-playwright-chrome:22-1.44.0

Operating system

Linux

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

Hello and thank you for your interest in this project!

Can you please share a minimal reproducible example with us? I tried piecing together a repro from the code blocks in the issue, but I didn't manage to reproduce the behavior you're describing.

By the way, the browserPerProxy and useIncognitoPages options both decrease performance, which is why they are false by default. The hangs could simply be caused by Crawlee launching too many browser instances; you can try lowering the maxConcurrency crawler constructor parameter.
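
Something along these lines; a sketch, with an illustrative value for maxConcurrency:

// Sketch of the suggestion above: keep the costly options at their false
// defaults and cap concurrency (5 is an arbitrary example value).
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
  maxConcurrency: 5,
  launchContext: {
    useChrome: true,
    // browserPerProxy and useIncognitoPages left at their default (false)
  },
  // ...rest of the configuration unchanged
});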

I tried to put together a sample in a freshly generated project, but I could not reproduce the problem there.

I found a workaround, but the problem may be related to an extension we built rather than to using Crawlee in the "normal" way, so it probably works fine when used normally.

If it happens again and I find more information, I will reopen the issue.