josdejong/workerpool

Guidance on Restarting a Specific Service in Workerpool to Handle Memory Leaks in Playwright

wojtekKrol opened this issue · 3 comments

Description

I am using the workerpool library to manage multiple services in a Node.js application, specifically for crawling tasks using Playwright. However, I've encountered an issue with Playwright related to memory leaks. This seems to be a common problem among developers using Playwright, and the suggested workaround involves restarting the Playwright process to free up memory.

Issue

In my application, each service is a separate worker within workerpool. One of these services, a crawler, is responsible for handling thousands of URLs. Due to the memory leak in Playwright, I need a way to programmatically restart this specific service (crawler) within workerpool. The service is stateless and does not process any data persistently, so it should be feasible to restart it without losing important information.

Current Implementation

Here is a simplified version of how the services are structured:

// Main file
import path from 'path';
import { fileURLToPath } from 'url';
import { pool } from 'workerpool';
import { runApiServer } from '~/api/api.js';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const computeWorkersCount = (
  name: AppWorker
): [min: number, max: number] => {
  return [
    Number(CONFIG[(name + '_WORKERS_MIN') as AppWorkersMin]),
    Number(CONFIG[(name + '_WORKERS_MAX') as AppWorkersMax]),
  ]
}

const main = async () => {
  // Database initializations
  const oneDB = createOneDB();
  const anotherDB = createAnotherDB();

  runApiServer({ oneDB, anotherDB });

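  // Explicit tuple type so the destructuring in the loop below type-checks
  const services: [[min: number, max: number], string][] = [
    [computeWorkersCount('FIRST'), './services/one'],
    [computeWorkersCount('SECOND'), './services/second'],
    [computeWorkersCount('THIRD_LEAK_MEMORY_PROBLEM'), './services/third'],
  ];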

  for (const [[min, max], servicePath] of services) {
    pool(path.join(__dirname, servicePath), {
      minWorkers: min,
      maxWorkers: max,
    })
      .exec('main', null)
      .catch(console.error);
  }
};

main();

// Example of a service worker
import { worker } from 'workerpool';

worker({
  main: () =>
    main({
      oneDB: createOneDB(),
      anotherDB: createAnotherDB(),
    }),
});

Request

I am seeking guidance or a feature within workerpool that would allow me to restart a specific service (especially the crawler service using Playwright) to handle the memory leak issue. This would involve terminating and then reinitializing the service's process. Any suggestions or solutions for this scenario would be greatly appreciated.

I guess you can call .terminate() on the workerpool to kill all workers, and then create a new workerpool.

@josdejong I would like each worker to run its logic once (or, ideally, N times), after which it is terminated and re-created automatically, with the N counter reset.

I think what you can do is create a little wrapper function around your workerpool that:

  • creates a workerpool instance
  • keeps track of the number of executed tasks (divided by the number of workers to get your N tasks per worker)
  • once the max number of executed tasks is reached, gracefully shuts down the pool and creates a new one

There is no support for terminating a single worker, but this would terminate all of them and re-create them once in a while, which works around the memory leak issue.
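
The wrapper described above could look roughly like the sketch below. It is only an illustration under a few assumptions: `createRecyclingPool`, `PoolLike`, and `maxTasks` are hypothetical names, not part of workerpool, and the pool is reduced to the `exec()` and `terminate()` methods the thread already uses. The task budget is counted across the whole pool, so for N tasks per worker you would pass N times the number of workers.

```typescript
// Minimal pool shape this sketch relies on (an assumption for illustration);
// a workerpool Pool offers exec() and terminate() methods of this kind.
interface PoolLike {
  exec(method: string, params?: unknown[] | null): PromiseLike<unknown>;
  terminate(force?: boolean): PromiseLike<unknown>;
}

// Hypothetical helper: wraps a pool factory and swaps in a fresh pool
// after every `maxTasks` submitted tasks.
function createRecyclingPool(createPool: () => PoolLike, maxTasks: number) {
  let current = createPool();
  let submitted = 0;
  const inFlight = new Map<PoolLike, number>();
  const retired = new Set<PoolLike>();

  async function exec(
    method: string,
    params: unknown[] | null = null
  ): Promise<unknown> {
    const pool = current;
    inFlight.set(pool, (inFlight.get(pool) ?? 0) + 1);

    // Budget used up: retire this pool and route later calls to a new one.
    if (++submitted >= maxTasks) {
      submitted = 0;
      retired.add(pool);
      current = createPool();
    }

    try {
      return await pool.exec(method, params);
    } finally {
      const left = (inFlight.get(pool) ?? 1) - 1;
      inFlight.set(pool, left);
      // Terminate a retired pool only after its last task has settled,
      // so no task still waiting in its queue is cut off by the shutdown.
      if (left === 0 && retired.has(pool)) {
        retired.delete(pool);
        inFlight.delete(pool);
        await Promise.resolve(pool.terminate()).catch(console.error);
      }
    }
  }

  return { exec };
}
```

For the crawler service from the original post, the factory would presumably be something like `() => pool(path.join(__dirname, './services/third'), { minWorkers: min, maxWorkers: max })`. Note that this only helps if the crawler is submitted as many short tasks; a single long-running `exec('main', null)` call never reaches the task budget.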