thomasdondorf/puppeteer-cluster

browser crash on a large website

mittster opened this issue · 7 comments

I came across a website that Puppeteer can't handle. When taking a screenshot, either "Protocol error (Runtime.callFunctionOn): Target closed" or "Protocol error (Emulation.setDeviceMetricsOverride): Target closed" is thrown. Before taking the screenshot, I scroll through the page so that all images are loaded. The page is large, so I set the '--disable-dev-shm-usage' and '--shm-size=3gb' args in the hope of preventing any memory issues.

Below is minimal sample code with the URL included. Any idea why the page is closed in the middle of the operation? In addition to puppeteer-cluster ("^0.23.0"), I am also using puppeteer-extra-plugin-stealth ("^2.9.0") and puppeteer-extra ("^3.2.3").

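// Uses ES module syntax and top-level await, so run it as an ES module (e.g. a .mjs file).
import puppeteer from 'puppeteer-extra';
import { Cluster } from 'puppeteer-cluster';
import StealthPlugin from "puppeteer-extra-plugin-stealth";
// Register the stealth evasions with puppeteer-extra before the cluster launches Chrome.
puppeteer.use(StealthPlugin());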

const cluster = await Cluster.launch({
    puppeteer,
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 1,
    puppeteerOptions: {
        headless: true,
        args: [
            '--disable-setuid-sandbox',
            '--no-sandbox',
            '--window-size=1920,1080',
            '--disable-dev-shm-usage',
            '--shm-size=3gb', // a docker run option rather than a Chromium switch, so probably ignored here
        ],
    },
});

await cluster.task(async ({ page, data: url }) => {
    await page.goto(url, { waitUntil: "networkidle2" });
    await Screenshot(page);
});

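// Scrolls down 389 px every 50 ms until the bottom is reached or 101 ticks
// have run, so that images further down the page get loaded.
async function autoScroll(page) {
    await page.evaluate(async () => {
        try {
            await new Promise((resolve) => {
                let totalHeight = 0;
                const distance = 389;
                let counter = 0;
                const timer = setInterval(() => {
                    counter++;
                    const scrollHeight = document.body.scrollHeight;
                    window.scrollBy(0, distance);
                    totalHeight += distance;

                    // Stop at the bottom of the page or after 101 ticks (about 5 s).
                    if (totalHeight >= scrollHeight - window.innerHeight || counter > 100) {
                        clearInterval(timer);
                        resolve();
                    }
                }, 50);
            });
        } catch (e) {
            // This code runs inside the page, so these logs go to the browser console, not to Node.
            console.log("we got scrolling error:");
            console.log(e);
        }
    });
}
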
// Takes a full-page mobile screenshot and then a full-page desktop screenshot,
// scrolling through the page before each one so that its images load.
async function Screenshot(page) {
    const save = true;
    try {
        await page.waitForTimeout(6000);

        // Mobile viewport
        await page.setViewport({ width: 390, height: 844 });
        await autoScroll(page);
        await page.evaluate(() => window.scrollTo(0, 0));
        await page.waitForTimeout(2000);
        if (save) await page.screenshot({ path: "./mobile.jpg", fullPage: true });

        // Desktop viewport
        await page.setViewport({ width: 1920, height: 1080 });
        await autoScroll(page);
        await page.evaluate(() => window.scrollTo(0, 0));
        await page.waitForTimeout(2000);
        if (save) await page.screenshot({ path: "./desktop.jpg", fullPage: true });
    } catch (error) {
        console.log("we got screenshot error");
        console.log(error);
    }
}

cluster.queue("https://www.sinsay.com/si/sl/sale/woman/view-all-clothes");
await cluster.idle();
await cluster.close();

Stack trace:

ProtocolError: Protocol error (Runtime.callFunctionOn): Target closed.
    at /path/to/puppeteer/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:230:24
    at new Promise (<anonymous>)
    at CDPSession.send (/path/to/puppeteer/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:226:16)
    at next (/path/to/puppeteer/node_modules/puppeteer-extra-plugin-stealth/evasions/sourceurl/index.js:32:41)
    at CDPSession.send (/path/to/puppeteer/node_modules/puppeteer-extra-plugin-stealth/evasions/sourceurl/index.js:65:16)
    at ExecutionContext._evaluateInternal (/path/to/puppeteer/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:204:50)
    at ExecutionContext.evaluate (/path/to/puppeteer/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:110:27)
    at DOMWorld.evaluate (/path/to/puppeteer/node_modules/puppeteer/lib/cjs/puppeteer/common/DOMWorld.js:123:24)
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
  originalMessage: ''
}
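One way to see what actually failed: puppeteer-cluster emits a 'taskerror' event, with the error, the job's data and a willRetry flag, when a queued task fails. The try/catch blocks in the code only see errors raised inside the task itself, so a listener like the one below should show the cluster-side error for each failed job. A minimal sketch; logTaskErrors is a hypothetical helper name, and the test drives it with a stand-in object rather than a real cluster.

```javascript
// Hypothetical helper: report every failed job of a puppeteer-cluster instance.
// `cluster` only needs an EventEmitter-style on(); `log` defaults to console.error.
function logTaskErrors(cluster, log = console.error) {
    cluster.on('taskerror', (err, data, willRetry) => {
        const retry = willRetry ? ' (will retry)' : '';
        log(`Task for ${data} failed${retry}: ${err.message}`);
    });
}

// Usage with the cluster from the issue (not run here):
// logTaskErrors(cluster);
// cluster.queue("https://www.sinsay.com/si/sl/sale/woman/view-all-clothes");
```

Registering the listener before queueing means each failed URL is logged once with the reason the cluster recorded.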

I have the same error and haven't been able to resolve it. The problem occurs whenever the browser has to wait for more than about 10 seconds, whether it is waiting for a page to load or just for a timeout function.
What version of puppeteer are you using?
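To find which step pushes a task past that roughly 10-second mark, each step can be wrapped in a small timing helper. A minimal sketch; timed and totalMs are hypothetical names, and inside the cluster task you would wrap calls such as page.goto or Screenshot(page) with it.

```javascript
// Hypothetical helper: run one async step, record its duration in `log`,
// and pass the step's result (or error) through unchanged.
async function timed(label, fn, log) {
    const start = Date.now();
    try {
        return await fn();
    } finally {
        log.push({ label, ms: Date.now() - start });
    }
}

// Sum of all recorded durations, to compare against the cluster's timeout.
function totalMs(log) {
    return log.reduce((sum, entry) => sum + entry.ms, 0);
}

// Example inside the cluster task (not run here):
// const timings = [];
// await timed('goto', () => page.goto(url, { waitUntil: "networkidle2" }), timings);
// await timed('screenshots', () => Screenshot(page), timings);
// console.log(timings, totalMs(timings));
```

Because the duration is recorded in a finally block, a step that throws still shows up in the log.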

Thanks for the feedback. I am using puppeteer-cluster ("^0.23.0"), puppeteer-extra-plugin-stealth ("^2.9.0") and puppeteer-extra ("^3.2.3"). I'm not sure which Puppeteer version these modules use in turn.

This particular website is a source of all kinds of Puppeteer problems. Check this out: puppeteer/puppeteer#8665

Just so you know, here is a temporary fix until puppeteer-cluster gets updated: I was able to fix this by dropping puppeteer-cluster and using an async generator function instead. I also used Chrome's "record actions" feature to generate the Puppeteer code this time.
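The workaround code isn't posted in the thread, so the following is only a guess at its general shape: a plain Puppeteer loop written as an async generator that opens a page per URL, runs a handler, closes the page and yields the outcome. crawl is a hypothetical name; browser is assumed to be a Puppeteer Browser (from puppeteer.launch()), and handler can be any async function that takes the page, such as the Screenshot function from the issue.

```javascript
// Hypothetical sketch of a cluster-free loop: one page per URL, processed in
// sequence. Each outcome is either { url, result } or { url, error }.
async function* crawl(browser, urls, handler) {
    for (const url of urls) {
        const page = await browser.newPage();
        let outcome;
        try {
            await page.goto(url, { waitUntil: "networkidle2" });
            outcome = { url, result: await handler(page) };
        } catch (error) {
            outcome = { url, error };
        } finally {
            // Close the page before yielding, so at most one page is open at a time.
            await page.close();
        }
        yield outcome;
    }
}

// Usage (not run here):
// const browser = await puppeteer.launch({ headless: true });
// const urls = ["https://www.sinsay.com/si/sl/sale/woman/view-all-clothes"];
// for await (const { url, error } of crawl(browser, urls, Screenshot)) {
//     if (error) console.log(url, error);
// }
// await browser.close();
```

Nothing in this loop imposes a per-page time limit, which fits the observation that the same work succeeds outside the cluster.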

@zasnool
Thanks. I also managed to get it working without puppeteer-cluster. The problematic part is the scrolling: if I omit autoScroll, it works as expected. Unfortunately, scrolling is mandatory for many pages.
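If scrolling has to stay, one option is to drive it from Node in small steps with an explicit time budget, so a single pass can never run longer than you allow. A minimal sketch; scrollWithBudget is a hypothetical name, its defaults (389 px steps, 50 ms pauses, 100 steps) are borrowed from autoScroll in the issue, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical variant of autoScroll: the loop runs in Node, and each step is a
// short page.evaluate call. It stops at the bottom of the page, after maxSteps
// steps, or once budgetMs has elapsed, whichever comes first.
async function scrollWithBudget(page, { step = 389, delayMs = 50, maxSteps = 100, budgetMs = 5000 } = {}) {
    const deadline = Date.now() + budgetMs;
    let steps = 0;
    while (steps < maxSteps && Date.now() < deadline) {
        // Runs inside the page: scroll one step and report whether the bottom is reached.
        const atBottom = await page.evaluate((dy) => {
            window.scrollBy(0, dy);
            return window.scrollY + window.innerHeight >= document.body.scrollHeight;
        }, step);
        steps++;
        if (atBottom) return { steps, reachedBottom: true };
        await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
    return { steps, reachedBottom: false };
}

// Usage inside Screenshot() in place of autoScroll(page) (not run here):
// const { reachedBottom } = await scrollWithBudget(page, { budgetMs: 4000 });
// if (!reachedBottom) console.log("scroll budget used up before reaching the bottom");
```

The deadline is checked before each step, so one in-flight step can overrun the budget slightly; choosing budgetMs so that both passes plus the fixed waits stay well below the cluster's timeout is the point of the exercise.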

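The options argument of Cluster.launch (ClusterOptionsArgument) accepts a timeout parameter, which defaults to 30 seconds. When a task runs longer than that, the cluster abandons the job and closes its page, so calls still running inside the task fail with "Target closed". This is why it crashes on large websites but seems to work fine on small ones.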
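A rough budget for the task in the issue is consistent with this. The function below adds up the fixed waitForTimeout calls in Screenshot() and the worst case of the two autoScroll passes, which stop after 101 ticks of 50 ms; worstCaseWaitMs is a hypothetical name, and the timeout: 120000 in the closing comment is an arbitrary example value.

```javascript
// Worst-case time (ms) the task from the issue spends in fixed waits and
// scrolling. Navigation and screenshot time are NOT included.
function worstCaseWaitMs({ fixedWaitsMs, scrollPasses, maxTicks, tickMs }) {
    const waits = fixedWaitsMs.reduce((sum, ms) => sum + ms, 0);
    return waits + scrollPasses * maxTicks * tickMs;
}

const used = worstCaseWaitMs({
    fixedWaitsMs: [6000, 2000, 2000], // waitForTimeout calls in Screenshot()
    scrollPasses: 2,                  // mobile and desktop
    maxTicks: 101,                    // autoScroll stops once counter > 100
    tickMs: 50,                       // setInterval period in autoScroll
});
const defaultTimeoutMs = 30000;       // puppeteer-cluster default
console.log(used, defaultTimeoutMs - used);

// Raising the limit when launching the cluster (120000 is an arbitrary example):
// const cluster = await Cluster.launch({ puppeteer, concurrency: Cluster.CONCURRENCY_CONTEXT,
//     maxConcurrency: 1, timeout: 120000, puppeteerOptions: { /* as before */ } });
```

For the issue's numbers this prints 20100 9900: about 20 s of waiting and scrolling, leaving under 10 s of the default 30 s for page.goto with networkidle2 and two full-page screenshots of a long page.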