EU-EDPS/website-evidence-collector

Bug: program gets stuck, not responding

Lucas-C opened this issue · 2 comments

Tested under Windows, using v0.4.0

The following command reproduces the issue consistently:

website-evidence-collector --headless=no --overwrite --yaml https://chezsoi.org/lucas/blog/
{"type":"Browser","level":"info","message":"browsing now to https://chezsoi.org/lucas/blog/","timestamp":"2020-02-01T16:59:55.237Z"}

The browser window opens and loads the website properly, but then nothing happens, even after waiting several minutes.

I do not always have this issue: it works well with the other websites I tested.

Another observation:

  • with this command, all works fine:
    website-evidence-collector --headless=no --overwrite https://jobs.oui.sncf
  • with this one, it gets stuck:
    website-evidence-collector --overwrite https://jobs.oui.sncf
    or I get this error:
    (node:14232) UnhandledPromiseRejectionWarning: TimeoutError: waiting for target failed: timeout 30000ms exceeded

Dear @Lucas-C,

I can confirm the issue on my Linux computer running the latest master version:

website-evidence-collector --no-headless --overwrite --yaml https://chezsoi.org/lucas/blog/
{"type":"Browser","level":"info","message":"browsing now to https://chezsoi.org/lucas/blog/","timestamp":"2020-02-05T10:04:55.373Z"}
(node:847) UnhandledPromiseRejectionWarning: Error: Protocol error (Page.captureScreenshot): Unable to capture screenshot
    at Promise (/opt/inspection/website-evidence-collector/node_modules/puppeteer/lib/Connection.js:183:56)
    at new Promise (<anonymous>)
    at CDPSession.send (/opt/inspection/website-evidence-collector/node_modules/puppeteer/lib/Connection.js:182:12)
    at Page._screenshotTask (/opt/inspection/website-evidence-collector/node_modules/puppeteer/lib/Page.js:951:39)
    at process._tickCallback (internal/process/next_tick.js:68:7)
  -- ASYNC --
    at Page.<anonymous> (/opt/inspection/website-evidence-collector/node_modules/puppeteer/lib/helper.js:111:15)
    at /opt/inspection/website-evidence-collector/website-evidence-collector.js:278:16
    at process._tickCallback (internal/process/next_tick.js:68:7)
(node:847) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:847) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

The problem disappears if I comment out the following line, which takes a screenshot of the entire page (potentially exceeding the browser viewport):

await page.screenshot({path: path.join(argv.output, 'screenshot-full.png'), fullPage: true});

This problem may be linked to a limitation of the puppeteer library that we depend on:

The easiest workaround I can imagine is to catch the error and continue without the full-page screenshot. It would be better to fix the root cause, but I do not know how to achieve that.
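The catch-and-continue workaround could look roughly like the sketch below. The helper name `tryFullPageScreenshot` and its `logger` parameter are assumptions made for illustration and are not part of website-evidence-collector; only the `page.screenshot` call and its `path` and `fullPage` options are taken from the line quoted above.

```javascript
const path = require('path');

// Hypothetical helper (not part of website-evidence-collector):
// attempts the full-page screenshot and, if puppeteer rejects
// (e.g. "Unable to capture screenshot"), logs a warning and returns
// null instead of leaving the promise rejection unhandled.
async function tryFullPageScreenshot(page, outputDir, logger = console) {
  const target = path.join(outputDir, 'screenshot-full.png');
  try {
    await page.screenshot({ path: target, fullPage: true });
    return target; // path of the file that was written
  } catch (error) {
    logger.warn(`Skipping full-page screenshot: ${error.message}`);
    return null; // caller continues without the full-page screenshot
  }
}
```

In the collector, the existing call would then become something like `await tryFullPageScreenshot(page, argv.output);`. This only hides the symptom: the run completes, but the output lacks `screenshot-full.png` for pages where the capture fails.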