lrlna/puppeteer-walker

Navigation Timeout Exceeded


First off, thanks for your work on this package 🎉. I'm running into the small error below, but aside from that I've found it simple and easy to use.

When using the page.waitFor method (e.g. page.waitFor(100)), the following error is thrown:

Error walking site:  Error: Navigation Timeout Exceeded: 30000ms exceeded
    at Promise.then (/[...]/puppeteer-walker/node_modules/puppeteer/lib/NavigatorWatcher.js:69:21)
    at <anonymous>
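One way to narrow down which awaited step actually stalls is to wrap each step in a labelled watchdog, so a hang reports its own name instead of surfacing later as the generic navigation timeout. This is only a debugging sketch: withTimeout is a hypothetical helper, not part of puppeteer-walker or Puppeteer.

```javascript
// Hypothetical debugging helper (not part of puppeteer-walker or Puppeteer):
// races a promise against a timer so that a stalled step rejects with a
// message naming that step.
function withTimeout (promise, ms, label) {
  let timer
  const watchdog = new Promise((resolve, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`${label} did not settle within ${ms}ms`))
    }, ms)
  })
  // Whichever settles first wins; the timer is always cleared so it cannot
  // keep the process alive after the real promise has settled.
  return Promise.race([promise, watchdog]).finally(() => clearTimeout(timer))
}
```

For example, writing withTimeout(page.waitFor(100), 5000, 'page.waitFor(100)') inside the page handler would show whether that call itself hangs or whether the stall comes from a later navigation.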

I suspect the same error can crop up in a variety of other scenarios too (e.g. #5). The odd thing is that the script doesn't seem to actually run past the 30-second mark; rather, the end or page events somehow aren't completing properly. It would be great to debug and fix that, and it might also be useful to have a configuration option for extending the timeout, e.g.

let walker = Walker({
  navigationTimeout: 50000 // Defaults to 30,000
})
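A rough sketch of how a walker could honour a navigationTimeout option like the one proposed above. The option name and the resolveOptions/visit helpers are assumptions for illustration, not existing puppeteer-walker API; page.setDefaultNavigationTimeout() and the timeout/waitUntil options of page.goto() are real Puppeteer APIs.

```javascript
// Hypothetical: navigationTimeout is the option name suggested in this issue,
// not an existing puppeteer-walker setting.
const DEFAULT_NAVIGATION_TIMEOUT = 30000 // Puppeteer's default, in ms

function resolveOptions (opts = {}) {
  const timeout = opts.navigationTimeout === undefined
    ? DEFAULT_NAVIGATION_TIMEOUT
    : opts.navigationTimeout
  // Puppeteer treats 0 as "no timeout"; negative or non-numeric values are rejected
  if (!Number.isFinite(timeout) || timeout < 0) {
    throw new RangeError(`navigationTimeout must be a non-negative number, got ${timeout}`)
  }
  return Object.assign({}, opts, { navigationTimeout: timeout })
}

// Apply the timeout to a page before the walker navigates with it. The default
// also covers other navigation waits (e.g. page.waitForNavigation()); the
// explicit goto timeout makes this particular call unambiguous.
async function visit (page, url, opts) {
  page.setDefaultNavigationTimeout(opts.navigationTimeout)
  return page.goto(url, { waitUntil: 'networkidle2', timeout: opts.navigationTimeout })
}
```

Validating the value once in resolveOptions keeps a typo (e.g. a string) from silently turning into Puppeteer's default.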

Another approach might be to encourage users to duplicate the page object and work on the duplicate, since modifying the base instance causes issues. Any suggestions on the best way to clone this object would be much appreciated; I looked for something like page.duplicate() but didn't see anything built into Puppeteer.
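Since there is no page.duplicate(), the closest thing is probably to open a new tab in the same browser and load the same URL. A minimal sketch, assuming the walker ignores extra tabs opened in its browser (I haven't verified that); clonePage is a hypothetical helper, while page.browser(), browser.newPage(), page.url(), page.goto() and page.close() are real Puppeteer APIs.

```javascript
// Hypothetical helper, not a Puppeteer API: opens a new tab in the same
// browser and loads the original page's URL there. Tabs in the default
// browser context already share cookies, but in-page state (DOM edits,
// JS variables, form input) is NOT carried over.
async function clonePage (page, gotoOptions = { waitUntil: 'networkidle2' }) {
  const clone = await page.browser().newPage()
  try {
    await clone.goto(page.url(), gotoOptions)
    return clone
  } catch (err) {
    // Don't leak the extra tab if navigation fails (e.g. on a timeout)
    await clone.close()
    throw err
  }
}
```

The caller would do its link edits and pdf() on the clone and call clone.close() afterwards; compared with launching a second browser per page, this keeps a single Chromium process running.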

OK, so the one clone/duplicate approach I tried feels a bit hacky, but it (kind of) helped, at least for link modification:

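const Path = require('path')
const Puppeteer = require('puppeteer')

// `output` is the directory the PDFs are written to (defined elsewhere)

walker.on('page', async page => {
    // page.url() is synchronous, so no await is needed
    const url = page.url()
    const browser = await Puppeteer.launch()

    try {
        const clone = await browser.newPage()

        await clone.goto(url, {
            waitUntil: 'networkidle2'
        })

        const title = await clone.title()

        // Disable all links on the clone only; the walker's page is untouched
        await clone.$$eval('a', links => {
            links.forEach(link => { link.href = '#' })
        })

        // Generate the pdf
        await clone.pdf({
            path: Path.resolve(output, `./${title}.pdf`),
            format: 'A4',
            printBackground: true
        })
    } finally {
        // Close the extra browser even if navigation or PDF generation throws;
        // otherwise every failed page leaks a Chromium process
        await browser.close()
    }
})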

I used Puppeteer directly to create a clone and navigated it to the same URL as page. This allows link modifications (making them directly on page breaks crawling), but I'm still seeing the Navigation Timeout Exceeded error.
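Until the root cause is found, one workaround is to treat a navigation timeout on the clone as "skip this page" so that a single slow page doesn't abort the whole walk. This is a sketch that does not fix the underlying issue; isNavigationTimeout and gotoOrSkip are hypothetical helpers, and the message patterns are based on the error text shown earlier in this issue plus the TimeoutError that newer Puppeteer versions throw.

```javascript
// Hypothetical helpers: newer Puppeteer versions reject page.goto() with an
// error named "TimeoutError" ("Navigation timeout of N ms exceeded"); older
// ones, like the version in the stack trace above, throw a plain Error whose
// message starts with "Navigation Timeout Exceeded".
function isNavigationTimeout (err) {
  if (!err) return false
  if (err.name === 'TimeoutError') return true
  return /navigation timeout/i.test(String(err.message))
}

// Resolves to true if navigation succeeded and false if it timed out;
// any other error is re-thrown unchanged.
async function gotoOrSkip (page, url, options) {
  try {
    await page.goto(url, options)
    return true
  } catch (err) {
    if (isNavigationTimeout(err)) return false
    throw err
  }
}
```

In the handler, the clone.goto() call would become something like `if (!(await gotoOrSkip(clone, url, { waitUntil: 'networkidle2' }))) return`, with the try/finally still closing the browser.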

Possibly related: puppeteer/puppeteer#1908