puppeteer/puppeteer

Page.setContent should wait for resources to be loaded

aslushnikov opened this issue · 39 comments

(as mentioned in #486 and other places)

We need a way to wait for the page to load all its resources after page.setContent.

The lifecycle events might help.

Meanwhile, a good workaround for page.setContent that waits for all the resources to load:

await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle0' });

It's amazing that you posted the workaround. I'm running exactly into this issue, attempting to use puppeteer as a PDF generating service from HTML.

Thank you for filing a formal issue.

I'm having trouble with this too on version 0.13-alpha with browser configs as:

"waitUntil": "networkidle2",
"timeout": 60000

I confirm the same trouble.

with the latest release this hangs too:

await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle' });

wait for networkidle0 instead

await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle0' });

Hey @aslushnikov

You said in #1312 to wait for https://chromium-review.googlesource.com/c/chromium/src/+/747805
The patch is merged...

Is everything now in place to solve this issue?

Cheers!

@HanXHX this requires more work upstream: in order to reuse lifecycle events, page.setContent should initiate a navigation, which in turn should be plumbed through browser-side navigation aka "plznavigate".

The workaround doesn't seem to work for me. Using 1.0.

For reference,

      const loaded = page.waitForNavigation({
        waitUntil: 'load',
      });

      await page.setContent(html);
      await loaded;

seems to work for me.

My use case is that background-color css isn't loading. I'll give the above a try as well.

@LeonineKing1199 I got navigation exceeded timeout using the above method.

Darn, it worked for me. Sorry it didn't for you :P

Did you copy-paste it verbatim in that example? You need to register the event callback before you invoke setContent

Yeah verbatim
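
The ordering point made above (start waiting for the navigation before calling setContent) can be sketched as a small helper. This is only an illustration: `setContentAndWait` is a hypothetical name, `page` is assumed to be a Puppeteer Page, and, as reported in this thread, the approach still timed out for at least one user.

```javascript
// Hypothetical helper, not part of Puppeteer.
// The array elements are evaluated left to right, so waitForNavigation
// is already listening by the time setContent replaces the document.
async function setContentAndWait(page, html, waitUntil = 'load') {
  await Promise.all([
    page.waitForNavigation({ waitUntil }),
    page.setContent(html),
  ]);
}
```

Usage would look like `await setContentAndWait(page, html);` followed by `page.pdf(...)` or whatever needs the loaded resources.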

@aslushnikov: the workaround doesn't work correctly when the encoding is not set in the <head> of the HTML.
With setContent the accents are rendered correctly.
Repro:

 const html = `<html><head></head><body>A word with àccênt</body></html>`

 await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle0' });
 const pdf_buffer = await page.pdf(options)
 await page.close()

Any idea when the setContent with waitForNavigation will work again ?

Thanks.

I'm also having some problems.

page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle0' });

This does not load images from remote urls. Any suggestions ?

@GautierT the feature is not in our plans for this quarter so there are no estimates. We'll see if we have some time to have this covered.

@GautierT an experimental implementation was landed once; it got reverted later.

@aslushnikov : ok thanks for the explanation.

Any idea how I can convert an HTML string to PDF without running into encoding trouble?

With await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle0' }); the accents are badly handled if <meta charset="UTF-8"> is not set in the head.

But I can't add it myself (users send me the HTML string to be converted to PDF).

Thanks for your help.

@GautierT there's another trick you can try that involves request interception.

const page = await browser.newPage();
await page.setRequestInterception(true);
// Capture first request only
page.once('request', request => {
  // Fulfill request with HTML, and continue all subsequent requests
  request.respond({body: myHTML});
  page.on('request', request => request.continue());
});
await page.goto('http://example.com');
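
A slightly expanded sketch of the interception trick above, wrapped in a function. `gotoWithHtml` is a hypothetical helper name, and the `contentType` value is an assumption added here: declaring the charset in the response should let the browser decode the body as UTF-8 even when the HTML has no `<meta charset>`, which is the situation described earlier in the thread.

```javascript
// Hypothetical helper: answer the first request (the navigation to `url`)
// with `html`, and let every later request through to the network.
// Note that request interception stays enabled on the page afterwards.
async function gotoWithHtml(page, url, html) {
  await page.setRequestInterception(true);
  let served = false;
  page.on('request', (request) => {
    if (!served) {
      served = true;
      request.respond({
        status: 200,
        contentType: 'text/html; charset=utf-8', // assumption: fixes accents
        body: html,
      });
    } else {
      request.continue();
    }
  });
  return page.goto(url, { waitUntil: 'networkidle0' });
}
```

Because the page is loaded from `url`, relative resource paths in the HTML resolve against that URL rather than against a data: URL.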

@GautierT Did you find a way to fix the encoding issue without having to put the meta in head? Facing the same issue

You can specify the charset in the data URL, e.g. data:text/html;charset=UTF-8,<h1>👍</h1>.

This seems to have fixed the issue with emoji for me.
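
Building on the charset tip above, here is a minimal sketch of a helper that builds the data URL. `toDataUrl` is a hypothetical name. Percent-encoding the markup is an addition not mentioned in the thread: a raw `#` in the HTML (for example in a CSS hex colour) would otherwise start the URL fragment and cut the document short.

```javascript
// Hypothetical helper: build a data: URL that declares UTF-8 and
// percent-encodes the markup so characters like '#' and '%' survive.
function toDataUrl(html) {
  return 'data:text/html;charset=UTF-8,' + encodeURIComponent(html);
}

// Intended use with a Puppeteer page:
//   await page.goto(toDataUrl(html), { waitUntil: 'networkidle0' });
```

This does not lift the size limit on data URLs reported elsewhere in the thread.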

Hey guys, I've had the same issue with setContent. Using page.goto worked, but with a huge HTML document it just doesn't stop rendering. One possible workaround, and the one I'm using now: save the HTML into a file and use the file protocol with goto.

something like:

await page.goto("file://" + tempFilePath)
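
The temp-file approach above could be sketched as follows. `gotoTempFile` is a hypothetical helper; the unique directory per call and the cleanup in `finally` are assumptions added here to keep concurrent calls apart and to avoid leaving files behind. The file is deleted once navigation settles, so anything the page tries to load from disk afterwards would fail.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper: write `html` to a unique temp file, navigate to it,
// and remove the file again whether or not navigation succeeded.
async function gotoTempFile(page, html) {
  // mkdtemp appends random characters, so parallel calls get separate dirs.
  const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'html-'));
  // The .html suffix matters: it tells the browser to render the file as HTML.
  const file = path.join(dir, 'index.html');
  try {
    await fs.promises.writeFile(file, html, 'utf8');
    return await page.goto(pathToFileURL(file).href, {
      waitUntil: 'networkidle0',
    });
  } finally {
    await fs.promises.unlink(file).catch(() => {});
    await fs.promises.rmdir(dir).catch(() => {});
  }
}
```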

The following code from above posts

await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle0' });

does not work for me anymore. Tested with Puppeteer v1.6.1.

Any workaround or news regarding this functionality?

@jmanuelbr try with this

await page.goto("data:text/html," + html, {
      waitUntil: "networkidle2"
    });

#728 (comment) doesn't appear to work anymore for version v1.8.0.

I tried:

await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle0' });

and

await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle2' });

So I just reverted back to v1.1.1 since that's what worked for me prior to trying v1.8.0. Didn't get a chance to test version between v1.1.1 and v1.8.0.

@jeremypeter what do you mean by "doesn't work"? I just tried, works fine for me.

Hi @aslushnikov

Regarding this issue (#728), I've been using the initial workaround you proposed for a while:

await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle0' });

But now I'm encountering user cases that produce HTML too big for the data URI trick (2 MB limit).

So far, I've used the workaround mentioned by @filipemonteiroth on the 14th of June, creating a temporary file to be served through page.goto, because my process needs to wait both for network stabilisation with networkidle0 and for a custom event fired when a browser-side JS script completes.

But this workaround requires managing the lifetime of the new file...

Having spotted your other trick involving request interception today, I'm wondering whether it would let me pass HTML bigger than 2 MB to Chromium while keeping both networkidle0 and custom events working, all without having to manage the lifecycle of a newly created temporary file?

Are there any limitations to this trick I should be aware of before using it?

@Mumeii I'm not aware about limitations; it should just work.

I'm running into the same issue of having to manage a newly created temporary file. I don't think this is best practice: you might hit race conditions when concurrent requests arrive at once, and the hard disk keeps receiving many write requests.

I'm wondering if there is any workaround, such as streaming the content instead of saving a file and using the "file://" protocol.

Hi Team!

Please advise, as the workaround is not working in our case (puppeteer 1.9.0).
I am trying to convert XHTML content, which I provide as an escaped inline string.
The generated document contains the raw XHTML, and no requests are made for external resources (in this case, CSS).

Example

'use strict'

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch()
    const page = await browser.newPage()
    await page.setRequestInterception(true)

    const xhtml = `&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Transitional//EN&quot; &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&quot;&gt; &lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt; &lt;head&gt; &lt;meta charset=utf-8&quot;/&gt; &lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;application/xhtml+xml; charset=utf-8&quot;/&gt; &lt;link href=&quot;css.css&quot; media=&quot;all&quot; rel=&quot;stylesheet&quot; type=&quot;text/css&quot;/&gt; &lt;/head&gt; &lt;body&gt; some content &lt;/body&gt; &lt;/html&gt;`

    console.log(xhtml)

    page.on('request', request => {
        console.log(`Intercepted request with URL: ${request.url()}`)
        request.continue()
    });

    await page.goto(`data:text/html,${xhtml}`, {
        waitUntil: 'networkidle0'
    });
    await page.pdf({
        path: 'xhtml.pdf'
    })
    await browser.close()
})()

Here is the initial document content

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <meta charset=utf-8"/>
      <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
      <link href="css.css" media="all" rel="stylesheet" type="text/css"/>
   </head>
   <body>
      some content 
   </body>
</html>
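
One possible reading of the report above, not confirmed anywhere in this thread, is that the string passed to page.goto is still entity-escaped (`&lt;`, `&quot;`, ...). The browser would then show the markup as literal text and never see the `<link>` element. Under that assumption, a minimal sketch of decoding the string first could look like this; `unescapeHtml` is a hypothetical helper that handles only the five basic entities.

```javascript
// Hypothetical helper: undo basic HTML entity escaping.
// '&amp;' is replaced last so that '&amp;lt;' becomes '&lt;' and not '<'.
function unescapeHtml(escaped) {
  return escaped
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&');
}

// Intended use in the example above:
//   await page.goto(`data:text/html,${unescapeHtml(xhtml)}`, ...);
```

Even after decoding, a relative href such as css.css has no base to resolve against in a data: URL, as discussed in the following comments.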

Is there a way to set the url if we end up using the data:text/html workaround? Do relative paths for resources work via this method?

I had the exact same problem with external resources. So the workaround from @aslushnikov helped me a lot. But as @ObviouslyGreen points out it lacks the support of resolving relative paths. I investigated what puppeteer takes as "url" when using this workaround and it is the whole html (obviously).

I could solve the problem with relative paths (for me in CSS styles) with the following approach:

  1. create a folder (let's name it dist) in which all relative resources are placed in
  2. generate the html as needed (paths should be relative to the root of dist)
  3. write the html file to the root of dist
  4. use the following code to load the html with all the relative resources resolved correctly:
const pathToHtml = path.join(__dirname, 'dist', `${randomName}.html`);

const page = await browser.newPage();
await page.goto(`file:${pathToHtml}`, { waitUntil: 'networkidle0' });

Note that the html file needs to have the '.html' suffix for puppeteer to render the html properly (at least for me this was the case).

@aslushnikov Great to see the same options for page.setContent!

Is it possible now with page.setContent to load resources with relative paths, as I described in my workaround in the above comment?

@kamekazemaster yeah, the paths should be resolved against the page's URL.

await page.goto('https://example.com');
// logo.png becomes https://example.com/logo.png
await page.setContent('<img src="/logo.png"></img>');
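
Putting the two pieces together, a minimal sketch might look like the following. It assumes a Puppeteer version in which page.setContent accepts a `waitUntil` option; `setContentWithBase` and its parameter names are hypothetical.

```javascript
// Hypothetical helper: navigate to `baseUrl` first so that relative paths in
// `html` resolve against it, then replace the document and wait for the
// network to go idle before returning.
async function setContentWithBase(page, baseUrl, html) {
  await page.goto(baseUrl);
  await page.setContent(html, { waitUntil: 'networkidle0' });
}
```

For example, `await setContentWithBase(page, 'https://example.com', '<img src="/logo.png">');` would request https://example.com/logo.png.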

@aslushnikov when can we expect next release with updated setContent?

The solution above (navigating to a URL first, so that page.setContent resolves paths against it) is really helpful, though describing it in a little more detail might have saved some time.

Thank you.

Can anyone confirm whether adding a <base href=...> tag to the HTML works instead of the initial page.goto in @aslushnikov's example above?