Puppeteer Page Event - 'domcontentloaded', 'load', 'networkidle0', 'networkidle2'

Question

Puppeteer Page Event - 'domcontentloaded', 'load', 'networkidle0', 'networkidle2'

tomgallagher opened this issue 7 years ago · 8 comments

Hey,

I've had a look at the code you're using to extract the styles (run.js) and I have a question / comment.

My understanding is that CSS styling tends to be render-blocking, whether it is in the form of a style tag, a link to an external style sheet, or inline styling on an element.

There is of course additional styling done by Javascript as and when that loads, whether it be render-blocking or async / defer.

From a performance point of view, there are two things that could usefully be done by a minimal css project. Firstly, identify the minimal css required above the fold to make sure that the first paint is properly styled. Make sure this is added to a style tag in the head somehow. Secondly, make sure that the total volume of css is only that required by the page, i.e. remove redundant selectors and their respective css rules from the download package.

Now, it seems to me that both the above could be achieved by waiting only for the 'domcontentloaded' event in Puppeteer. This event is quicker and more reliable in the wild than the 'load' event and the two 'networkidle' events. By definition, you can then use response.text() on a external stylesheet response to get all the rules and page.$$eval to assess those rules.

Seems to me to be a quicker and more reliable way to get the above-the-fold and present/not present elements and then divide the styling accordingly. Any styling applied by Javascript can be ignored.

Answer 1 · 2018-05-23T14:50:54.000Z

So your only concern is what event used to detect finish of load e.g 'domcontentloaded' vs 'load' vs 'networkidle0' vs 'networkidle2' etc. Or you also asking for additional features like ("Make sure this is added to a style tag in the head somehow."). I would suggest to separate different requests in different tickets.

About additional feature - there can be another software which can do this based on minimalcss, for example react-snap does it. It is out of scope of current project.

About which event to use. It depends on situation, there are situations which require 'networkidle0', but I guess this it can be configurable (not sure if config exposed or not)

Answer 2 · 2018-05-23T15:22:38.000Z

Yes, sorry about the post, it was more of a general discussion than it should have been.

I guess the basic question is: when would you need to wait for any event other than 'domcontentloaded' in order to get the styles for any given page?

Answer 3 · 2018-05-23T15:27:23.000Z

We (@stereobooster and I) debated this a lot in the past and concluded that the best thing is to just wait for networkidle0)

The reason being is many sites have some .css links and some .js (or inline Javascript for that matter) that waits for window.onload (or some incantation of it) and does JavaScript things. For example, if it used jQuery:

$(function() {
  // As soon as the page has loaded, extract all <h2> and <h3> tags and
  // generate a cute little Table of Contents near the top of the page.
  buildAndInsertTableOfContents($('.document h2, .document .h3'), $('.document .toc'));
});

That's just DOM manipulations and doesn't depend on the network. In the example, if the Table of Contents depends on some like .document .toc ul li selector in a .css file, we're want to make sure we're waiting for that to download and for the DOM to morph.

Another common thing is that on window.onload some sites fetch more stuff. For example:

// React example

componentDidMount {
  // Now that the page has rendered, download the related 
  // news (which is less critical) and display in the side-bar.
  fetch('/api/related-news').then(r => r.json())
  .then(news => this.setState({relatedNews: news}))
};

In both of these examples, by using networkidle0 we make sure we get the FINAL CSSOM after the page has "fully loaded" and calmed down.

It's just safer that way. If we try to extract minimalcss earlier, what MIGHT happen is that (as the per last example above) the related news is displayed in the side-bar without nice styling.

We recommended that you replace the render blocking CSS with the minimal/critical CSS but still load the rest of all the CSS later.

Answer 4 · 2018-05-23T15:29:30.000Z

Suppose that the DOM changes a bunch between domcontentloaded and networkidle0 (basically means in a browser when the loading icon spinner stops to spin) then it's likely to be very little. For example, in the example of the related news in the side-bar, that extra CSS is likely to be a less than 100 bytes. That extra weight to the inline <style> block isn't expensive. Especially when the HTML document is gzipped. However, the "cost" of having to look at unstyled lazy-loaded DOM elements is higher.

Answer 5 · 2018-05-23T15:37:44.000Z

Note that there's another interesting thing. There is a time between domcontentloaded and networkidle0.
Imagine the HTML from the server looks something like this:

<div class="toc toc-pending">Table on contents will be inserted here</div>

...together with...

.toc-pending {
  opacity: 0.7;
  color: #efefef;
}

And the .js file does something like this:

$(function() {
  // As soon as the page has loaded, extract all <h2> and <h3> tags and
  // generate a cute little Table of Contents near the top of the page.
  buildAndInsertTableOfContents($('.document h2, .document .h3'), $('.document .toc'));
  $('.document .toc').removeClass('toc-pending');
});

I.e. the DOM might look/styled a certain way that only lasts a couple of hundred milliseconds whilst the user is waiting for the lazy-loaded .js to change the DOM.

We're deliberately missing that opportunity. I.e. minimalcss won't include the .toc-pending CSS selector that was only used temporarily whilst waiting for the whole page to stop loading. Notes in the code here

Answer 6 · 2018-05-23T16:58:12.000Z

OK. The reason I asked these questions is that I have done a lot of testing of Puppeteer in the wild and the networkidle0 event causes a huge number of timeout errors. In fact, so does the networkidle2 event. If you're collecting the CSS text as you go along, using the response.text() approach, then that should be OK as you have the info you need when the error is raised.

Answer 7 · 2018-05-23T20:01:54.000Z

Totally a valid point.

One argument that was made previously is that if you're so performance-hungry that you even use a tool like this you probably don't have a bunch of weird requests that timeout. You're likely to have bigger problems.

But we can wait and see. If the above statement is wrong, we could consider an option where you can actually control the thing to waitfor. E.g. "I really know what I'm doing and I know that only junk, that don't affect the first-render CSS, is going to time out."

Answer 8 · 2018-05-25T11:58:16.000Z

Should we close this as non-actionable ticket?