Production Web Apps Performance Study Q4/16 - Q1/17
addyosmani opened this issue · 2 comments
Goals
- Understand the cost of JS Parse/Compile/FunctionCall times on apps
- Discover how quickly apps become interactive on average mobile hardware
- Learn how much JavaScript apps are shipping on desktop + mobile
Sample information
6000+ production sites using one of React, Angular 1.0, Ember, Polymer, jQuery or Vue.js. Site URLs were obtained from a combination of Libscore.io, BigQuery, BuiltWith-style sites and framework wikis. Roughly 10% of each sample set was eyeballed to verify the framework was actually in use; sets that proved unreliable were discarded from the final study.
Trivia: All in all, 85,000 WebPageTest results were generated as part of this study. Yipes.
Tools used in study
WebPageTest.org (with enhancements such as JS cost, TTI and aggregated V8 statistics added thanks to Pat Meenan as the project progressed), Catapult (internal Google tool), Chrome Tracing.
Summary observations
This data may be useful to developers as it shows:
- Real, production apps using their favorite stacks can be much more expensive on mobile than they might think.
- Paying closer attention to lowering parse times and time-to-interactive is likely required if you're choosing something off the shelf.
- Some, but not all, apps are shipping larger bundles. Where this is the case, invest in code-splitting and reduce how much JavaScript is used.
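As an illustrative (not from the study) example of the code-splitting advice above, modern webpack (v4+) can split shared vendor code into its own cacheable chunk with a small config fragment; any dynamic `import()` call sites in app code then become lazy-loaded chunks automatically:

```javascript
// webpack.config.js — an illustrative fragment, not the study's setup.
// splitChunks moves code shared between entry points / async chunks into
// separate files, so users don't re-download vendor code on every route.
module.exports = {
  optimization: {
    splitChunks: {
      chunks: 'all', // consider both initial and lazy-loaded chunks for splitting
    },
  },
};
```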
Where are the medians and aggregates?
The primary goals of this study were to highlight trends across the different data sets available to me as a whole. Initially, I focused on summarizing this data at a per-framework level (e.g. React apps in set 1 exhibited characteristic A). After reviewing this with the Chrome team, we decided that presenting per-framework breakdowns was more susceptible to the takeaway being "oh, so I should just use framework X over Y because it is 2% better" instead of the important takeaway that parse/compile is a problem we all face.
To that end, the charts below were generated locally by fetching each of the WebPageTest reports for the data sets, iterating over a particular dimension (e.g. time-to-interactive or JS parse time) and computing the medians for the different sets, which were then plumbed into either Google Sheets or Numbers for charting. If you wish to recreate that setup yourself, you can grab the CSVs from the reports below.
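A rough sketch of that aggregation step (not the exact script used in the study). It assumes each row is one WebPageTest run already parsed out of an exported CSV; the `tti` field name and the URLs are illustrative:

```javascript
// Median of a numeric array: sort a copy, take the middle element
// (or the mean of the two middle elements for even-length sets).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical rows parsed from a WebPageTest CSV export (ms values).
const runs = [
  { url: 'https://example.com/a', tti: 8200 },
  { url: 'https://example.com/b', tti: 9100 },
  { url: 'https://example.com/c', tti: 7600 },
];

// Pull one dimension out of the set and take its median.
const medianTti = median(runs.map((r) => r.tti));
console.log(medianTti); // 8200
```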
Raw WebPageTest runs - Round 2 (January, 2017)
- Ember production sites, Moto G4, 3G: https://www.webpagetest.org/result/170122_XX_TQD/ (with runtime call stats)
- Ember production sites, desktop, cable: https://www.webpagetest.org/result/170122_RM_3GW/ (with runtime call stats)
- Sites from the React Wiki, Moto G4, 3G: https://www.webpagetest.org/result/161218_05_c8320bc70144d48df218f230701ec2b8/
- Sites from the React Wiki, Desktop, cable: https://www.webpagetest.org/result/170113_XJ_17TN/ (with runtime call stats)
- Sites from the React Wiki, Nexus 5: https://www.webpagetest.org/result/170116_Z6_9GS/
- Sites from the React Wiki, iPhone 6, cable: https://www.webpagetest.org/result/170116_MV_8VQ/
- Libscore React apps, Moto G4, 3G: https://www.webpagetest.org/result/161218_GD_297def2a6c8e5bcf355566577d4ed7b9/
- Libscore React apps, Desktop, cable: https://www.webpagetest.org/result/170114_TD_V4Z/
- React + Webpack usage survey sites, Moto G4, 3G: https://www.webpagetest.org/result/170115_HM_EYT/
- React + Webpack usage survey sites, Desktop, cable: https://www.webpagetest.org/result/170114_WN_V01/
- Subset of Libscore Angular 1.0 apps, G4, 3G: https://www.webpagetest.org/result/170115_SZ_Z5N/
- Subset of Libscore Angular 1.0 apps, desktop, cable, runtime call stats: https://www.webpagetest.org/result/170114_25_1DC/
- Libscore jQuery apps, G4, 3G: https://www.webpagetest.org/result/161218_A4_20acbeb009bd0a44527d3e4d93f1de2a/
- Vue.js production apps, G4, 3G: https://www.webpagetest.org/result/170115_HE_10EN/
- Vue.js production apps, desktop, cable: https://www.webpagetest.org/result/170115_KH_10F5/ (note: the Vue sets are based on less complex apps than the other sets appear to contain, but this is a personal observation. <3 Vue otherwise)
- Polymer apps in production, G4, 3G: https://www.webpagetest.org/result/170120_59_40GD/
- Polymer apps in production, desktop, cable: https://www.webpagetest.org/result/170120_W1_40PP/
- Polymer (based on custom list they provided) Moto G4, 3G: https://www.webpagetest.org/result/170125_GX_G9YF/
- Mobile Top 10 Moto G4, 3G: https://www.webpagetest.org/result/161218_TD_2fcc131f975da7e19ddb750f4991421d/
- Mobile Top 10 Desktop, Cable (with V8 runtime call stats in traces): https://www.webpagetest.org/result/170114_VE_TSK/
- Production Progressive Web Apps, Moto G4, 3G: https://www.webpagetest.org/result/161218_DW_b3e87329f9a5029d63b67759054fb0ce/
- Production Progressive Web Apps, Desktop, Cable: https://www.webpagetest.org/result/170114_CE_TY5/
- Chrome Loading team top sites, Moto G4, 3G: https://www.webpagetest.org/result/161218_RS_4be952cee9b8bede5ea16c26594ac38a/
- Chrome Loading team top sites, Desktop, cable: https://www.webpagetest.org/result/170114_4R_TT6/
- V8 Team Top 25, Moto G4, 3G: https://www.webpagetest.org/result/161218_Q6_b3202df8ed80da21ec73d9afcedead76/
- V8 Team Top 25, Desktop, Cable: https://www.webpagetest.org/result/170114_Y9_TW6/
Raw WebPageTest runs - Round 1 - Older study (December, 2016)
I put together this graphic when internally sharing the first version of this study. I decided to redo it because, at a minimum, the network-throttling setup wasn't consistent across the 2-3 web perf tooling systems used. This meant that while overall weight, time in script (parse/eval), FMP and load time were fine, the TTI numbers could not be concretely confirmed as 100% accurate. I redid the study once we added TTI support to WebPageTest, and I'd trust the Round 2 numbers a lot more.
- Summary insights + further data from this study: https://docs.google.com/a/google.com/spreadsheets/d/1QRdYgGVdDYB7kgfFKg0-0ViUonpvt8SXxcvv0gXKsEg/edit?usp=sharing
- React prod sites from their wiki - Moto G - https://www.webpagetest.org/result/161210_WK_aa40594011795974a4e2d26a24e0c3b6/
- React prod sites from their wiki - iPhone - https://www.webpagetest.org/result/161210_M1_7d32c93c8dcfaaa46cef42125aec75ba/
- Ember production sites - iPhone - https://www.webpagetest.org/result/161211_56_a8275acd05ef23932e85e1d83ceb8787/
- Ember production sites - Moto G - https://www.webpagetest.org/result/161211_PJ_d46293b5026e8c3ca7fa85d680cf2b0d/
- Loading team top sites - Moto G - https://www.webpagetest.org/result/161210_PC_82eaf67c4d57a8dbc533ad9c4a1ee08c/
- Loading team top sites - iPhone - https://www.webpagetest.org/result/161210_B7_9b36e27c2eaa0cec9cd5f8353a64acaa/
- Global top 10 mobile sites - iPhone - https://www.webpagetest.org/result/161210_93_ec315ea70330ecf03041396f21f5098f/
- Global top 10 mobile sites - Moto G - https://www.webpagetest.org/result/161210_WD_e2f65275bd6b3950dd2b367195e4d395/
- V8 top sites - Moto G - https://www.webpagetest.org/result/161210_MB_36ccea8a66638d08c5ea2855a04ca023/
- V8 top sites - iPhone - https://www.webpagetest.org/result/161210_37_6bca043df3316fb6e55efdba67d4b7bf/
- Speedometer 2 alpha - Moto G - https://www.webpagetest.org/result/161208_ET_6f0c253c8b2a92a74c8989b7fbdb8c9a/
- Catapult tool - Libscore.io React apps - Nexus 5 - https://docs.google.com/spreadsheets/d/1C4mO62NjVSVOVTsezlPwyY_pMIDV5b0401arE9GNviA/edit?usp=sharing
- Catapult tool - BuiltWith Angular - Nexus 5 - https://docs.google.com/spreadsheets/d/12Wb7fb15Kq9XeoAqKneGx12oD8dshe8MfCAf46lG1qQ/edit?usp=sharing
- BigQuery - angular.module() usage - https://docs.google.com/spreadsheets/d/1xdOr1cV9JBPyqlsCcATZI-cToUh7AC3Zhp1KP1YZzdk/edit?usp=sharing
- Catapult tool - Ember apps - Nexus 5 - https://docs.google.com/spreadsheets/d/1RwjhNnUQRkUXZXcnqj3zHISQFuiQnqhbH3qUc6owpHg/edit?usp=sharing
- Catapult tool - Libscore jQuery apps - Nexus 5- https://docs.google.com/spreadsheets/d/10vHRVUF7WoFubki_wKCFKXvWILxva8gGbzomC-8-CqQ/edit?usp=sharing
- Catapult tool - BigQuery - Sites using mod_pagespeed - Nexus 5 - https://docs.google.com/spreadsheets/d/1XpTMwpOkhrnpNLVfQGETqVK3TWVxIzrnGq5i1uqitb4/edit?usp=sharing
- WebPageTest - Libscore - Angular (2000 apps): https://docs.google.com/spreadsheets/d/1OBvcDSyoKHTjdH2CALui88orDF_n-0Yi92J93LDpUjQ/edit#gid=646387798&vpid=AD1 (Median insights ~ compile 445ms, eval 1784ms, FMP 7.2s, SI: 9s. Most apps spent 4.6s in JS on startup)
- WebPageTest - Libscore - Ember (695 apps): https://docs.google.com/spreadsheets/d/1m9eFna_Xg70c_MPG4tU9D4fOdJT9SZQ2u5fQCxEO--g/edit#gid=1536554622&vpid=AG649 (Median insights ~ compile 374ms, eval 2.7s, SI: 10s, FMP: 9s, load: 13.8s, most apps spent 6s in JS on startup)
- WebPageTest - Libscore - React (2000 apps): https://docs.google.com/spreadsheets/d/1HliA4jUMfqd4vFLDkYoTZQRv1oM1rM_8ahBAsXYfuAE/edit#gid=1982534624&vpid=A1 (Median insights ~ compile 481ms, eval 2609ms, load: 15.5s, FMP 6.5s, SI: 8s. Most apps spent 4.5s in script on startup)
Other data sets generated (Dec, 2016)
Note: many of the below data sets were generated before we installed Moto G4s in WebPageTest, so the Moto G1 was used instead. Some of the data sets also use earlier versions of the time-to-interactive metric and in most cases should not be directly compared to the latest data from 2017. This is historical data that's interesting and may be worth re-exploring where particular data sets didn't make it into the final study results.
- React production sites from their wiki - moto G1, iPhone 5c
- WebPageTest > BigQuery > Webpack built apps - limited to the first 2K of them.
- JS Framework benchmark modified to run automatically and deployed on https://v8-parse-eval-study.firebaseapp.com/. WebPageTest results over a Moto G, 3G.
- V8's Top 25 sites tested on a Moto G and iPhone 5
- Global mobile top 10 sites tested on a Moto G and iPhone 5
- Sites using mod_pagespeed in the wild - data from Catapult tests - TTI
- BuiltWithAngular apps - Catapult tests - TTI - Nexus 5
- Apps built using jQuery - Catapult - TTI. I decided not to rely on this set heavily; jQuery could be getting pulled in as a transitive dependency for ads.
- Libscore > React apps - Catapult - TTI - Nexus 5X
- Chrome's loading team top sites on a Moto G vs on an iPhone
- WebPageTest > BigQuery > angular.module() usage (I decided this set was too noisy)
- Libscore Angular 1.0 apps
@addyosmani nice work. Could you explain what "Sets not reliable were discarded" means? Are those issues where TTI is infinity and isn't reported, or just sets with a high variance?
I requested access to the page sets document, not sure it is intentionally private.
> nice work.
Thanks!
> Could you explain what "Sets not reliable were discarded" means? Are those issues where TTI is infinity and isn't reported or just sets with a high variance?
High variance was indeed a problem with certain URLs. One of the challenges with studying web performance at scale is that data sets are susceptible to varying amounts of noise. Some of the sets I originally used (prior to filtering) were only using a framework through a transitive dependency.
One example was sites that pulled in all of Angular just for an ad: even if the page would otherwise have had a decent TTI, their third-party includes were pushing their TTIs out very heavily. Some of the other data sets had URLs suffering from the same problem, so I removed them after some manual tracing. This study was mostly looking at pages using a framework for their core content.
> Are those issues where TTI is infinity and isn't reported or just sets with a high variance?
The current TTI metric we've implemented in Lighthouse and WebPageTest will very occasionally return infinity for URLs (especially ones that keep the main thread busy for a long time). I locally filtered out -1/infinity values when computing medians to account for this. My hope is that TTI will eventually be reliable enough that this kind of filtering isn't required.
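The sentinel filtering described above can be sketched like this (the helper names are mine, not the study's): drop -1/Infinity TTI values before taking the median, so a few runs with stuck main threads don't poison the aggregate.

```javascript
// Median of a numeric array (sort a copy, take the middle element).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Keep only finite, non-negative TTI values; -1 and Infinity are the
// sentinels WebPageTest/Lighthouse can emit when TTI isn't determinable.
function filteredMedianTti(ttis) {
  const valid = ttis.filter((t) => Number.isFinite(t) && t >= 0);
  return valid.length ? median(valid) : null;
}

console.log(filteredMedianTti([5200, -1, Infinity, 6100, 7300])); // 6100
```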