Investigate differences between lighthouse scores from lighthouse in Chrome vs lighthouse-parade

Question

Closed this issue 4 years ago · 3 comments

We have noticed that lighthouse-parade tends to report higher Lighthouse scores than Lighthouse in Chrome. The difference is typically ~6-8 points.

We are going to investigate potential differences that could be causing the discrepancy in scores:

Lighthouse 6.2.0 (in Chrome) vs Lighthouse 6.4.0 (in lighthouse-parade)
Headless vs Headed
Chromium vs Chrome
Chrome extensions vs No Chrome Extensions
Potential differences in mobile emulation: Is the --emulated-form-factor=mobile flag the same as the "Device: Mobile" option?
Potential differences due to CSV output vs what is displayed in the lighthouse UI
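To compare headless and headed runs with everything else held constant, the CLI invocations could be generated from one helper. This is only a sketch: the `lighthouse_command` helper and the output file names are hypothetical, and apart from `--emulated-form-factor=mobile` (quoted above) the flags are my assumption about the Lighthouse 6.x CLI; newer releases may name them differently.

```python
from typing import List


def lighthouse_command(url: str, headless: bool, output_path: str) -> List[str]:
    """Build a Lighthouse CLI argument list (hypothetical helper, does not run anything)."""
    cmd = [
        "lighthouse",
        url,
        "--emulated-form-factor=mobile",   # flag named in the issue (Lighthouse 6.x)
        "--only-categories=performance",   # assumption: only the perf score matters here
        "--output=json",
        f"--output-path={output_path}",
    ]
    if headless:
        # Assumption: headless mode is requested through Chrome's own flag.
        cmd.append("--chrome-flags=--headless")
    return cmd


if __name__ == "__main__":
    url = "https://www.zachleat.com/"
    for headless in (True, False):
        name = "headless.json" if headless else "headed.json"
        print(" ".join(lighthouse_command(url, headless, name)))
```

Printing the commands instead of executing them keeps the sketch side-effect free; each list could be passed to `subprocess.run` once the flags are confirmed against the installed Lighthouse version.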

calebeby commented 4 years ago

🎉

Answer 1 · 2020-11-19T04:32:35.000Z

@emersonthis I did some testing in different chrome/lighthouse versions: https://docs.google.com/spreadsheets/d/1_6vUx34Lzmz58m3FGANWk90QjZ3oG9W87DLBVw9lJus/edit#gid=0

Conclusions:

Chrome extensions have a significant negative effect on the LH score.
There doesn't appear to be a significant difference in perf scores between LH 6.2.0 and LH 6.4.0.
There doesn't appear to be a significant difference in perf scores between Lighthouse in Chrome and lighthouse-parade, provided Chrome is run in incognito mode.
It is interesting that the lighthouse-parade scores have the most bell-curve-like distribution. It could just be coincidental, but maybe it is because a fresh browser instance is used every time.
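The kind of comparison behind these conclusions can be sketched as a per-setup summary of repeated runs. The scores below are made up purely for illustration and are NOT taken from the linked spreadsheet; `summarize` is a hypothetical helper.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize(scores: List[float]) -> Dict[str, float]:
    """Return sample size, mean, and sample standard deviation for one setup."""
    return {"n": len(scores), "mean": mean(scores), "stdev": stdev(scores)}


# Hypothetical performance scores from repeated runs (illustrative only).
chrome_with_extensions = [78, 81, 75, 80, 77, 79]
lighthouse_parade = [86, 85, 87, 86, 84, 86]

if __name__ == "__main__":
    for label, scores in (
        ("Chrome with extensions", chrome_with_extensions),
        ("lighthouse-parade", lighthouse_parade),
    ):
        s = summarize(scores)
        print(f"{label}: n={s['n']} mean={s['mean']:.1f} stdev={s['stdev']:.2f}")
```

A lower standard deviation for one setup would be consistent with the idea that a fresh browser instance gives more repeatable results, although a handful of runs cannot confirm it.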

I only tested on https://www.zachleat.com/. Let me know if you think it's worth testing other sites too.

Answer 2 · 2020-11-19T18:16:06.000Z

"Chrome extensions have a significant negative effect on LH score"

This is great to know! We should think about ways to document this.

"It is interesting that lighthouse-parade has the most bell-curve-like shape. It could just be coincidental, but maybe it is because a fresh browser instance is used every time"

It's hard to tell with this sample size, right? It seems like what we've established is that LP is delivering the purest results, and that this is a good enough reason for its scores to differ from the others. If we feel confident about this, I don't think there's much need to study exactly why the others are less consistent. (Outside the context of this project, though, that could make for an interesting article to write some day. You've already done a chunk of the research. Something like: "Why Your Lighthouse Scores Are Wrong")
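One rough way to judge whether a gap between two sets of scores is larger than run-to-run noise is Welch's t statistic, which divides the difference in means by its standard error. The sketch below is only an illustration: `welch_t` is a hypothetical helper and the sample scores are invented, not measured.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples with unequal variances."""
    # Standard error of the difference in means; shrinks as the samples grow.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both samples have zero variance")
    return (mean(a) - mean(b)) / se


if __name__ == "__main__":
    # Hypothetical scores (illustrative only).
    incognito_chrome = [85, 86, 84, 87, 85, 86]
    lighthouse_parade = [86, 85, 87, 86, 84, 86]
    print(f"t = {welch_t(lighthouse_parade, incognito_chrome):.2f}")
```

Because the standard error depends on the number of runs, a small sample can leave a real difference indistinguishable from noise, which is the concern raised here about sample size.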

I'm satisfied enough to close this if you are.