Seeing some big changes in requests for June
tunetheweb opened this issue · 14 comments
I wonder if this is related to the dip in the number of requests per crawl shown on the meta dashboard.
Seems likely. Weirdly, summary_requests is down too, even though summary_pages and pages look fine.
Here are the diffs by type: https://docs.google.com/spreadsheets/d/1VzJ-xSslN9mzvy_ZIbVPnp_KWEbvjCioIrP-f1ohy5g/edit?usp=sharing
| type | 2022_05_01_mobile | 2022_06_01_mobile | June as % of May |
|---|---|---|---|
| audio | 732,146 | 298,054 | 40.71% |
| css | 78,447,579 | 75,867,521 | 96.71% |
| font | 31,810,867 | 30,518,284 | 95.94% |
| html | 52,966,095 | 26,045,161 | 49.17% |
| image | 263,199,684 | 215,902,188 | 82.03% |
| other | 39,679,449 | 14,246,564 | 35.90% |
| script | 224,872,855 | 168,722,090 | 75.03% |
| text | 14,109,747 | 10,255,042 | 72.68% |
| video | 2,001,606 | 799,350 | 39.94% |
| xml | 752,318 | 437,492 | 58.15% |
| Grand Total | 708,572,346 | 543,091,746 | 76.65% |
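The last column holds June's count as a percentage of May's, not a percentage change: audio at 40.71% means audio requests fell by roughly 59%. A minimal sketch that recomputes both numbers from the two count columns (data copied from the table; the `summarize` helper is illustrative only) and confirms that the per-type rows add up to the Grand Total:

```javascript
// Mobile request counts by type, copied from the table: [May, June].
const counts = {
  audio: [732146, 298054],
  css: [78447579, 75867521],
  font: [31810867, 30518284],
  html: [52966095, 26045161],
  image: [263199684, 215902188],
  other: [39679449, 14246564],
  script: [224872855, 168722090],
  text: [14109747, 10255042],
  video: [2001606, 799350],
  xml: [752318, 437492],
};

// For each type: June as % of May, plus the actual month-over-month % change.
function summarize(rows) {
  const out = {};
  let may = 0;
  let june = 0;
  for (const [type, [m, j]] of Object.entries(rows)) {
    out[type] = { ratio: (100 * j) / m, change: (100 * (j - m)) / m };
    may += m;
    june += j;
  }
  out.total = { may, june, ratio: (100 * june) / may, change: (100 * (june - may)) / may };
  return out;
}

const summary = summarize(counts);
for (const [type, s] of Object.entries(summary)) {
  console.log(`${type}: ${s.ratio.toFixed(2)}% of May (${s.change.toFixed(2)}% change)`);
}
```

With these inputs the totals come out to 76.65% of May, or a drop of about 23.35%.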
Hm, the Almanac tables I'm generating now depend on this data, so I worry that we might not be working with a complete dataset. I'll spot check a few sample URLs to see if anything is getting lost.
@pmeenan it looks like WPT might be ending the tests too early. Here's a comparison of cnn.com in May and June. The May test has 634 requests. The June test has only 25.
I can reproduce the early exit behavior when running an ad hoc test from the private instance. Running a clean test from the public instance behaves as expected.
If I had to guess, some of the new custom metrics may be interfering with the tests.
Here's a list of custom metrics that were added/modified since the May crawl:
- 00_reset
- javascript
- markup
- observers
- performance
- privacy
- responsive_images
- robots_meta
- security
- valid-head
- well-known
- wpt_bodies
(crossing out any that have been successfully tested on the public instance and ruled out)
Probably worth temporarily disabling the injection script to see if that fixes it. I can rename the script and reboot the private instance's agent, which will force it to update.
I renamed observers.js to observers.js.bak and here is the re-run (looks like that was it).
We should be able to take the contents of the injected script and run it in the private instance's UI to whittle down the problem areas. That said, I wouldn't be surprised if the basic observer injection itself has some bad side effects with React and some other frameworks, and I'm not sure we'd ever be confident that there aren't still some sites we break: fixing a few large cases could just make the remaining issues hard to see.
Probably a much smaller impact, but Chrome 102 also rolled out right before the crawl started, and we're seeing some cases of the netlogs not being captured (possibly only when Early Hints is force-enabled). I'm actively rewriting how WPT captures and processes netlogs from Chrome and should have that ready in a couple of days. This most likely has a small impact: anyone using the Early Hints origin trial, and maybe some cases of missing OCSP requests and other things only available through the netlog.
Injected scripts are the only ones that run before the page is tested, so I wouldn't expect the other metrics to affect the actual testing (though they may introduce side effects among themselves).
Yeah I agree it looks like the injected script.
Seeing this error when manually executing the script early in the page load via DevTools on cnn.com:
```
Error loading React component TypeError: Cannot set property toString of [object Object] which has only a getter
```

The code related to the exception looks like this:

```javascript
Object.defineProperty(b, "defaultProps", {
  get: function() {
    return this._foldedDefaultProps
  },
  set: function(t) {
    this._foldedDefaultProps = r ? p(e.defaultProps, t) : t
  }
}),
b.toString = function() {
  return "." + b.styledComponentId
}
```

You also specifically told me to test it on cnn.com in the PR, which I did. I only looked at the custom metric output, which appeared to work, and I didn't notice the waterfall having only 25 requests 😞
HTTPArchive/custom-metrics#14 (comment)
https://webpagetest.httparchive.org/result/220524_GJ_1/1/details/#waterfall_view_step1
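One way to get exactly the `Cannot set property toString ... which has only a getter` error above (an assumption about the mechanism, not something confirmed in this thread): if an injected script replaces a built-in method such as `toString` with a getter-only accessor on a prototype, a later plain assignment like `b.toString = function() {...}` on an inheriting object finds that accessor, has no setter to call, and throws a `TypeError` in strict-mode code (ES modules and class bodies are always strict). The sketch uses a local prototype instead of patching `Object.prototype`; `patchedProto`, `safeProto`, and `assignToString` are hypothetical names.

```javascript
// Hypothetical stand-in for a prototype patched by an injected script.
const originalToString = Object.prototype.toString;
const patchedProto = {};
Object.defineProperty(patchedProto, 'toString', {
  // Getter-only accessor: reads still work, but there is no setter.
  get() {
    return originalToString;
  },
  configurable: true,
});

// Plain assignment, like `b.toString = function() {...}` in the bundle.
// The directive makes this function strict, like module or class code.
function assignToString(target) {
  'use strict';
  target.toString = function () {
    return '.sc-example';
  };
}

const component = Object.create(patchedProto);
let caught = null;
try {
  // The assignment reaches the inherited accessor, finds no setter, and throws.
  assignToString(component);
} catch (err) {
  caught = err;
}
console.log(caught instanceof TypeError); // true

// Safer wrapping: keep the method a writable data property, so assignments
// on inheriting objects create an own property as they normally would.
const safeProto = {};
Object.defineProperty(safeProto, 'toString', {
  value: originalToString,
  writable: true,
  configurable: true,
});
const safeComponent = Object.create(safeProto);
assignToString(safeComponent);
console.log(String(safeComponent)); // .sc-example
```

In sloppy-mode code the same assignment fails silently instead, which is one reason this kind of breakage can look like "the page just stopped loading" rather than an obvious exception.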
We're giving new meaning to "breaking news" 🥲
I think some of the existing custom metrics were also changed to use observers, so it's not quite as easy as just disabling the observers script (I could be misremembering, though).
The June rerun is complete and we're just waiting on the data pipeline to write everything to BQ. Once it's there we can regenerate the reports and validate that the request counts are all at expected levels.
Ran a couple of validation queries to make sure that the June rerun contains results comparable to previous months.
```sql
#standardSQL
SELECT
  IF(ENDS_WITH(_TABLE_SUFFIX, 'desktop'), 'desktop', 'mobile') AS client,
  ROUND(APPROX_QUANTILES(reqTotal, 1001)[OFFSET(101)], 2) AS p10,
  ROUND(APPROX_QUANTILES(reqTotal, 1001)[OFFSET(251)], 2) AS p25,
  ROUND(APPROX_QUANTILES(reqTotal, 1001)[OFFSET(501)], 2) AS p50,
  ROUND(APPROX_QUANTILES(reqTotal, 1001)[OFFSET(751)], 2) AS p75,
  ROUND(APPROX_QUANTILES(reqTotal, 1001)[OFFSET(901)], 2) AS p90
FROM
  `httparchive.summary_pages.2022_06_01_*`
WHERE
  reqTotal > 0
GROUP BY
  client
ORDER BY
  client
```

The median reqTotal is 76 for desktop and 70 for mobile, in line with the May results.
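For reading these queries: BigQuery's `APPROX_QUANTILES(x, n)` returns an array of n + 1 approximate boundaries (the minimum, n - 1 interior quantiles, and the maximum), so `[OFFSET(k)]` approximates the k/n quantile. With n = 1000, `OFFSET(500)` is the median; the n = 1001 / `OFFSET(501)` pairing used here lands at roughly the 50.05th percentile, which makes no practical difference for a sanity check like this. The sketch below computes exact nearest-rank boundaries purely to illustrate the indexing. It is not BigQuery's approximation algorithm, and `quantileBoundaries` / `percentile` are illustrative names.

```javascript
// Exact nearest-rank stand-in for APPROX_QUANTILES(values, n):
// returns n + 1 boundaries, index 0 = minimum, index n = maximum.
function quantileBoundaries(values, n) {
  if (values.length === 0) return [];
  const sorted = [...values].sort((a, b) => a - b);
  const last = sorted.length - 1;
  const boundaries = [];
  for (let k = 0; k <= n; k++) {
    boundaries.push(sorted[Math.round((k * last) / n)]);
  }
  return boundaries;
}

// Percentile p (0-100), read the way the SQL reads it: [OFFSET(p * n / 100)].
function percentile(values, p, n = 1000) {
  return quantileBoundaries(values, n)[Math.round((p * n) / 100)];
}

const toyValues = Array.from({ length: 101 }, (_, i) => i + 1); // 1..101, toy data
console.log(quantileBoundaries(toyValues, 4)); // [ 1, 26, 51, 76, 101 ]
console.log(percentile(toyValues, 50)); // 51
console.log(percentile(toyValues, 10)); // 11
```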
```sql
#standardSQL
SELECT
  IF(ENDS_WITH(_TABLE_SUFFIX, 'desktop'), 'desktop', 'mobile') AS client,
  ROUND(APPROX_QUANTILES(bytesTotal, 1001)[OFFSET(101)] / 1024, 2) AS p10,
  ROUND(APPROX_QUANTILES(bytesTotal, 1001)[OFFSET(251)] / 1024, 2) AS p25,
  ROUND(APPROX_QUANTILES(bytesTotal, 1001)[OFFSET(501)] / 1024, 2) AS p50,
  ROUND(APPROX_QUANTILES(bytesTotal, 1001)[OFFSET(751)] / 1024, 2) AS p75,
  ROUND(APPROX_QUANTILES(bytesTotal, 1001)[OFFSET(901)] / 1024, 2) AS p90
FROM
  `httparchive.summary_pages.2022_06_01_*`
WHERE
  bytesTotal > 0
GROUP BY
  client
ORDER BY
  client
```

The median bytesTotal is 2,317 KB for desktop and 2,022 KB for mobile.
Both of these results are in line with what I'd expect month over month, so I think we're almost ready to close this out. The remaining work is to take the 2022_06_09 tables and alias the home page data to 2022_06_01 (already done for summary_pages, which the queries here use):
- summary_pages desktop
- summary_pages mobile
- summary_requests desktop
- summary_requests mobile
- pages desktop
- pages mobile
- requests desktop
- requests mobile
- lighthouse desktop
- lighthouse mobile
- technologies desktop
- technologies mobile
- response_bodies desktop
- response_bodies mobile
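The checklist is every combination of seven datasets and two clients. A minimal sketch that enumerates the source and destination table names so none gets missed, assuming the rerun tables follow the same `httparchive.<dataset>.<date>_<client>` naming as `httparchive.summary_pages.2022_06_01_*` in the queries above. How each alias is actually created (view, copy, or filtered copy of the home page rows) is deliberately left out, and `aliasPairs` is a hypothetical helper.

```javascript
// Datasets and clients from the checklist.
const datasets = [
  'summary_pages',
  'summary_requests',
  'pages',
  'requests',
  'lighthouse',
  'technologies',
  'response_bodies',
];
const clients = ['desktop', 'mobile'];

// Assumed naming pattern: httparchive.<dataset>.<YYYY_MM_DD>_<client>.
function aliasPairs(fromDate, toDate) {
  const pairs = [];
  for (const dataset of datasets) {
    for (const client of clients) {
      pairs.push({
        source: `httparchive.${dataset}.${fromDate}_${client}`,
        destination: `httparchive.${dataset}.${toDate}_${client}`,
      });
    }
  }
  return pairs;
}

const pairs = aliasPairs('2022_06_09', '2022_06_01');
for (const { source, destination } of pairs) {
  console.log(`${source} -> ${destination}`);
}
```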


