Incomplete HTML response bodies
rviscomi opened this issue · 2 comments
Noticed an unusually high number of pages without a <body> tag in the HTML response body. Here's a query to take a sample:
```sql
-- Sample main documents whose HTML response body has no <body> tag
WITH req AS (
  SELECT
    page,
    response_body
  FROM
    `httparchive.all.requests` TABLESAMPLE SYSTEM (0.10 PERCENT)
  WHERE
    date = '2023-05-01' AND
    client = 'mobile' AND
    is_main_document AND
    is_root_page AND
    -- NULL when response_body is NULL or contains no "<body" (case-insensitive)
    REGEXP_EXTRACT(response_body, r'(?i)(.*)<body') IS NULL
  LIMIT 10
),

-- Look up the WebPageTest test ID for each sampled page
pages AS (
  SELECT
    page,
    wptid
  FROM
    `httparchive.all.pages`
  WHERE
    date = '2023-05-01' AND
    client = 'mobile' AND
    is_root_page
)

SELECT
  page,
  wptid,
  response_body
FROM
  req
JOIN
  pages
USING
  (page)
```

For example, here's one WPT response body, and you can see it cut off mid-CSS: https://webpagetest.httparchive.org/response_body.php?test=230509_Mx1ZG_FBEG4&run=1&bodyid=741FA7D767C970D82CCE5621C0B68519

```css
.et_animated.slideLeft{-webkit-animation-name:et_pb_slideLeft;animation-name:et_pb_slideLeft}@-webkit-keyframes et_pb_bounce{0%,20%,40%,60%,80%,to{-webkit-animation-timing-function:cubic-bezier(.215,.61,.355
```
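To triage more rows than the query's `LIMIT 10`, a small client-side check can flag bodies that look cut off. This is a heuristic sketch, not HTTP Archive's own logic: the function name `looks_truncated` and the extra check for a closing `</html>` tag are assumptions. Because HTML allows both `<body>` and `</html>` to be omitted, a positive result only marks a body for manual review, as was done with the WPT link above.

```python
import re
from typing import Optional

# Hypothetical markers: assumptions for illustration, not HTTP Archive checks.
_BODY_OPEN = re.compile(r"<body\b", re.IGNORECASE)
_HTML_CLOSE = re.compile(r"</html\s*>", re.IGNORECASE)


def looks_truncated(response_body: Optional[str]) -> bool:
    """Return True if an HTML response body looks cut off.

    Mirrors the symptom in this issue (no <body> tag) and adds a check for
    a missing closing </html> tag. Valid HTML may omit either tag, so True
    only means "worth a manual look", not "definitely truncated".
    """
    if not response_body:
        # Empty or missing bodies can't be inspected; treat as suspicious.
        return True
    has_body = _BODY_OPEN.search(response_body) is not None
    has_close = _HTML_CLOSE.search(response_body) is not None
    return not (has_body and has_close)


samples = {
    "complete": "<html><head></head><body><p>hi</p></body></html>",
    "cut_mid_css": "<html><head><style>.a{color:red}@keyframes b{0%{opacity:0",
}
for name, body in samples.items():
    print(name, looks_truncated(body))
# complete False
# cut_mid_css True
```

Requiring both markers keeps the check cheap enough to run over many exported bodies; a body that ends mid-stylesheet, like the one above, fails both.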
The page seems to render fine in the test, as the filmstrip shows visual content and the waterfall is full of requests. Viewing source on the live page also shows complete HTML.
So maybe there's something in WPT or the HA pipeline that's cutting off the response body?
I'll take a closer look during the week in case something is going on with the netlog body streaming. Just in case disk space is the issue, I've already doubled the agent SSDs from 10GB to 20GB (and got approval for the quota).
The test agent we use for manual tests had run out of disk space and was failing, so it now runs with a lot more space than the regular agents. However, it also persists for months, while the regular agents live only a day or so, and I'd expect a full disk to cause much worse side effects than some truncated bodies. Still, it doesn't hurt to give the disks a bit more breathing room.
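If a long-lived agent can run low on disk, a pre-flight free-space check is one way to notice before writing response bodies. This is a minimal sketch, not what WPT does: the `has_disk_headroom` name and the 2 GiB threshold are hypothetical. It relies only on Python's standard `shutil.disk_usage`, which reports total, used, and free bytes for the filesystem containing a path.

```python
import shutil

# Hypothetical threshold for illustration; not a WPT setting.
MIN_FREE_BYTES = 2 * 1024**3  # 2 GiB


def has_disk_headroom(path: str = ".", min_free_bytes: int = MIN_FREE_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least min_free_bytes free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (all in bytes)
    return usage.free >= min_free_bytes


print(has_disk_headroom("."))
```

Checking before each test, rather than only at startup, would matter most for a persistent agent like the manual-test one, whose disk fills gradually over months.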
Beyond the disk fix, I suppose the truncated data is lost, so there is nothing else to do.
Closing.