Validate `meta[http-equiv]`
rviscomi opened this issue · 0 comments
In general, the WHATWG supports a limited set of keywords that are valid attribute values for `http-equiv`:
- `content-language`
- `content-type`
- `default-style`
- `refresh`
- `set-cookie`
- `x-ua-compatible`
- `content-security-policy`

Notable omissions include:
- `origin-trial`
- `etag`
- `x-*` (besides `x-ua-compatible`)
- `cache-control`
- `expires`
- `pragma`
- `accept-ch`
- `content-style-type`
- `content-script-type`

These are all `http-equiv` attribute values used by over 100k pages, according to HTTP Archive, in descending order of popularity.
See the full results and query, if interested
http_equiv | pages |
---|---|
x-ua-compatible | 5,849,869 |
content-type | 4,064,550 |
origin-trial | 3,741,447 |
etag | 432,755 |
x-wix-published-version | 432,595 |
x-wix-application-instance-id | 432,594 |
x-wix-meta-site-id | 432,593 |
content-language | 430,009 |
cache-control | 351,196 |
expires | 301,664 |
pragma | 296,342 |
accept-ch | 232,735 |
x-dns-prefetch-control | 176,712 |
content-style-type | 172,497 |
content-script-type | 136,871 |
imagetoolbar | 97,656 |
cleartype | 93,802 |
content-security-policy | 82,064 |
refresh | 28,243 |
keywords | 28,027 |
last-modified | 14,478 |
x-xrds-location | 13,495 |
page-enter | 11,945 |
encoding | 10,936 |
description | 10,716 |
x-rim-auto-match | 10,361 |
msthemecompatible | 9,564 |
reply-to | 9,113 |
language | 8,653 |
content-location | 6,896 |
copyright | 6,435 |
x-frame-options | 6,323 |
window-target | 4,930 |
title | 4,601 |
x-ua-compatiable | 4,493 |
page-exit | 4,468 |
pics-label | 3,269 |
screenorientation | 3,105 |
audience | 2,378 |
author | 2,140 |
access-control-allow-origin | 2,072 |
dc.description | 1,836 |
cache | 1,759 |
robots | 1,501 |
distribution | 1,464 |
vary | 1,386 |
x-webkit-csp | 1,376 |
p3p | 1,258 |
revisit-after | 1,226 |
default-style | 1,054 |

Query:

    WITH meta AS (
      SELECT
        page,
        LOWER(JSON_VALUE(meta, '$.http-equiv')) AS http_equiv
      FROM
        `httparchive.all.pages`,
        UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
      WHERE
        date = '2023-06-01' AND
        client = 'mobile' AND
        is_root_page
    )

    SELECT
      http_equiv,
      COUNT(DISTINCT page) AS pages
    FROM
      meta
    WHERE
      http_equiv IS NOT NULL
    GROUP BY
      http_equiv
    ORDER BY
      pages DESC

The biggest one that jumps out to me is `origin-trial`, which is used on ~3.7M pages. Given that it is explicitly supported and endorsed by Chrome, Edge, and Firefox (Safari doesn't support origin trials), I've left a comment on the WHATWG issue recommending its standardization.
I don't think capo.js should complain about spec validity for these keywords as long as browsers support them. But there are some specific usages worth validating.
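
As a rough illustration of that policy, here is a minimal sketch of how a keyword could be classified without flagging widely used non-standard values. The function name `classifyHttpEquiv`, the two sets, and the three return labels are hypothetical and not part of capo.js; treating every `x-` prefixed keyword as "common" is also an assumption based on the `x-*` entry in the list above.

```javascript
// Keywords the WHATWG lists as valid (copied from the list at the top of this issue).
const STANDARD_KEYWORDS = new Set([
  'content-language',
  'content-type',
  'default-style',
  'refresh',
  'set-cookie',
  'x-ua-compatible',
  'content-security-policy',
]);

// Non-standard keywords that each appear on 100k+ pages in the HTTP Archive results.
const COMMON_NONSTANDARD_KEYWORDS = new Set([
  'origin-trial',
  'etag',
  'cache-control',
  'expires',
  'pragma',
  'accept-ch',
  'content-style-type',
  'content-script-type',
]);

// Hypothetical helper: label an http-equiv value instead of rejecting it outright.
function classifyHttpEquiv(value) {
  const keyword = String(value).trim().toLowerCase();
  if (STANDARD_KEYWORDS.has(keyword)) return 'standard';
  // Assumption: any x-* keyword falls into the "common non-standard" bucket.
  if (COMMON_NONSTANDARD_KEYWORDS.has(keyword) || keyword.startsWith('x-')) {
    return 'common-nonstandard';
  }
  return 'unknown';
}
```

With this shape, only the `unknown` bucket (for example `keywords` or `description`) would be a candidate for a warning, and even that is a judgment call.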
http-equiv=content-type
According to the W3C spec, the `content` attribute of a `meta[http-equiv=content-type]` tag must be set to a "specially formatted string providing a character encoding name... in exactly the following order":
- The literal string `"text/html;"`.
- Optionally, one or more space characters.
- The literal string `"charset="`.
- One of the following:
  - For documents in the HTML syntax: A character encoding name.
  - For documents in the XML syntax: Any case-insensitive match for the string `"UTF-8"`.
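
The ordering quoted above could be checked with a single regular expression. This is only a sketch: `parseContentTypeMeta` is a hypothetical name, not a capo.js function, and matching case-insensitively, rejecting leading or trailing whitespace, and excluding quotes and semicolons from the encoding name are assumptions of this example rather than requirements taken from the spec text.

```javascript
// Hypothetical helper: parse the `content` value of meta[http-equiv=content-type].
function parseContentTypeMeta(content) {
  // "text/html;" then optional space characters, then "charset=", then an encoding name.
  const pattern = /^text\/html;[ \t\n\f\r]*charset=([^\s;"']+)$/i;
  const match = pattern.exec(String(content));
  if (!match) {
    return { wellFormed: false, charset: null, isUtf8: false };
  }
  const charset = match[1].toLowerCase();
  return { wellFormed: true, charset, isUtf8: charset === 'utf-8' };
}
```

For example, `parseContentTypeMeta('text/html; charset=utf-8')` reports a well-formed UTF-8 declaration, while `parseContentTypeMeta('charset=utf-8')` is reported as malformed because the `text/html;` prefix is missing.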
The WHATWG further requires that the character encoding name be exactly `utf-8` and that:
> A document must not contain both a `meta` element with an `http-equiv` attribute in the Encoding declaration state and a `meta` element with the `charset` attribute present.

capo.js should validate that HTML5 pages set a charset to `utf-8` and don't have redundant meta tags. Not sure about HTTP header vs meta tag redundancy, but that's also worth exploring (related #59).
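
A minimal sketch of what such a check might look like follows. Everything here is hypothetical: `findEncodingIssues`, the issue labels, and the plain-object input shape are invented for illustration and are not capo.js APIs. The sketch also ignores the HTTP `Content-Type` header entirely, so `missing-charset` only means that no meta tag declares an encoding.

```javascript
// Hypothetical helper. `metas` is an array of plain objects, one per <meta> element,
// e.g. { charset: 'utf-8' } or { httpEquiv: 'content-type', content: 'text/html; charset=utf-8' }.
function findEncodingIssues(metas) {
  const issues = [];

  const httpEquivDecls = metas.filter(
    (m) => typeof m.httpEquiv === 'string' && m.httpEquiv.trim().toLowerCase() === 'content-type'
  );
  const charsetDecls = metas.filter((m) => typeof m.charset === 'string');

  // Both declaration styles present: the redundancy the WHATWG forbids.
  if (httpEquivDecls.length > 0 && charsetDecls.length > 0) {
    issues.push('redundant-encoding-declaration');
  }

  // <meta charset> must name utf-8.
  for (const m of charsetDecls) {
    if (m.charset.trim().toLowerCase() !== 'utf-8') issues.push('non-utf8-charset');
  }

  // <meta http-equiv=content-type> must carry charset=utf-8 in its content.
  for (const m of httpEquivDecls) {
    const match = /charset=([^\s;"']+)/i.exec(m.content || '');
    if (!match || match[1].toLowerCase() !== 'utf-8') issues.push('non-utf8-charset');
  }

  // No meta-based declaration at all (HTTP headers are not considered here).
  if (httpEquivDecls.length === 0 && charsetDecls.length === 0) {
    issues.push('missing-charset');
  }

  return issues;
}
```

In a browser, the input could be built from `document.querySelectorAll('meta')` by reading each element's `http-equiv`, `charset`, and `content` attributes with `getAttribute`. Absent attributes come back as `null` and are skipped by the `typeof` checks.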