Implement CMS-focused custom metric for whether a site uses a WordPress block theme
felixarntz opened this issue
Per the HTTP Archive discussion on Slack, we want to find out how commonly so-called "block themes" are used across WordPress sites. A site can be identified as using a block theme if its HTML includes a `<div class="wp-site-blocks">` element. Additionally, only sites running WordPress 5.9 or higher can technically use block themes.
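The version rule can be expressed as a small check. This is a minimal sketch, assuming the version arrives as a string such as `"6.1.1"`; `meetsVersionFilter` and `parseMajorMinor` are hypothetical names. It applies the same rule as the `wordpress` CTE in the query below (an empty version is kept, an unparseable one is dropped), but it compares major and minor as integers instead of casting `major.minor` to `FLOAT64`.

```typescript
// Hypothetical helper: block themes need WordPress >= 5.9.
const MIN_BLOCK_THEME_VERSION: [number, number] = [5, 9];

// Returns [major, minor] for strings like "6.1.1" or "5.9",
// or null if the string does not start with "<digits>.<digits>".
function parseMajorMinor(version: string): [number, number] | null {
  const match = /^(\d+)\.(\d+)/.exec(version);
  if (match === null) {
    return null;
  }
  return [Number(match[1]), Number(match[2])];
}

// Hypothetical filter mirroring the query's WHERE clause:
// an empty (unknown) version passes, an unparseable one fails,
// otherwise (major, minor) must be >= (5, 9).
function meetsVersionFilter(version: string): boolean {
  if (version === "") {
    return true;
  }
  const parsed = parseMajorMinor(version);
  if (parsed === null) {
    return false;
  }
  const [major, minor] = parsed;
  const [minMajor, minMinor] = MIN_BLOCK_THEME_VERSION;
  return major > minMajor || (major === minMajor && minor >= minMinor);
}
```

WordPress has so far moved to the next major version after x.9 (5.9 was followed by 6.0), so the float cast in the query works in practice; the integer comparison simply does not depend on that convention.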
I originally wrote a query for this in GoogleChromeLabs/wpp-research#32 (consuming 65TB 🤯). @rviscomi shared with me a more efficient version of that query, which I've pasted below. However, detecting the element with DOM APIs would be more straightforward than matching the raw HTML against a regular expression, so a custom metric for this would be valuable.
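A DOM-based check could look roughly like the following. This is a hedged sketch, not necessarily what the eventual metric does: `blockThemeMetric`, the result field names, and the `DocumentLike`/`ElementLike` interfaces are hypothetical. It assumes WordPress's default `<meta name="generator" content="WordPress X.Y.Z">` tag is present when reporting a version (themes and plugins can remove it). It is written with structural types so it can be checked outside a browser; in a page, `document` fits the `DocumentLike` shape.

```typescript
// Minimal structural types (hypothetical) so the sketch compiles without the DOM lib.
interface ElementLike {
  getAttribute(name: string): string | null;
}
interface DocumentLike {
  querySelector(selectors: string): ElementLike | null;
}

// Hypothetical metric shape.
interface BlockThemeMetric {
  block_theme: boolean; // a div.wp-site-blocks element is present
  generator_version: string | null; // version from the WordPress generator meta tag, if any
}

function blockThemeMetric(doc: DocumentLike): BlockThemeMetric {
  // Class selector instead of an exact string match, so extra classes
  // or attributes on the div do not cause a miss.
  const blockTheme = doc.querySelector("div.wp-site-blocks") !== null;

  // Pick the generator meta tag whose content starts with "WordPress",
  // e.g. <meta name="generator" content="WordPress 6.1.1">.
  const generator = doc.querySelector('meta[name="generator"][content^="WordPress"]');
  const content = generator?.getAttribute("content") ?? "";
  const match = /^WordPress\s+(\d+(?:\.\d+)*)/.exec(content);

  return {
    block_theme: blockTheme,
    generator_version: match ? match[1] : null,
  };
}
```

Assuming the usual custom-metric convention of a JavaScript snippet whose return value is recorded, the type annotations would be stripped and the snippet would end with something like `return JSON.stringify(blockThemeMetric(document));`. The version filter could then stay in SQL, as in the query below.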
For reference, here's the alternative query created by @rviscomi:
```sql
WITH wordpress AS (
  -- Root pages detected as WordPress whose version is empty or >= 5.9
  SELECT DISTINCT
    client,
    page
  FROM
    `httparchive.all.pages`,
    UNNEST(technologies) AS t,
    t.info AS version
  WHERE
    date = '2022-10-01' AND
    is_root_page AND
    t.technology = 'WordPress' AND
    (version = '' OR CAST(REGEXP_EXTRACT(version, r'^(\d+\.\d+)') AS FLOAT64) >= 5.9)
),

block_themes AS (
  -- Root pages whose main HTML document contains the block theme wrapper
  SELECT
    client,
    page
  FROM
    `httparchive.all.requests`
  WHERE
    date = '2022-10-01' AND
    is_root_page AND
    is_main_document AND
    REGEXP_CONTAINS(response_body, r'<div class="wp-site-blocks">')
)

-- Pages matching both conditions, counted per client
SELECT
  client,
  COUNT(0) AS pages
FROM
  wordpress
JOIN
  block_themes
USING
  (client, page)
GROUP BY
  client
```
Fixed via #62.