Add scripts/CI to check for `404` URLs after a big move
mrjones-plip commented
Hugo does a great job ensuring all pages that link to each other internally don't `404`. However, for large moves like we did recently with forms, we may `404` a number of inbound links from other sources, or bookmarks folks have. To ensure these don't break, it's nice to generate a list of all known URLs on `main`, do a big move, and then check that all the known URLs safely redirect.
Two scripts were written already which we may choose to repurpose - but likely this should be:
- rewritten in node
- run in CI and block a merge if it fails
- allow users to run locally so they don't have to wait for CI
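
The last two points mostly come down to exit codes: a CI step fails when the script exits non-zero, and the same script can be run locally. A minimal sketch of that piece, assuming a hypothetical result shape of `{ url, ok, status }` per checked URL (the function name is also made up for illustration):

```javascript
// Sketch only: the result shape and function name are hypothetical.

// Turn per-URL results into report lines and a process exit code.
function summarize(results) {
  const failures = results.filter((r) => !r.ok);
  const lines = failures.map((r) => `FAIL ${r.status} ${r.url}`);
  lines.push(`${results.length - failures.length}/${results.length} URLs OK`);
  // Non-zero exit code is what would make the CI step (and the merge) fail.
  return { lines, exitCode: failures.length === 0 ? 0 : 1 };
}
```

A command-line wrapper could print `lines` and then set `process.exitCode` to the returned `exitCode`.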
mrjones-plip commented
Ok! I did some exploratory research and here's what I think the rough structure is - open to input though! For every PR that wants to merge to main, CI will:
- build a version of the site based off the branch - see how we do this already for a weekly link check
- get every current URL by downloading the site map from production
- using `curl` for npm - download every page on the branch build running in the CI `hugo server`
- check the response and HTML for each:
  - `200` response - if yes, check if it has an `http-equiv="refresh"` in the HTML and that this in turn has a `200` (recursive 'til no meta refresh?)
  - `404` response - note the page has a `404` and should instead have an alias (meta refresh)
the site map saves us quite a bit of recursion!
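
To make the steps above a bit more concrete, here is a rough Node sketch, not a finished script. It assumes Node 18+ (for the global `fetch`) and uses simple regexes rather than a real XML/HTML parser. Every function name is illustrative, as is `http://localhost:1313` as the base URL (the `hugo server` default). The fetcher is passed in as a parameter so the logic can be tried without a running server, and only the path of each URL is kept, so query strings are ignored.

```javascript
// Sketch only: function names and the base URL are illustrative assumptions,
// not existing project code.

// Pull every <loc> value out of a sitemap.xml string.
function extractSitemapUrls(xml) {
  return [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map((m) => m[1]);
}

// Return the target of a <meta http-equiv="refresh"> tag, or null if absent.
function findMetaRefresh(html) {
  const tag = html.match(/<meta[^>]*http-equiv=["']?refresh["']?[^>]*>/i);
  if (!tag) return null;
  const target = tag[0].match(/url=([^"'>\s]+)/i);
  return target ? target[1] : null;
}

// Default fetcher; assumes Node 18+ where fetch is a global.
async function fetchPage(url) {
  const res = await fetch(url);
  return { status: res.status, body: await res.text() };
}

// Check one production URL against the branch build, following meta
// refreshes until a page without one is reached (capped at maxHops).
async function checkUrl(prodUrl, baseUrl, fetcher = fetchPage, maxHops = 5) {
  let path = new URL(prodUrl).pathname;
  for (let hop = 0; hop <= maxHops; hop++) {
    const { status, body } = await fetcher(new URL(path, baseUrl).href);
    if (status !== 200) return { url: prodUrl, ok: false, status, hops: hop };
    const target = findMetaRefresh(body);
    if (target === null) return { url: prodUrl, ok: true, status, hops: hop };
    // Keep only the path so the next request stays on the branch build.
    path = new URL(target, new URL(path, baseUrl)).pathname;
  }
  return { url: prodUrl, ok: false, status: 'refresh loop', hops: maxHops };
}

// Check every URL listed in the sitemap and return only the failures.
async function checkAll(sitemapXml, baseUrl, fetcher = fetchPage) {
  const failures = [];
  for (const url of extractSitemapUrls(sitemapXml)) {
    const result = await checkUrl(url, baseUrl, fetcher);
    if (!result.ok) failures.push(result);
  }
  return failures;
}
```

With the branch build running under `hugo server`, something like `checkAll(sitemapXml, 'http://localhost:1313')` would resolve to the list of URLs that `404` or never settle on a page without a meta refresh.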