Recommend a redirect strategy for docs
dedemorton opened this issue · 16 comments
We will be refactoring a lot of content in the coming months (beats, cloud, etc). Right now, our strategy for handling moved content (changed URLs) is not clear.
In the past, we've requested that the website team create redirects. Here's a random example of a request.
We also have a legacy redirects page that we planned to use in the future to manage redirects, but I don't think that page is being updated, and I don't think it's actually used by the build.
Some teams maintain a Deleted pages appendix and use that to redirect users manually to a page that's moved.
We need a clear strategy going forward, and I'm not sure whether redirects are the right way to go.
According to our internal wiki (copied from there):
- Web team says it's best practice to update links on the site to point at the new URL - search engines will keep seeing the old link if we don't update it, and it's not best practice to rely on redirects.
- From an SEO perspective, it's best to replace URLs where possible / worth the effort since the positive impact to SEO is diminished if the crawlers see the old URL first. When crawlers hit the old URL, this will pass along some information about the new URL but is read as coming from a second-hand source. It's better overall for SEO if the crawler hits the live page first instead of coming to the live page from a redirect.
- Another consideration for the Web team is the monitoring and maintenance of redirects. After too many chained redirects (more than ~4), browsers and crawlers give up and the page fails to load, and circular redirects never resolve at all - at that point, the page stops being indexed. If we replace the URLs instead of redirecting, we save the team from having to monitor / remember some of these.
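As a rough illustration of why chains and loops matter, here's a minimal sketch that walks a redirect map and reports the hop count or a loop. The dict-based map is a stand-in for illustration; the real redirect store lives with the web team and looks different:

```python
# Sketch: detect redirect chains and loops in a redirect map.
# The old-URL -> new-URL dict format is an assumption for illustration.

def trace(redirects, url):
    """Follow redirects from `url`; return ("ok", hops) or ("loop", hops)."""
    seen = [url]
    while url in redirects:
        url = redirects[url]
        if url in seen:
            return ("loop", len(seen) - 1)
        seen.append(url)
    return ("ok", len(seen) - 1)

redirects = {
    "/guide/old.html": "/guide/newer.html",
    "/guide/newer.html": "/guide/newest.html",  # a 2-hop chain
    "/guide/a.html": "/guide/b.html",
    "/guide/b.html": "/guide/a.html",           # a circular pair
}

print(trace(redirects, "/guide/old.html"))  # ("ok", 2)
print(trace(redirects, "/guide/a.html"))    # ("loop", 1)
```

A CI job like this over the combined redirect rules would catch circular redirects before they ship, instead of after users hit "too many redirects" errors.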
@dedemorton thank you for raising this issue. This is going to impact a major doc refactoring that is currently ongoing on the Cloud ECE docs.
Also see the related issue: #1357
Yes, this is a major issue for restructuring content. The current process (request redirects from the web team) has led to pain because we have no insight into what redirects already exist. We've ended up with circular redirects and assorted broken-ness. The redirects file in the docs repo was Nik's attempt at bringing some order to the chaos, but was never adopted by the web team & the folks at RAW who handle the actual infra/deployment of the site.
Website redirects also only work at the page level, so if the chunking of content changes, they don't really solve the problem.
The redirect appendices feel like a very hacky solution, but they're one we can control pretty easily. On the ES side, @jrodewig and I discussed a strategy where we would clean up old entries when a new version was released, rather than keeping them around indefinitely. (Knowing that there are likely to be a number of necessary redirects between the last minor and a new major, it's not just a matter of deleting all of them and starting over.)
One motivation for the redirect appendices was to minimize surprise cross doc links that caused chaos on release days. Now that we have the CI checks in place and have improved the process around releases, it's probably worth enforcing that if you move/remove a topic and break a link from somewhere else in the docs, you need to fix it, not just add an entry to the redirects appendix. The redirects appendix should be used to keep external links (like Google search results) from 404-ing.
Ideally, it's best to mark pages that are being removed with a noindex tag and request a reindex from Google before they disappear. I think that we could basically accomplish that by adding a noindex tag to the redirect appendices and requesting a reindex when we roll out big changes or before we clean up old entries.
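If we go the noindex route, a simple pre-flight check could verify the tag is present before we request the reindex. A sketch using plain string matching (a real check might parse the HTML properly rather than matching one exact tag form):

```python
# Sketch: check that deprecated/redirect pages carry a robots noindex
# meta tag before requesting a reindex. String matching is a toy
# approach; the exact tag markup below is an assumption.

NOINDEX = '<meta name="robots" content="noindex">'

def has_noindex(html):
    return NOINDEX in html.lower()

redirect_page = '<html><head><meta name="robots" content="noindex"></head></html>'
live_page = "<html><head><title>Quick start</title></head></html>"

print(has_noindex(redirect_page))  # True
print(has_noindex(live_page))      # False
```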
@AnneB-SEO might have other insight into how to minimize the SEO disruption as we reorg the docs.
Also, simply keeping track of everything that has moved around is a chore. It would be really helpful to have a new & deleted anchor report generated for each PR.
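A minimal sketch of what such a per-PR report could compute, assuming AsciiDoc anchors written as `[[id]]` or `[#id]` (the inline strings below stand in for the base and head checkouts of a PR):

```python
import re

# Sketch: a "new & deleted anchors" report for a PR. Only the two
# common AsciiDoc anchor forms, [[id]] and [#id], are handled here.

ANCHOR = re.compile(r"\[\[([\w.-]+)\]\]|\[#([\w.-]+)\]")

def anchors(text):
    return {a or b for a, b in ANCHOR.findall(text)}

base = "[[install]]\n== Install\n[[configure]]\n== Configure\n"
head = "[[install]]\n== Install\n[#quick-start]\n== Quick start\n"

added = anchors(head) - anchors(base)
deleted = anchors(base) - anchors(head)
print("added:", sorted(added))      # ['quick-start']
print("deleted:", sorted(deleted))  # ['configure']
```

Deleted anchors are exactly the entries that need either a fixed internal link or a redirect, so a report like this would turn "keeping track of everything that moved" into a review checklist.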
This is exactly the type of feedback which helps me prioritize what I'm working on!
I hate to link to a private Slack conversation in a public repository, but I think it's necessary to illustrate how big of a problem this is: Slack 🧵. We should remember to clean up the broken links that already exist with whatever solution we choose to move forward with.
> The redirects appendix should be used to keep external links (like Google search results) from 404-ing.
From an SEO perspective, a manual redirect done using a redirect appendix is inferior to a server-side 301 or 302 redirect.
Those manual redirect pages still respond with a 200 HTTP status code, which indicates to search engines that the old page is still alive. This means our new page is competing with the old (redirect) page. As older pages typically have more link juice, the redirect pages may be returned in SERPs before actual content pages.
The best case is a 301/302 that passes that link juice on to the new page. However, even a 404 would at least let the old page die. Right now, the redirect appendices are keeping old, zombie pages alive.
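If we did get control and visibility of server-side redirects, one approach would be to keep the old-to-new mapping in the docs repo and generate the server rules from it. A minimal sketch, using nginx-style `rewrite ... permanent` (301) syntax purely for illustration; the actual site infrastructure may use a different rule format entirely:

```python
# Sketch: turn a version-controlled mapping of moved pages into
# server-side 301 rules. The nginx rewrite syntax and the example
# URLs are assumptions for illustration.

moved = {
    "/guide/old-install.html": "/guide/install.html",
    "/guide/old-config.html": "/guide/configure.html",
}

def nginx_rules(mapping):
    # `permanent` makes nginx respond with a 301.
    return "\n".join(
        f"rewrite ^{old}$ {new} permanent;" for old, new in sorted(mapping.items())
    )

print(nginx_rules(moved))
```

Because the mapping would live in the docs repo, writers could add entries in the same PR that moves a page, and CI could run a chain/loop check over the combined rules before deployment.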
I also don't think the redirect appendix is the best experience for users.
I would love to abolish redirect appendices entirely, except maybe in cases where there is no good redirect. Better control and visibility of server-side redirects would be my preferred path forward.
Agreed. The redirect appendices are a patch for a broken process. Beyond the issue of ending up with zombie pages that never go away, the manual process simply doesn't scale for major reorganization of existing content. We need to be able to automatically detect changes that require redirects, and manage the redirects in a way that doesn't require multiple spreadsheets and teams.
After spending several hours today updating links throughout all the docs, I had a thought about how we can approach linking with the tools and processes that we have now. My solution isn't ideal. Our tools should really maintain the link and link text for us. But having to manually go through a dozen repos to update links (even for a handful of topics) is a major PITA.
What if we create one or more shared link files in the `docs` repo? Each team would externalize links for the other teams to use. We can anticipate a lot of the links, then add more when people need them.
So we might have something like `links.asciidoc` (or maybe `beats-links.asciidoc`) that contains attributes like:
:metricbeat-quick-start-link: {metricbeat-ref}/metricbeat-installation-configuration.html[{metricbeat} quick start]
Writers could use `{metricbeat-quick-start-link}` instead of hard-coding all the links in their books.
If we want to provide writers with more control over the link text, we could use two attributes:
:metricbeat-quick-start-link: {metricbeat-ref}/metricbeat-installation-configuration.html
:metricbeat-quick-start-text: {metricbeat} quick start
Then writers would resolve the link by using:
{metricbeat-quick-start-link}[{metricbeat-quick-start-text}]
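To show how the two-attribute pattern would resolve, here's a toy substituter. The URL value for `{metricbeat-ref}` is illustrative, and real resolution is done by Asciidoctor, not by code like this:

```python
import re

# Sketch: toy AsciiDoc-style attribute substitution, to illustrate how
# the link/text attribute pair resolves. Attribute values are assumed.

attrs = {
    "metricbeat-ref": "https://www.elastic.co/guide/en/beats/metricbeat/current",
    "metricbeat": "Metricbeat",
    "metricbeat-quick-start-link": "{metricbeat-ref}/metricbeat-installation-configuration.html",
    "metricbeat-quick-start-text": "{metricbeat} quick start",
}

def resolve(text, attrs):
    # Substitute {name} repeatedly until no attribute references remain.
    pattern = re.compile(r"\{([\w-]+)\}")
    while pattern.search(text):
        text = pattern.sub(lambda m: attrs[m.group(1)], text)
    return text

source = "{metricbeat-quick-start-link}[{metricbeat-quick-start-text}]"
# Prints the fully resolved URL with "[Metricbeat quick start]" link text.
print(resolve(source, attrs))
```

Note the braces around the text attribute in `source`: without them, the literal string `metricbeat-quick-start-text` would become the link text instead of resolving.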
I know this is hacky, but I've had a long day of monkey work and feel like I'm stuck in 1985. (I guess I don't have to use carbon paper or leave enough space for footnotes, but seriously, all this manual monkey work is a time sink.)
Hmm...but then we'd also need some kind of versioning, maybe similar to what we do for the versions file?
@dedemorton 💯 for this approach for any links that are used more than 2 or 3 times in a book. It's obviously not a complete solution, but hopefully reduces some of the pain.
If you're trying to link to the same version, could you just throw a `{branch}` in the URL?
Hi @jrodewig - missed this post from a few months ago - so sorry. Great summary! Adding a couple notes.
> From an SEO perspective, a manual redirect done using a redirect appendix is inferior to a server-side 301 or 302 redirect.
Yes and no. A server-side redirect is always preferred, but only a 301. 302s can still be a bit problematic for search engines and are really meant for temporary redirects, such as a login URL that performs language detection before assigning a destination URL.
> Those manual redirect pages still respond with a 200 HTTP status code, which indicates to search engines that the old page is still alive. This means our new page is competing with the old (redirect) page. As older pages typically have more link juice, the redirect pages may be returned in SERPs before actual content pages.
Finally an explanation of how docs generates all those "soft 404s" (a 404 that returns a 200). Thank you!
> The best case is a 301/302 that passes that link juice on to the new page. However, even a 404 would at least let the old page die. Right now, the redirect appendices are keeping old, zombie pages alive.
Link juice will only get passed with a 301. Even if we were to redirect with a 302 and later change it to a 301, the link authority would be lost.
> I also don't think the redirect appendix is the best experience for users.
Sounds like it's a poor experience for both users and search engines!
> I would love to abolish redirect appendices entirely, except maybe in cases where there is no good redirect. Better control and visibility of server-side redirects would be my preferred path forward.
YES!
Thanks again for the write up and background on the soft 404s!
@gtback RE your comment:
> If you're trying to link to the same version, could you just throw a {branch} in the URL?
The {metricbeat-ref} attribute would take care of resolving the correct branch. I'm thinking more about the situation where we change the HTML filename (maybe to improve SEO) but the change only applies to a specific version and later. The lack of branches in the `docs` repo makes it hard to version attributes that might change over time. The way we handle versions of shared attributes right now is a little hacky, so I think this needs a bit more thought, especially as we're on the cusp of some big refactoring.
@lcawl RE your comment:
> Should we consider using external links at all times or should we use citation maps for all links in each book/context (and define the URL attribute and whether it is an external or internal link appropriately for each book)
I wouldn't want to use external links everywhere because we'd lose out on link validation in local builds and that would make it harder to diagnose some build problems before we push to GitHub. Plus we'd have to maintain all the link text manually.
Hmmm...it would be cool if we could somehow harness the logic that asciidoctor uses when it creates links and use it to generate a file that's populated with external links that other books can use. I guess we'd need logic so that once an attribute is defined in the link file, only the filename and link text would get updated. (Just trying to think of ways to automate the creation and maintenance of this file so that it doesn't become yet another time sink.)
EDITED: As a first step, we could manually create files that capture the high traffic links (like getting started and installation topics).
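For the automation idea above, a rough sketch that derives link attributes from filenames and page titles. The filenames, the attribute-naming scheme, and the `{metricbeat-ref}` prefix are all assumptions; real tooling would hook into Asciidoctor rather than using regexes:

```python
import re

# Sketch: auto-generate shared link attributes from AsciiDoc sources,
# assuming one page per file with a level-0 title line ("= Title").
# Attribute names are derived from the filename; this naming scheme
# is hypothetical.

pages = {
    "metricbeat-installation-configuration.adoc": "= {metricbeat} quick start\n...",
    "setting-up-running.adoc": "= Setting up and running\n...",
}

def link_attributes(pages, ref="{metricbeat-ref}"):
    lines = []
    for filename, text in sorted(pages.items()):
        title = re.match(r"= (.+)", text).group(1)
        name = filename.replace(".adoc", "")
        html = filename.replace(".adoc", ".html")
        lines.append(f":{name}-link: {ref}/{html}[{title}]")
    return "\n".join(lines)

print(link_attributes(pages))
```

Regenerating this file in CI would keep the filename and link text current automatically, so the shared link file doesn't become yet another manually maintained spreadsheet.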
> Better control and visibility of server-side redirects
💯 for this. I'm looking forward to hosting the docs ourselves, and a big part is exactly for that reason.
> The lack of branches in the `docs` repo makes it hard to version attributes that might change over time
@dedemorton That makes sense, thanks. I'll have to think more about it. @benskelker was asking me a similar question this morning.
Would be nice to get this fixed for Next Docs, but probably not worth changing the process in the current doc system...so I'm closing.