MESAHub/mesa

Use robots.txt to reduce search engine traffic to older documentation.

wmwolf opened this issue · 3 comments

We should use robots.txt to get search engines to preferentially guide users to the latest version of our documentation. Within the website, old versions will be unchanged, but web crawlers will be asked not to index those pages. I think it could be as simple as:

User-agent: *
Disallow: /
Allow: /en/latest

We'd just need to put a file called robots.txt at the root of docs.mesastar.org with these contents (I think), and with time, the web crawlers should update their indexes.

I faintly remembered that there was some way to refer indexers to the latest version of a page. I did some searching, and it seems that the canonical link annotation can do this. From what I can tell, this would be more effective, since robots.txt only prevents crawling, not indexing: if someone links to an older version somewhere, those pages could still show up in search results. That's what I gathered from reading some Stack Exchange posts, though, so I don't know how correct it is.
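
For concreteness, here is a minimal sketch of how a canonical link could be wired up, assuming the docs are built with Sphinx on Read the Docs and that https://docs.mesastar.org/en/latest/ is the version we want search engines to prefer (both assumptions, not something confirmed in this thread):

# conf.py -- sketch only.
# When html_baseurl is set, Sphinx (1.8+) emits a
# <link rel="canonical" href="..."> tag on every generated page,
# pointing at this base URL plus the page's path. Search engines that
# honor rel="canonical" should then consolidate results onto these URLs.
html_baseurl = "https://docs.mesastar.org/en/latest/"

Older versions would still get crawled, but their pages would advertise the latest copy as the one to index.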

Looks like Read the Docs autogenerates and serves a robots.txt file: https://docs.readthedocs.io/en/stable/guides/technical-docs-seo-guide.html#use-a-robots-txt-file

but it might not be what we want: https://docs.readthedocs.io/robots.txt
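
If we want Read the Docs to serve our own robots.txt instead of the autogenerated one, the usual route (a sketch, assuming a Sphinx build with the custom robots.txt sitting next to conf.py) is to copy it into the root of the built HTML via html_extra_path:

# conf.py -- sketch only.
# Files listed in html_extra_path are copied verbatim into the root of the
# HTML output, so the custom robots.txt would be served at
# docs.mesastar.org/robots.txt in place of the autogenerated one.
html_extra_path = ["robots.txt"]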

I took @wmwolf's suggestion and added a robots.txt file here: #694