facebook/docusaurus

Fix trailing slash issues (depend on hosting provider)

slorber opened this issue ยท 34 comments

Edit: as part of the analysis to solve this issue, I'm writing a guide to describe factually the behavior of various static site generators and static hosting providers: https://github.com/slorber/trailing-slash-guide

๐Ÿ’ฅ Proposal

Fix trailing host slash issues: write /myDoc.html instead of /myDoc/index.html

We have a few issues that are related to docs trailing slashes: #2654, #2394, #2134, #3351, #3380

I suspect these issues are related to the Docusaurus output, as for a doc at /docs/myDoc we write /myDoc/index.html instead of /myDoc.html

Always creating a subfolder is probably a signal for the hosting providers (Netlify, Github Pages...) to add a trailing slash automatically to URLs, which can be annoying.

It can break relative links as now the link resolution is different due to the trailing slash, and we end up with broken links in production, not catched at build time (see #3380)

This behavior is provided by a dependency: https://github.com/markdalgleish/static-site-generator-webpack-plugin/blob/master/index.js#L165

We already run on a fork of this dependency. I guess we could as well include a way to write /myDoc.html directly if a route path does not end with a trailing slash.

As this is a risky change, I suggest making this optional. We can make it the default behavior (breaking change) later if it works fine for early adopters.

See also #2394

Always creating a subfolder is probably a signal for the hosting providers (Netlify, Github Pages...) to add a trailing slash automatically to URLs, which can be annoying.

@slorber fwiw this is fixable on Netlify by changing your post-processing settings. They have a UI bug, where the master checkbox to disable post-processing does not disable the pretty URL setting, and that is what causes e.g. https://v2.docusaurus.io/docs/2.0.0-alpha.63/installation to 301 redirect to https://v2.docusaurus.io/docs/2.0.0-alpha.63/installation/. You have to just turn them all off individually.

image

Thanks @lunelson , didn't know about this bug, do you have a link to track this?

Also, I tested these options on my own website, but still have a trailing slash issue ๐Ÿ˜… https://sebastienlorber.com/records-and-tuples-for-react

@slorber If you changed the options and tested your site right away, it might not have worked: the changes seem to take a few minutes to propagate. For me the URL above now works without redirecting to a trailing slash.

And no I don't know of an issue that tracks it at Netlify but I've seen it come up in discussions here and on Gatsby about trailing slash problems.

@slorber Having pointed that out however, I must say that in my company's websites (including 3 Docusaurus v2 sites and potentially soon more), we will probably turn this Netlify feature back on.

This is because we have had the strong recommendation from an SEO agency thatโ€”while neither format (trailing-slash or not) is "better"โ€”there should be 301 redirects to make sure that only one of the two formats is getting crawled and indexed. And because Netlify treats these URL formats identically, and its redirects system therefore doesn't allow you to 301 from one to the other (it would create an infinite loop), our only alternative is to use this pretty URLs feature (which does create a 301) and to accept the decision which is made for us, namely that trailing-slash will be the final format.

This creates a challenge for me right now because currently Docusaurus generates all URL patterns in router/anchor links, sitemap and canonical link tags without trailing slashes ๐Ÿ˜ฌ

mpsq commented

The sitemap can be generated with trailing slashes, that's what the option trailingSlash is for: https://v2.docusaurus.io/docs/using-plugins#docusaurusplugin-sitemap.

@mpsq well, that's one down at least ๐Ÿ˜…

mpsq commented

Yes :) On your list, "URL patterns in router/anchor links" and "canonical link tags" are actually the same thing, they depend on how the permalink prop is generated. All you need is to enable/allow that prop to have a trailing slash.

EDIT:
Maybe this could be a more general setting, rather than a granular one specific to the sitemap plugin.

Maybe this could be a more general setting, rather than a granular one specific to the sitemap plugin.

YES. This would be ideal.

@mpsq is there any chance you could give me an idea of where I need to intervene in my custom theme, to modify that permalink property in a way that will affect the router, and generated anchor and link tags all together?

mpsq commented

You could do it in your theme by tweaking DocItem / Layout and replace occurrences of permalink with permalink + "/". This is super hacky though, a configuration property in core Docusaurus would be much nicer.

a configuration property in core Docusaurus would be much nicer

Again: yes

Thanks, didn't notice but it seems my site does not have trailing slashes anymore :)

But anyway we should figure out a portable solution that works even on GithubPages where you don't have many hosting options to tweak... still believe that my initial proposal might be a better solution, need to find time to test this on multiple hosts.

For workarounds to override the permalink, you may be interested by this issue: #3501 (comment)

Not sure but something like that could work to customize the permalink:

import React from 'react';
import OriginalLayout from '@theme-original/Layout';
import Head from '@docusaurus/Head';
import {useLocation} from '@docusaurus/router';

export default function Layout(props) {
  const location = useLocation();
  return (
    <>
      <OriginalLayout {...props} />
      <Head>
        <meta
          property="canonical_url"
          content={location.pathname + "/"}
        />
      </Head>

    </>
  );
}

@slorber thanks, I'll take a look at that too. One thing about the router and this URL format issue that I haven't tested yet, is whether active link class matching will still work if the router paths are also forced to have trailing-slashes...

Notes:

  • v1 generates 2 files: /docs/myDoc.html + /docs/myDoc/index.html
  • v2 generates 1 file: /docs/myDoc/index.html
  • without /docs/myDoc.html file, Github Pages server redirect from /docs/myDoc to /docs/myDoc/

We should probably add 2 new options:

  • 1 option to generate the "missing" /docs/myDoc.html file, because some hosting solutions (emm GH pages) can't handle it properly.
  • 1 option to add trailing slashes to all site routes, so that it's safe to host your site on all hosts

We can't add a trailing / by default, not everybody is willing to have / at the end of the URLs (legacy/SEO reasons)

Another idea is to resolve all relative links at build time to absolute links, so that the presence of a trailing slash or not does not impact in any way the link target.

On Netlify, disabling the Pretty Url option prevent Netlify from adding the trailing slash, yet if user visits the page with a trailing slash, it is not removed client-side, and still potentially breaks relative links.

Until I figure how to integrate this properly in Docusaurus, here's a PR on the ReactNative website with some ideas to fix trailing slashes issues in userland: facebook/react-native-website#2297

I've been solving this issue mainly by conforming URLs to the desired format using a function within the Link componentโ€”since all internal links have to pass through here at some point; one complexity with this is that if you are using the baseUrl option you have to consider root-based URLs that are outside of the your sub-directory to be external, i.e. static anchors not router-links...

Having an option to generate /doc.html instead of /doc/index.html would be ideal for us

Another idea is to resolve all relative links at build time to absolute links, so that the presence of a trailing slash or not does not impact in any way the link target.

This is probably the easiest, safest, and with least consequences. It is probably also the fastest! Any downsides here? It might involve replacing domain with localhost:$PORT for npm run serve but that's fine imo.

FYI this problem is related to one of our dependency (not very well maintained, we already use a fork of it) that hardcodes this /folder/index.html pattern.

https://github.com/markdalgleish/static-site-generator-webpack-plugin/blob/master/index.js#L165

I'm not 100% (as it requires checking the behavior of all the most popular static hosting providers), but maybe we could just have a post-build step that would copy /folder/index.html to /folder.html, so that both files exist in the end.

That's basically what I did on ReactNative website and it worked great (on Github Pages) without significant build time increase: facebook/react-native-website#2297

Please try to add this site plugin and let me know if it works for your case.

I'll probably make it a separate npm package, as this tool may be useful for other SSGs too

In docusaurus/packages/docusaurus/src/client/normalizeLocation.ts
any trailing /index.html is knocked off the path.
If you knock off any trailing / as well with
pathname = pathname.replace(/\/$/, '');
on line 23, this seems to fix this bug.

But I don't know whether this would break anything else.
But it does seem to me that if you want to normalize, then allowing both /foo/bar/ and /foo/bar doesn't seem to be a very good job of normalizing to me.

Hi @slorber ! Are there any updates on this issue? We started to hit that on Litho website as well.

Just to provide more context, every time someone adds a relative link of the form [label](some-page) the resulting link is broken on the site (if you load the source page by URL - not by sidebar navigation) because the source page gets a trailing slash, and then the target is resolved to a level deeper than intended.

I'm aware that referencing file names (e.g x.mdx) works around this, but still a lot of "raw" uses slip past and go unnoticed, adding broken links.

As of time of writing, the first link on this page exhibits this behaviour: https://fblitho.com/docs/sections-tutorial#main_wrap

Just to make it clear: this issue is important for Docusaurus to solve and we'll work on it.
It is just not as simple as you might think it is.

Doing a fix that works for a single site is not hard.

But doing a fix that works for ALL Docusaurus sites, using a large variety of hosting providers, all with distinct behaviors regarding how they serve static files, is complicated.

And a bad fix deployed in production can lead to unwanted side-effects that negatively affect the SEO of established documentation websites and products, so we must be sure the changes we make are safe.

Note: the Gatsby ecosystem has the exact same trailing slash problem (we have some code in common).
See for example this recent blog post: https://jonsully.net/blog/trailing-slashes-and-gatsby/

For me, the source of all this is this function in an unmaintained dependency: https://github.com/markdalgleish/static-site-generator-webpack-plugin/blob/master/index.js#L165
Changing this logic is easy (fork the dep and patch it). Ensuring it does not produce unwanted side effects is time-consuming to check.


My plan is to provide a fix for trailing slash issues as an option:

  • existing sites keep the previous behavior (safe)
  • you can turn the option on if you encounter the trailing slash / relative links problem
  • newly initialized sites can have the option turned on

Unfortunately, this won't prevent existing sites to not have issues unless they turn the option on.

If this option is successful for everyone, we could turn it on by default (the risky part)


Note: until I provide this option, the workaround I suggest here should work, and is not too far from what I plan to implement. So please try this workaround and tell us if it works or not.

Is there an easy workaround (i.e. that can be done on the website repo without changing docusaurus itself or dependencies) to force a trailing slash on all links and URLs?

Please try to add this site plugin and let me know if it works for your case.

@slorber Can you share some instructions for trying this out? So far, I've tried adding the sitePlugin.js to a src/plugins/sitePlugin.js directory and updating the Docusaurus config to add plugins: ['./src/plugins/sitePlugin'], to the root of the document but it did not effect any change. The build remains the same and no folder.html style files are created.

PS: I'm using Netlify and I'm trying to avoid the duplicate URLs issue. Currently, the site crawlers are seeing two URLs, one without slashes (present in the pages that are build by D2) and with slashes (which Netlify prefers and tries redirecting to when it sees a URL without a slash).

This is because we have had the strong recommendation from an SEO agency thatโ€”while neither format (trailing-slash or not) is "better"โ€”there should be 301 redirects to make sure that only one of the two formats is getting crawled and indexed. And because Netlify treats these URL formats identically, and its redirects system therefore doesn't allow you to 301 from one to the other (it would create an infinite loop), our only alternative is to use this pretty URLs feature (which does create a 301) and to accept the decision which is made for us, namely that trailing-slash will be the final format.

I double-down on what @lunelson said. For users who are using Netlify, we need the final build to be using trailing-slashes as that decision is made for us, and turning on/off the "Pretty URLs" setting does not help.

Is there an easy workaround (i.e. that can be done on the website repo without changing docusaurus itself or dependencies) to force a trailing slash on all links and URLs?

@agentofuser not available now, but under consideration

@thehappybug are you even sure the plugin runs in the first place?

Have you tried adding logs so that you know which files are created? Note you can also run your own script with node after yarn build, it does not necessarily have to be a Docusaurus plugin if it complicated your life.

I double-down on what @lunelson said. For users who are using Netlify, we need the final build to be using trailing-slashes as that decision is made for us, and turning on/off the "Pretty URLs" setting does not help.

I claim another time that this is false: disabling pretty URLs works reliably on Netlify and there won't be any redirect. VERY IMPORTANT: the global "disable asset processing" checkbox is broken and does not really disable pretty URLs: you have to uncheck it.

Proof: as part of my study to solve this issue, I'm writing a trailing slash guide for the community. There's a live Netlify deployment with pretty URLs disable on which you can see for yourself there is no server-side redirect: https://github.com/slorber/trailing-slash-guide#netlify

@lunelson

This is because we have had the strong recommendation from an SEO agency thatโ€”while neither format (trailing-slash or not) is "better"โ€”there should be 301 redirects to make sure that only one of the two formats is getting crawled and indexed. And because Netlify treats these URL formats identically, and its redirects system therefore doesn't allow you to 301 from one to the other (it would create an infinite loop), our only alternative is to use this pretty URLs feature (which does create a 301) and to accept the decision which is made for us, namely that trailing-slash will be the final format.

I'm not an SEO expert but I believe having 301 redirects is not mandatory if you have canonical URLs and you explain to the crawlers that the 2 URLs are the same page. Docusaurus sites have canonical URLs by default and crawlers shouldn't penalize your site for publishing duplicated content

I've opened a draft PR to propose a solution: #4908

Thank you for your responses, @slorber. The trailing slash guide was quite useful.

Have you tried adding logs so that you know which files are created? Note you can also run your own script with node after yarn build, it does not necessarily have to be a Docusaurus plugin if it complicated your life.

I managed to get your solution working. I had previously made a mistake in registering plugins.

For now, I'm not going forwards with deploying this solution because:

  1. With Pretty URLs disabled, Netlify is serving pages at both slug.html, slug/ and slug. Even though the canonical URL should prevent any impact on SEO (I am not an expert but this is what I gleaned from the comments), I would prefer that Netlify's routing did not serve the pages at multiple URLs. Using Pretty URLs results in a cleaner routing that sits better with the web developers in my team (๐Ÿคทโ€โ™‚๏ธ).
  2. A previous Hugo-based version of our docs site used trailing slashes in the URLs. This worked well with Netlify's Pretty-URLs (we had assets optimization disabled but as you pointed out, the Pretty URLs feature still kicks in). When we migrated to D2, the sitemap suddenly had URLs without trailing slashes in them but the Pretty URLs feature still redirected users to the URLs with trailing slashes. This confused the Google crawler at least. You can see from the screenshot what I mean:
    image
    If you drill down into what happened, you see that each ignored page has this issue:
    image
    While I don't see our search performance going down as of now, my team is quite wary of continuing the site in this state.
  3. The solution you suggested copies the HTML pages outside the directories. With Pretty URLs, these pages would turn from slug.html to just slug. Since I'm stuck with having trailing slashes in a previous version of the site and Netlify having permanently redirected crawlers to the URLs with trailing slashes, the solution doesn't help me. I wait for an option in D2 that helps me control trailing slashes so that I can put trailing slashes consistently in all places, in all pages, and sitemap.

I hope this feedback helps. Let me know if I can help you test any future releases related to this issue.

Thanks for the feedback, I understand why you would want to keep the exact same URLs you had before, and I'll make that possible.

As soon as my PR is merged, you'd be able to use the @canary npm dist tag to give it a try (soon).

As part of #4908

I'm implementing a new {trailingSlash: boolean | undefined} config.

  • undefined: it keeps the existing retro-compatible behavior (paths not modified), and output /path/index.html
  • false: remove a potential trailing slash on routes, links, canonical URLs, and output /path.html
  • true: force a trailing slash on routes, links, canonical URLs, and output /path/index.html

Refer to https://github.com/slorber/trailing-slash-guide for how the files will be served by your host.

I've created 3 deployments to test this feature:

Note: using true/false instead of undefined allows to use Netlify with pretty URLs on without having any annoying redirection, but it's not the case for the undefined behavior that will keep redirecting with pretty URLs on (default behavior): I disabled pretty URLs for undefined on purpose.

PLEASE: help me review those deployments and let me know if you see any unwanted side-effects: now is a better time to complain than after merging the PR :)

The PR has been merged, and you can try this right now with the @canary dist tag or wait for beta.1

https://www.npmjs.com/package/@docusaurus/core?activeTab=versions

2.0.0-beta.df8a900f9

Please let me know if anything does not work asap.

Also, in general, can you please let me know:

  • which host you had a problem with
  • which trailingSlash setting fixed the problem

According to your feedbacks, we may update the doc and recommend a trailingSlash setting for each host