fire1ce/mkdocs-embed-external-markdown

Relative links to files in the same source GitHub repository lead to raw versions

sdruskat opened this issue · 2 comments

Currently, link URLs are joined with the base URL:

link_url = urljoin(base_url, match.group("link_url"))

This is fine for most cases, where the links are external, or provided as absolute links.

When using URLs from raw.githubusercontent.com, however, this breaks the intended use of the original Markdown in cases where the links are relative and their targets live in the same GitHub repository.

Examples:

  • A link [schema](schema.json) in https://github.com/user/repo/file.md would lead to https://raw.githubusercontent.com/user/repo/schema.json. This is somewhat fine, as the target file is JSON.
  • A link [another page](other-page.md) in https://github.com/user/repo/file.md would lead to https://raw.githubusercontent.com/user/repo/other-page.md. This is not what I would expect when reading a rendered page on the internet which links to "another page".
  • A link [another page](other-page.md#section) in https://github.com/user/repo/file.md would lead to https://raw.githubusercontent.com/user/repo/other-page.md#section, with the browser showing me the file from the top position (i.e., the same as for https://raw.githubusercontent.com/user/repo/other-page.md, without the anchor). As in the previous example, the link lands on a raw file rather than a rendered page, and additionally, jumping to the anchor doesn't work.
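
The behaviour in the examples above can be reproduced with the standard library alone. This is a minimal sketch, not the plugin's code: the `base_url` value and the `main` branch segment are assumptions made for illustration.

```python
from urllib.parse import urljoin

# Assumed base URL of an embedded file; the "main" branch segment is illustrative.
base_url = "https://raw.githubusercontent.com/user/repo/main/file.md"

# Relative links are resolved against the raw host, so they end up pointing at raw files.
relative_json = urljoin(base_url, "schema.json")
relative_page = urljoin(base_url, "other-page.md#section")

# Absolute links are returned unchanged, which is why they are not affected.
absolute_link = urljoin(base_url, "https://example.com/docs/")

print(relative_json)  # https://raw.githubusercontent.com/user/repo/main/schema.json
print(relative_page)  # https://raw.githubusercontent.com/user/repo/main/other-page.md#section
print(absolute_link)  # https://example.com/docs/
```

The fragment survives the join, but the raw host serves the file as plain text, so there is nothing for the browser to jump to.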

Fixes?

One potential fix for this would be to handle raw.githubusercontent.com URLs specifically to map them back to their original repos, e.g., https://raw.githubusercontent.com/user/repo/main/README.md would become https://github.com/user/repo/blob/main/README.md. This would also preserve anchors (e.g., #section).

Explicit links to raw... URLs would be preserved.

This could also be done for a subset of links, e.g., only where they lead to Markdown files.
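
A rough sketch of what such a mapping could look like follows. The function names `raw_to_blob` and `resolve_link` and the `markdown_only` flag are hypothetical and not part of the plugin. The sketch also assumes the ref (branch or tag) is a single path segment, which does not hold for branch names containing slashes.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

RAW_HOST = "raw.githubusercontent.com"


def raw_to_blob(url: str, markdown_only: bool = False) -> str:
    """Map a raw.githubusercontent.com URL to its github.com/.../blob/... form."""
    parts = urlsplit(url)
    if parts.netloc != RAW_HOST:
        return url  # not a raw GitHub URL, leave untouched
    # Expected path layout (assumption): /user/repo/ref/path/to/file
    segments = parts.path.lstrip("/").split("/", 3)
    if len(segments) < 4:
        return url
    user, repo, ref, file_path = segments
    if markdown_only and not file_path.lower().endswith(".md"):
        return url  # optionally rewrite links to Markdown files only
    blob_path = f"/{user}/{repo}/blob/{ref}/{file_path}"
    # Query and fragment are carried over, so anchors are preserved.
    return urlunsplit((parts.scheme, "github.com", blob_path, parts.query, parts.fragment))


def resolve_link(base_url: str, link_url: str) -> str:
    """Join a link with the base URL, rewriting only links that were relative."""
    if urlsplit(link_url).scheme:
        return link_url  # explicit absolute links, including raw ones, are preserved
    return raw_to_blob(urljoin(base_url, link_url))
```

With a base of https://raw.githubusercontent.com/user/repo/main/file.md, `resolve_link(base, "other-page.md#section")` would yield https://github.com/user/repo/blob/main/other-page.md#section, while an explicit raw link passed as `link_url` comes back unchanged.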

Limitations

Every platform would have to be handled differently if the plugin is meant to be used beyond GitHub.
E.g., GitLab raw links look similar to https://gitlab.com/user/repo/-/raw/main/....
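
One way to keep the per-platform handling contained would be a small table of rewrite rules keyed by host. The sketch below is only an illustration under assumptions: the names `REWRITERS` and `to_rendered_url` are hypothetical, the GitLab rule assumes rendered pages live under `/-/blob/`, and the GitHub rule assumes a single-segment ref.

```python
from urllib.parse import urlsplit, urlunsplit


def _github(path):
    # Assumed layout: /user/repo/ref/path/to/file
    segments = path.lstrip("/").split("/", 3)
    if len(segments) < 4:
        return None
    user, repo, ref, rest = segments
    return "github.com", f"/{user}/{repo}/blob/{ref}/{rest}"


def _gitlab(path):
    # Assumed layout: /user/repo/-/raw/ref/path/to/file
    if "/-/raw/" not in path:
        return None
    return "gitlab.com", path.replace("/-/raw/", "/-/blob/", 1)


# Hypothetical registry: host serving raw files -> rule producing (host, path).
REWRITERS = {
    "raw.githubusercontent.com": _github,
    "gitlab.com": _gitlab,
}


def to_rendered_url(url: str) -> str:
    """Rewrite a raw-file URL to its rendered page, or return it unchanged."""
    parts = urlsplit(url)
    rewriter = REWRITERS.get(parts.netloc)
    result = rewriter(parts.path) if rewriter else None
    if result is None:
        return url  # unknown platform or unexpected path layout
    host, path = result
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
```

Unknown hosts fall through unchanged, so supporting another platform would mean adding one entry to the table rather than touching the join logic.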

Can I help?

I'd be happy to look into a simple fix just for GitHub URLs, as I need this downstream anyway.

Thanks for the #14 PR and the contribution. Closing for now. Until the next exception =)

Thanks for taking the time to review and merge this!