elixir-lang/ex_doc

EPUB: Links to anchors like #&&/2 cause a fatal error while parsing the file

milmazz opened this issue · 0 comments

After checking the Elixir.epub file with epubcheck I got the following summary:

$ epubcheck doc/elixir/Elixir.epub --json elixir_docs.json

Check finished with errors
Messages: 9 fatals / 425 errors / 0 warnings / 0 infos

I'll start with the highest-severity issue: the one that causes a fatal error while parsing the XHTML documents.

Filtering the result a bit with jq:

$ jq '.messages[] | select(.severity=="FATAL") | {id: .ID, message: .message, locations: .locations | map({path, line, column})}' elixir_docs.json

We get the following:

{
  "id": "RSC-016",
  "message": "Fatal Error while parsing file: The entity name must immediately follow the '&' in the entity reference.",
  "locations": [
    {
      "path": "OEBPS/Bitwise.xhtml",
      "line": 25,
      "column": 26
    },
    {
      "path": "OEBPS/Function.xhtml",
      "line": 38,
      "column": 46
    },
    {
      "path": "OEBPS/Kernel.SpecialForms.xhtml",
      "line": 67,
      "column": 20
    },
    {
      "path": "OEBPS/Kernel.xhtml",
      "line": 116,
      "column": 38
    },
    {
      "path": "OEBPS/anonymous-functions.xhtml",
      "line": 94,
      "column": 409
    },
    {
      "path": "OEBPS/basic-types.xhtml",
      "line": 84,
      "column": 335
    },
    {
      "path": "OEBPS/code-anti-patterns.xhtml",
      "line": 257,
      "column": 275
    },
    {
      "path": "OEBPS/operators.xhtml",
      "line": 31,
      "column": 781
    },
    {
      "path": "OEBPS/patterns-and-guards.xhtml",
      "line": 158,
      "column": 534
    }
  ]
}
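
For anyone without jq at hand, the same filter can be sketched in Python. This is only an illustration: the field names (`messages`, `severity`, `ID`, `message`, `locations`, `path`, `line`, `column`) are taken from the jq expression above, the function name `fatal_messages` is made up here, and the sample report is a tiny hand-written stand-in, not real epubcheck output.

```python
import json


def fatal_messages(report):
    """Mirror the jq filter: keep FATAL messages and trim each location."""
    return [
        {
            "id": msg.get("ID"),
            "message": msg.get("message"),
            "locations": [
                # Like jq's {path, line, column}, missing keys become null/None.
                {key: loc.get(key) for key in ("path", "line", "column")}
                for loc in msg.get("locations", [])
            ],
        }
        for msg in report.get("messages", [])
        if msg.get("severity") == "FATAL"
    ]


# Hand-written sample; "EXAMPLE-000" is a hypothetical placeholder ID.
sample = {
    "messages": [
        {
            "ID": "RSC-016",
            "severity": "FATAL",
            "message": "Fatal Error while parsing file",
            "locations": [{"path": "OEBPS/Bitwise.xhtml", "line": 25, "column": 26}],
        },
        {
            "ID": "EXAMPLE-000",
            "severity": "ERROR",
            "message": "placeholder non-fatal message",
            "locations": [],
        },
    ]
}

print(json.dumps(fatal_messages(sample), indent=2))
```

With the real report, `sample` would be replaced by `json.load(open("elixir_docs.json"))`.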

When I inspected each of these files, I noticed a pattern that matches the error description, "The entity name must immediately follow the '&' in the entity reference.":

  • anonymous-functions.xhtml -> <a href="Kernel.SpecialForms.xhtml#&/1">its documentation</a>
  • basic-types.xhtml -> <a href="Kernel.xhtml#&&/2"><code class="inline">&amp;&amp;/2</code></a>
  • Bitwise.xhtml -> <a href="#&&&/2"><code class="inline">&amp;&amp;&amp;/2</code></a>

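So the problem, specifically, is links to anchors like &/1, &&/2, and so on: the & in the href attribute is written unescaped, so the XML parser treats it as the start of an entity reference.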
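
The parse failure can be reproduced outside of epubcheck with any strict XML parser. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`; the two fragments are simplified versions of the basic-types.xhtml link above, and the helper name `parses` is made up for this example. The exact error text differs from epubcheck's, but the cause is the same.

```python
import xml.etree.ElementTree as ET

# Simplified link as it appears in the generated XHTML: raw "&" in the href.
RAW = '<a href="Kernel.xhtml#&&/2">and</a>'
# The same link with each "&" written as the entity "&amp;".
ESCAPED = '<a href="Kernel.xhtml#&amp;&amp;/2">and</a>'


def parses(fragment):
    """Return True if the fragment is well-formed XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


print("raw parses:", parses(RAW))
print("escaped parses:", parses(ESCAPED))
# After parsing, the attribute value contains plain "&" characters again.
print("escaped href value:", ET.fromstring(ESCAPED).get("href"))
```

Note that escaping only changes how the attribute is written in the file; the parsed href still points at the same `&&/2` anchor.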

Why is this important?

In readers like Apple Books, you get the following warning at the beginning of the document:

[Screenshot: warning shown by Apple Books, 2024-01-25 11:43:58 a.m.]

More importantly, once you reach the end of that document, you will notice that it is truncated, at least compared with the HTML version:

[Screenshot: truncated document in Apple Books, 2024-01-25 12:02:40 p.m.]

Solution / Discussion

I'm putting this out there to start a discussion about the approach we want to take for the EPUB formatter. I think we can first try changing those anchors from #&/1 to #&amp;/1 and see if that works. Otherwise, given that for the EPUB format the anchor names and the links to them are all internal, we can change the anchor generation to use a hash instead.
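
To make the two options concrete, here is a rough sketch in Python (not ExDoc code, and not a proposal for the actual implementation). The function names `escaped_href` and `hashed_anchor`, the choice of SHA-1, the 12-character truncation, and the `a` prefix are all assumptions made for illustration. Whether epubcheck and readers such as Apple Books accept the escaped form is exactly the open question.

```python
import hashlib
from xml.sax.saxutils import escape


def escaped_href(page, anchor):
    """Option 1: keep the anchor name, but write "&" as "&amp;" in the markup."""
    return f"{page}#{escape(anchor)}"


def hashed_anchor(anchor):
    """Option 2: derive an opaque, markup-safe anchor name from the original.

    The "a" prefix (an assumption) keeps the name from starting with a digit;
    the digest is truncated only to keep the example readable.
    """
    digest = hashlib.sha1(anchor.encode("utf-8")).hexdigest()[:12]
    return f"a{digest}"


for name in ["&/1", "&&/2", "&&&/2"]:
    print(name, "->", escaped_href("Kernel.xhtml", name), "|", hashed_anchor(name))
```

With option 2, the same function would have to be applied both where the anchor is defined and in every link that targets it, so the two always stay in sync.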