EPUB: Links to anchors like #&&/2 cause a fatal error while parsing the file
milmazz opened this issue · 0 comments
After checking the Elixir.epub file with epubcheck, I got the following summary:
$ epubcheck doc/elixir/Elixir.epub --json elixir_docs.json
Check finished with errors
Messages: 9 fatals / 425 errors / 0 warnings / 0 infos
I will start with the highest-severity issue, the one that causes a fatal error while parsing the XHTML documents.
Filtering the result a bit with jq:
$ jq '.messages[] | select(.severity=="FATAL") | {id: .ID, message: .message, locations: .locations | map({path, line, column})}' elixir_docs.json
This gives the following:
{
"id": "RSC-016",
"message": "Fatal Error while parsing file: The entity name must immediately follow the '&' in the entity reference.",
"locations": [
{
"path": "OEBPS/Bitwise.xhtml",
"line": 25,
"column": 26
},
{
"path": "OEBPS/Function.xhtml",
"line": 38,
"column": 46
},
{
"path": "OEBPS/Kernel.SpecialForms.xhtml",
"line": 67,
"column": 20
},
{
"path": "OEBPS/Kernel.xhtml",
"line": 116,
"column": 38
},
{
"path": "OEBPS/anonymous-functions.xhtml",
"line": 94,
"column": 409
},
{
"path": "OEBPS/basic-types.xhtml",
"line": 84,
"column": 335
},
{
"path": "OEBPS/code-anti-patterns.xhtml",
"line": 257,
"column": 275
},
{
"path": "OEBPS/operators.xhtml",
"line": 31,
"column": 781
},
{
"path": "OEBPS/patterns-and-guards.xhtml",
"line": 158,
"column": 534
}
]
}
While inspecting each of these files, I noticed a pattern that matches the error description, "The entity name must immediately follow the '&' in the entity reference":
anonymous-functions.xhtml
-><a href="Kernel.SpecialForms.xhtml#&/1">its documentation</a>
basic-types.xhtml
-><a href="Kernel.xhtml#&&/2"><code class="inline">&&/2</code></a>
Bitwise.xhtml
-><a href="#&&&/2"><code class="inline">&&&/2</code></a>
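So the problem, in particular, is the links to anchors like &/1, &&/2, and so on. In XHTML, a bare & inside an attribute value is read as the start of an entity reference, which is why the parser fails at these links.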
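To make the failure concrete, here is a minimal sketch in Python that feeds one of the links above to a generic XML parser. It uses the standard library parser rather than epubcheck, so the error text will differ from RSC-016, and the `parses` helper is only an illustrative name. The behaviour it shows is the same: the bare ampersands are rejected, while the same link with each `&` written as an entity is accepted.

```python
import xml.etree.ElementTree as ET

# Link as it appears in basic-types.xhtml (bare ampersands in the href).
broken = '<a href="Kernel.xhtml#&&/2">link</a>'
# Same link with each ampersand written as the &amp; entity.
escaped = '<a href="Kernel.xhtml#&amp;&amp;/2">link</a>'


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(broken))   # False
print(parses(escaped))  # True

# After parsing, the attribute value holds the original anchor again.
print(ET.fromstring(escaped).get("href"))  # Kernel.xhtml#&&/2
```

The escaping only affects how the link is written in the file. Once parsed, the href still points at the anchor named `&&/2`.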
Why is this important?
In readers like Apple Books, you get the following warning at the beginning of the document:
More importantly, once you reach the end of that document you will notice that it is truncated, at least compared with the HTML version:
Solution / Discussion
I'm putting this out there to start a discussion about the approach we want to take for the EPUB formatter. I think we can first try changing those anchors from #&/1 to #&amp;/1 and see if that works. Otherwise, given that for the EPUB format the anchor names and the links to them are all internal, we can change the anchor generation to use a hash instead.
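The two options can be sketched roughly as follows. This is Python for illustration only, not the formatter's actual code: the function names, the SHA-1 choice, the 12-character truncation, and the `a` prefix are all assumptions made for the example.

```python
import hashlib
from xml.sax.saxutils import escape


def escaped_href(anchor):
    """Option 1: keep the anchor name, but escape it for an XHTML attribute."""
    # escape() handles &, < and >; the extra mapping also covers double quotes.
    return "#" + escape(anchor, {'"': "&quot;"})


def hashed_id(anchor):
    """Option 2 (hypothetical scheme): derive an XML-safe id from a hash."""
    digest = hashlib.sha1(anchor.encode("utf-8")).hexdigest()
    # Prefix with a letter so the id never starts with a digit.
    return "a" + digest[:12]


print(escaped_href("&&/2"))  # #&amp;&amp;/2
print(hashed_id("&&/2"))     # an opaque, stable id: "a" plus 12 hex characters
```

With the second option, the same function would have to be applied both where the anchor is defined and in every link that targets it, so that the two sides keep matching. The trade-off is that hashed ids are no longer human-readable, which matters less here because the anchors are internal to the EPUB.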