alexadam/save-as-ebook

Web page not generating good epub


The following web page generates a "gibberish" epub:
http://ned.ipac.caltech.edu/level5/March01/Carroll3/Carroll3.html

Note that the other pages don't have this problem.
FYI, the index (contents) page is here:
http://ned.ipac.caltech.edu/level5/March01/Carroll3/Carroll_contents.html

Sorry for the late reply on this. The problem is that the page has an HTML error, <palign=justify>, which breaks the parser. I can do a quick fix for it, but then "palign=justify" is not a valid tag and it will be ignored (so its content won't be displayed, which is not good practice). I have to think of a way to handle HTML errors. Sorry for the inconvenience.

Thanks for the explanation. Presumably the tag was meant to be <p align=justify>.
When I saw that there are PS files for those same web pages, I assumed the HTML had been generated from LaTeX. But given this error (<palign=..> instead of <p align=..>), the HTML may actually have been the source.
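If the tag really was meant to be <p align=justify>, one option is a targeted repair before parsing. Below is a minimal sketch in Python, purely for illustration (it is not code from save-as-ebook, and `repair_palign` is a hypothetical name). It only fixes this one typo by putting the missing space back, so it is a patch for this page rather than general HTML error handling.

```python
import re

# Hypothetical targeted repair: "<palign=" is assumed to be "<p align=" with the
# space missing, as guessed above. Case-insensitive so "<PALIGN=" is also caught.
PALIGN_TYPO = re.compile(r"<p(align\s*=)", re.IGNORECASE)


def repair_palign(html):
    """Rewrite '<palign=' as '<p align=' and leave all other markup untouched."""
    return PALIGN_TYPO.sub(r"<p \1", html)


# Usage: the broken paragraph from the reported page becomes a normal <p>.
print(repair_palign("<palign=justify>Some text</p>"))
```

Tags such as <p align=...> or <pre> are left alone, because the pattern requires "align" to follow "<p" immediately.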

And since the error comes from HTML, I could try to contact the author of the web page to tell him to correct it manually. But he is Sean Carroll, a famous physicist. I'm not sure he would have time to read my bug report :)

On the other hand, when an invalid tag is present, especially one whose name (palign) does not exist in the HTML standard, does it necessarily make everything after it disappear? Can't we just ignore the tag? Especially since there is no </palign> after it, it can be treated as an empty element, i.e. one with no text.
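The "just ignore the tag" idea can be sketched with a tolerant parser that drops the markup of unknown tags but always keeps the text. This is a minimal sketch using Python's standard `html.parser`, for illustration only (it is not save-as-ebook's parser). `KNOWN_TAGS`, `LenientCleaner` and `clean_html` are hypothetical names, and the allow-list is deliberately tiny.

```python
from collections import Counter
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list for illustration; a real converter would list every
# element it knows how to render.
KNOWN_TAGS = {"html", "head", "body", "title", "p", "div", "span", "a", "b",
              "i", "em", "strong", "h1", "h2", "h3", "ul", "ol", "li", "br", "img"}
VOID_TAGS = {"br", "img"}  # elements that never take a closing tag


class LenientCleaner(HTMLParser):
    """Rebuilds HTML, dropping unknown tags but keeping the text inside them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = Counter()  # how many of each known tag are still open

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            # e.g. "<palign=justify>": treat it as an empty element and skip it.
            return
        attr_text = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{attr_text}>")
        if tag not in VOID_TAGS:
            self.open_tags[tag] += 1

    def handle_endtag(self, tag):
        # Only emit a closing tag that matches one we actually opened, so the
        # stray "</p>" left behind by "<palign=justify>" is dropped as well.
        if self.open_tags[tag] > 0:
            self.open_tags[tag] -= 1
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always kept, whichever tag surrounded it.
        self.out.append(escape(data, quote=False))


def clean_html(source):
    cleaner = LenientCleaner()
    cleaner.feed(source)
    cleaner.close()
    return "".join(cleaner.out)


print(clean_html("<body><palign=justify>Some text</p></body>"))
```

With this approach the paragraph text survives and only the justification is lost, which addresses the concern that ignoring the tag would hide its content. It does not repair misnested tags in general.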

I'm unable to find a way to contact Sean Carroll. But I found a way to contact the webmaster of that web site.