pgaskin/kepubify

Make converting a kepub-formatted epub a no-op

dstaley opened this issue · 3 comments

When I pass a kepub from the Kobo store through kepubify and compare the output with the original, there's a pretty large diff of extremely minor changes that, if I'm understanding correctly, don't actually affect how the book renders.

It'd be nice to make passing a kepub-formatted epub a no-op (other than adding the kepub.epub extension), either by default or through a CLI flag.

If you're referring to the whitespace, tag/node name casing, and attribute changes, those are an inherent part of how kepubify works. The HTML5 decoder follows the spec, so its output is subject to some normalization, the same as in any compliant renderer. That normalization makes no difference to how the book renders, nor even to how it's represented internally when re-parsed. The encoder follows the spec except for a few minor changes (see the html branch for these) that ensure the output is polyglot XHTML/HTML.

I suppose I could allow .kepub.epub to be passed to the --copy/-x option if that would work for you.

Forgive my ignorance as I'm a newbie to the Kobo community, but is there not a way to determine if a given epub is already a kepub outside of the filename? If not, I think we can close this as the changes made by kepubify are generally a good idea, and I wouldn't mind if it ran on my Kobo-sourced epubs as well :)

There is a way to determine it via the koboSpans, and I could abort a conversion from there if it's already a kepub. That would be a breaking change, though.
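For illustration, here is a rough sketch of what a koboSpan-based check could look like. It is not kepubify's actual code: the function names `hasKoboSpans` and `looksLikeKepub` and the helper `buildZip` are hypothetical, and it assumes kepub content documents mark their spans with a literal `class="koboSpan"` attribute. A plain substring match would miss other quoting styles or multi-class attributes, so a real check would more likely inspect the parsed document.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"path"
	"strings"
)

// hasKoboSpans reports whether a content document appears to contain
// Kobo spans. Assumption: the spans carry a literal class="koboSpan".
func hasKoboSpans(doc []byte) bool {
	return bytes.Contains(doc, []byte(`class="koboSpan"`))
}

// looksLikeKepub (hypothetical) scans the HTML/XHTML files of an epub
// archive and returns true as soon as one of them contains Kobo spans.
func looksLikeKepub(r *zip.Reader) (bool, error) {
	for _, f := range r.File {
		switch strings.ToLower(path.Ext(f.Name)) {
		case ".html", ".xhtml", ".htm":
		default:
			continue // not a content document
		}
		rc, err := f.Open()
		if err != nil {
			return false, err
		}
		data, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return false, err
		}
		if hasKoboSpans(data) {
			return true, nil
		}
	}
	return false, nil
}

// buildZip creates an in-memory archive so the sketch can run without
// a real book on disk. It exists only for the demonstration.
func buildZip(files map[string]string) (*zip.Reader, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for name, body := range files {
		fw, err := w.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := fw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

func main() {
	book, err := buildZip(map[string]string{
		"OEBPS/ch1.xhtml": `<p><span class="koboSpan" id="kobo.1.1">Hello.</span></p>`,
	})
	if err != nil {
		panic(err)
	}
	isKepub, err := looksLikeKepub(book)
	if err != nil {
		panic(err)
	}
	fmt.Println("already a kepub:", isKepub)
}
```

A converter could call something like `looksLikeKepub` before doing any work and then either skip the book or just copy it. Which of those it does, and whether that is the default or sits behind a flag, is the breaking-change question raised above.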