TOC broken for MOBI on Kindle devices/app - potential fix: switching from Kindlegen to Calibre's ebook-convert
Closed this issue · 6 comments
Hello there! Long-time user of epub-press here, and I absolutely love it. There's just one minor issue that's been slightly inconveniencing me for a while; a small improvement I feel epub-press could make. Finally decided to make an issue for it.
Current Behavior
The MOBI files currently created/emailed by epub-press don't have a working TOC in Kindle for PC or on my Paperwhite. The links on the first page under the "Table of Contents" heading generated by epub-press (on the right in the image below) do work, but the built-in TOC (in the pane on the left) doesn't, and currently looks like this:
The built-in Kindle TOC (the pane on the left above) does not have listings for the different articles.
While the EPUB file produced by epub-press does have a working TOC, passing it through Kindlegen locally (which, from what I can see, is what epub-press uses) breaks the TOC, probably for the reasons mentioned in multiple threads on MobileRead. In particular:
Expected Behavior
The MOBI files should have a working TOC that I can use to navigate through the book on Kindle apps and devices. This is what the TOC would ideally look like:
The built-in Kindle TOC (the pane on the left above) does have working listings for the different articles.
Steps to reproduce
Create a MOBI file from any number of web pages/articles and download it (or email it) to a Kindle app/device. Open the TOC and notice that, according to the Kindle app/device, the whole book is one chapter.
System Information
Client-side: Chrome 95.0.4638.69 on Windows 10 1909.
Server-side: I'm using the epub-press plugin from the Chrome store, which uses the server at epub.press.
Potential Fix
From what I can tell, Kindlegen has now been deprecated, and even when it was actively maintained, this particular issue (among a few others) was never fixed. There's a replacement called Kindle Previewer, but there doesn't appear to be a way to run it natively on Linux. Kindlegen output is also bloated: the generated file contains the original EPUB in addition to the converted file, which roughly doubles the file size and the bandwidth needed for downloading/emailing.
Calibre has a command-line tool, ebook-convert, that I tested and that works perfectly with the EPUB input from epub-press, and correctly builds a TOC that is recognised by the Kindle app and devices.
To get something similar to what Kindlegen currently does, the following can be used:
ebook-convert "path/to/file.epub" "path/to/file.mobi" --mobi-file-type both --duplicate-links-in-toc
ebook-convert requires an output file to be specified. The --mobi-file-type both argument creates a joint MOBI6/KF8 file, which is required for the Kindle Personal Document Service to process the file while still leaving advanced typesetting enabled. The --duplicate-links-in-toc argument is there because some websites use the same title for all of their pages; it creates a TOC entry for each of those titles, as long as they lead to different locations.
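For reference, the same command can be expressed as an argument list in Node. This is a minimal sketch rather than epub-press code: ebookConvertArgs and convertWithCalibre are hypothetical names, and it assumes ebook-convert is on the PATH. Passing an array to execFile skips the shell entirely, so paths containing spaces need no quoting.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: build the argument list for the ebook-convert command above.
// The output path is the input path with its .epub extension swapped for .mobi.
function ebookConvertArgs(epubPath) {
  const mobiPath = epubPath.replace(/\.epub$/i, '') + '.mobi';
  return [
    epubPath,
    mobiPath,
    '--mobi-file-type', 'both',  // joint MOBI6/KF8 output
    '--duplicate-links-in-toc',  // keep TOC entries that share a title
  ];
}

// Hypothetical wrapper: run ebook-convert without a shell and resolve with the .mobi path.
function convertWithCalibre(epubPath) {
  const args = ebookConvertArgs(epubPath);
  return new Promise((resolve, reject) => {
    execFile('ebook-convert', args, (error) => {
      // Any error (non-zero exit, missing binary) is treated as a failed conversion.
      if (error) {
        reject(error);
      } else {
        resolve(args[1]);
      }
    });
  });
}
```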
I don't know enough about the server setup/JavaScript to write my own pull request, or understand how significant or complex this switch would be. I'd understand it if this was too big of a change to the current setup simply for the sake of a working TOC on MOBI outputs. I also don't know if there's a particular reason for using Kindlegen here. I am creating this issue simply because this is something that has been slightly inconvenient to me as someone who uses epub-press to generate a mobi from dozens of URLs and has then found navigation to be a bit of a pain.
I feel that if this is fixed I can enjoy the convenience that is promised by epub-press, and that I really value. Right now, in order to get a working TOC, I'd have to grab the epub output from epub-press, run it through Calibre locally, and then email it myself to my Kindle.
EDIT: I see now that this project uses the Kindlegen binary in /bin/. Calibre doesn't appear to have a single binary file, but rather an installation procedure/script, as detailed here. After running this script, ebook-convert (located in /opt/calibre) is added to the PATH and can be called. I'm not sure whether there are particular files in the default install that would be sufficient on their own for conversion, because the whole install comes to around 350MB.
Some Calibre tools are available at https://www.npmjs.com/package/node-calibre, but they require Calibre to be installed as well, and I'm unsure whether/how they could be used here.
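Before relying on Calibre, the server could check whether ebook-convert is actually reachable on the PATH after the install script has run. A minimal sketch; isOnPath is a hypothetical helper, and the executable-bit check assumes a Unix-like system such as the Docker image (on Windows the binary would also need its .exe suffix).

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: true if an executable file called `name` exists in one of
// the directories listed in `pathEnv` (defaults to the current PATH).
function isOnPath(name, pathEnv = process.env.PATH || '') {
  return pathEnv
    .split(path.delimiter)
    .filter((dir) => dir !== '')
    .some((dir) => {
      const candidate = path.join(dir, name);
      try {
        fs.accessSync(candidate, fs.constants.X_OK); // throws if missing or not executable
        return fs.statSync(candidate).isFile();      // ignore directories with that name
      } catch {
        return false;
      }
    });
}

// Usage: prints true once the Calibre installer has put ebook-convert on the PATH.
console.log(isOnPath('ebook-convert'));
```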
Wowza! Thanks for this amazingly thorough and detailed writeup @sanujar.
I haven't been doing a ton of active development on EpubPress, but I suspect this might be low effort to fix?
The current conversion step is already just an exec out to kindlegen with the path... perhaps this fix is just a matter of adding a binary for ebook-convert and replacing the command with the one you suggested?
epub-press/lib/book-services.js
Lines 278 to 289 in 52188ee
Not sure if/when I'll be able to get around to this, but I'll give it a try.
Thanks for taking a look @haroldtreen!
Decided to take a look at this myself. This is my first time working with Docker and with node.js, so while I have managed to get a few things working, quite a lot is still broken, and I'd appreciate any insight you could give.
Unfortunately, as far as I can tell, ebook-convert doesn't come as a single executable; it requires Calibre to be installed. To do this, I added the following line to the Dockerfile, which runs the whole Calibre set-up script on initialisation of the container if the host OS is Linux (ideally with a plan to fall back to Kindlegen on Darwin or wherever Calibre hasn't been installed).
https://github.com/sanujar/epub-press/blob/b69368171f6246a91c3e7d9e286874853153a518/Dockerfile#L8
While this does successfully install Calibre and add it to PATH, I've had less luck with getting ebook-convert working through book-services.js. Here's my first attempt at this change:
This exact command works perfectly locally, and on the server, if I mess the command up somehow, I get an error printed to the terminal. However, when the command is correct, it appears to fail silently on the server and the rest of the script continues, resulting in a 404 when there is an attempt to finally download the mobi file:
Logs
server_1 | POST /api/v1/books 202 632.393 ms - 18
server_1 | GET /api/v1/books/TTgh6pXD6/status 200 5.088 ms - 49
server_1 | GET /api/v1/books/TTgh6pXD6/status 200 1.244 ms - 46
server_1 | Reached here //this is the start of the mobi conversion block
server_1 | GET /api/v1/books/TTgh6pXD6/status 200 2.177 ms - 44
server_1 | Executing (default): INSERT INTO `Books` (`title`,`sections`,`uid`,`createdAt`,`updatedAt`) VALUES ($1,$2,$3,$4,$5);
server_1 | verbose: Book Published id=TTgh6pXD6
server_1 | GET /api/v1/books/TTgh6pXD6/status 200 0.995 ms - 34
server_1 | Executing (default): SELECT `id`, `title`, `sections`, `uid`, `createdAt`, `updatedAt` FROM `Books` AS `Book` WHERE `Book`.`uid` = 'TTgh6pXD6' LIMIT 1;
server_1 | GET /api/v1/books/TTgh6pXD6/download?filetype=mobi 404 12.679 ms - 72
The rest of the server still works fine, including EPUBs, and kindlegen works if I swap this line back to the original. I tried debugging, and I can see that no .mobi file is ever created in the folder with the EPUBs, but I can't figure out what's happening.
It feels to me sometimes that the amount of time between completing the epub creation and getting "Book Published" is way too short - maybe the script isn't waiting for the output from Calibre for some reason?
Ah - all the investigation! Thanks for getting started on it.
Some thoughts...
- I think the exec command you are doing would be equivalent to this:
ebook-convert "epub/path" "mobi/path" "--mobi-file-type=both" "--duplicate-links-in-toc"
vs.
ebook-convert epub/path mobi/path --mobi-file-type=both --duplicate-links-in-toc
Not an expert on how the command line processes args - but maybe something in there would confuse ebook-convert?
- The logs are definitely a pain... I noticed a while ago that things don't log as I might expect... ideally you could do console.log and see what's going on. There's also a Logger module that should be getting called with any exception hit when running that command, but it's disabled when NODE_ENV is test...
Line 19 in 52188ee
Under the hood that uses winston, and maybe a backwards-incompatible change was installed at some point? Could try pinning winston to 2.3.0 in the package.json and see if that fixes anything? Should definitely be seeing stuff logged if there's a failure.
- The fact that the book isn't created definitely makes it seem like the command isn't running. But it's weird, because if that were to happen the whole pipeline should be considered a failure, and I'd expect the body of the status to be an error and the JS module to not download 🤔. One weird thing I'm noticing is that error.code === 1 would technically also be fine... maybe this code should be >= 1?
epub-press/lib/book-services.js
Line 281 in 52188ee
It could be that ebook-convert returns an exit code of 1 when the convert fails, and that just happens to be the one exit code we don't fail on.
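Whether the quotes matter can be checked directly: exec hands the whole string to /bin/sh, and the shell strips the quotes before the program ever sees its arguments. The sketch below assumes a POSIX shell (Linux, macOS, or the Docker image); argsAfterShellParsing is a hypothetical helper that uses the shell's printf to echo back the arguments the shell would pass on. Quoting only changes the result when a value contains spaces or other special characters.

```javascript
const { exec } = require('child_process');

// Hypothetical helper: let /bin/sh parse `argString` exactly as it would for a
// real command, then print each resulting argument on its own line.
function argsAfterShellParsing(argString) {
  return new Promise((resolve, reject) => {
    exec(`printf '%s\\n' ${argString}`, (error, stdout) => {
      if (error) {
        reject(error);
        return;
      }
      resolve(stdout.split('\n').filter((line) => line !== ''));
    });
  });
}

const quoted = '"epub/path" "mobi/path" "--mobi-file-type=both" "--duplicate-links-in-toc"';
const unquoted = 'epub/path mobi/path --mobi-file-type=both --duplicate-links-in-toc';

// Both strings should reach the program as the same four arguments.
const sameArgs = Promise.all([
  argsAfterShellParsing(quoted),
  argsAfterShellParsing(unquoted),
]);

// Quotes do matter for a path containing a space: one argument vs. two.
const spacedPath = Promise.all([
  argsAfterShellParsing('"my book.epub"'),
  argsAfterShellParsing('my book.epub'),
]);
```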
In summary:
- See if removing the "" from the exec fixes anything.
- See if rolling winston back to 2.3.0 in package.json fixes any logging (might also need to run npm install to update the lockfile...).
- See if you can add console.log(error) or console.error(error) in that convert step to see anything.
- See if updating the error check to be error && error.code !== 0 fixes anything.
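To make a failure like this visible, the convert step can log the exit code and stderr whenever exec reports an error. A minimal sketch; runAndReport is a hypothetical helper, and the failing command is a stand-in that uses the current Node binary instead of ebook-convert. Node's exec calls back with an Error whose code property is the child's exit code (and whose signal property is set if the child was killed), so treating any error as a failure covers the error.code !== 0 case as well as a killed process.

```javascript
const { exec } = require('child_process');

// Hypothetical helper: run a shell command and, on failure, log the exit code,
// signal and stderr before rejecting, so the cause is not swallowed.
function runAndReport(command) {
  return new Promise((resolve, reject) => {
    exec(command, (error, stdout, stderr) => {
      if (error) {
        // error.code is the exit code; error.signal is set if the process was killed.
        console.error(`conversion failed: code=${error.code} signal=${error.signal}`);
        console.error(stderr.trim());
        reject(error);
        return;
      }
      resolve(stdout);
    });
  });
}

// Stand-in for a converter that prints an error and exits with code 1,
// the one exit code the original `error.code > 1` check let through.
const failing = `"${process.execPath}" -e "console.error('libGL missing'); process.exit(1)"`;

const outcome = runAndReport(failing).then(
  () => 'succeeded',
  (error) => error.code,
);
```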
Hope that helps!
Thank you for the help! Saved me quite a few hours of debugging: I'd completely missed that the check was error.code > 1 and not >= 1. Yep, ebook-convert was exiting with an error code of 1, and that was just ignored. Thanks to that, I was able to find this post, and it turns out that Calibre depends on libGL (the OpenGL graphics library), even if you're only using the command-line tools. I managed to get a bash shell running in epub-press's Docker container and apt-get'd libgl1-mesa-glx. The generation/conversion process now goes flawlessly, but I'm unsure whether/how this package can be included by default using Docker, without needing apt-get and manual intervention.
Maybe, ideally, if Calibre throws an error or is not present (such as on Darwin), book-services.js could automatically fall back to Kindlegen? I don't know enough about JS/node.js: is there a way of chaining the commands together like that?
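On the chaining question: in a shell, cmdA || cmdB runs the second command only if the first exits non-zero. In Node, one way is to wrap each command in a promise-returning function and try them in order, moving on only when the previous one rejects. A minimal sketch; firstSuccessful is a hypothetical helper, and runCalibre/runKindlegen in the usage comment are placeholders, not epub-press functions.

```javascript
// Hypothetical helper: run each attempt in order and resolve with the first
// result that succeeds. If every attempt fails, reject with the last error.
async function firstSuccessful(attempts) {
  let lastError = new Error('no conversion attempt was given');
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (error) {
      lastError = error; // remember why this attempt failed, then try the next one
    }
  }
  throw lastError;
}

// Usage sketch (runCalibre/runKindlegen are placeholders that return promises):
// firstSuccessful([() => runCalibre(book), () => runKindlegen(book)])
//   .then((convertedBook) => { /* continue the pipeline */ });
```

One trade-off: a silent fallback hides a broken Calibre install, so it is worth logging each rejected attempt.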
I really appreciate all the help you've given so far; it has definitely made it possible for me to work on this despite knowing close to nothing about Docker/node.
Dang! Can't believe that was the issue - what an obscure bug to hit. Glad my wiser eyes were able to catch that!
At this point it sounds like you're pretty close - solid job for someone who didn't know any Node or Docker!
Maybe a few changes from here:
- The docker image should always be Linux-based - so you can drop the if check and just run the command for installing Calibre on every build.
- Alternatively - you might be able to commit the installer and just run the sh /dev/stdin with the pre-downloaded file? Might save doing a network request when building the image?
- Extra alternative! You could probably pass an env variable to the docker image - something like CONVERSION_BACKEND - and set that to calibre/kindlegen. Then instead of your if check in the image looking at whether we're on Linux, it would check whether we wanted Calibre as our conversion backend and install it if so. Once the docker image is built, we could reuse the environment variable in the book pipeline to decide which command to exec - no fallback. We can choose one or the other and expect it to work every time...
...Or maybe just always install calibre in the docker image but then use an environment variable to toggle... thinking something like this...
Dockerfile
# ...
RUN wget -nv -O- https://download.calibre-ebook.com/linux-installer.sh | sh /dev/stdin
RUN apt-get update && apt-get install -y libgl1-mesa-glx
# ...
.env
CONVERSION_BACKEND=calibre
book-services.js
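static convertToMobi(book) {
  // Assumes exec (from child_process), Config and log are already imported at the top of book-services.js.
  return new Promise((resolve, reject) => {
    const useCalibre = process.env.CONVERSION_BACKEND === 'calibre';
    const kindlegenCommand = `${Config.KINDLEGEN} "${book.getEpubPath()}"`;
    const calibreCommand = `ebook-convert ....`;
    const conversionCommand = useCalibre ? calibreCommand : kindlegenCommand;
    // kindlegen uses exit code 1 for a build that succeeded with warnings, hence the old `> 1` check;
    // for ebook-convert, any non-zero exit code (including 1) is a failure.
    const maxOkExitCode = useCalibre ? 0 : 1;
    exec(conversionCommand, (error) => {
      // error.code is the exit code, or null if the process was killed by a signal.
      const failed = error && (typeof error.code !== 'number' || error.code > maxOkExitCode);
      if (failed) {
        log.exception('BookServices.convertToMobi')(error);
        reject(error);
      } else {
        resolve(book);
      }
    });
  });
}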
Could also do a fallback... but this seems like it might be simpler / encourage a more consistent experience.
Thanks @haroldtreen, especially for all that code for book-services.js! Everything seems to work now. Created a PR at #78 with all of the changes I made.
I went with your last alternative, but slightly modified it: I set up a build ARG that decides whether or not to install Calibre, and which then sets an environment variable for book-services.js. I decided against committing the installer (as opposed to the install script), since the whole Calibre tgz file is over 100MB, which would bloat the repo a bit too much. For stability, the Calibre version could be pinned; please let me know if you would like me to do that.
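One small hardening for the environment-variable switch: the convertToMobi snippet above treats any value other than calibre as kindlegen, so a typo in the build ARG silently changes the backend. A minimal sketch of reading the variable once and rejecting unknown values; conversionBackend is a hypothetical helper, and defaulting to kindlegen when the variable is unset is an assumption.

```javascript
// Backends the Docker build can select via CONVERSION_BACKEND.
const SUPPORTED_BACKENDS = ['calibre', 'kindlegen'];

// Hypothetical helper: read CONVERSION_BACKEND once and fail fast on typos
// instead of silently using kindlegen. Defaulting to kindlegen when the
// variable is unset is an assumption.
function conversionBackend(env = process.env) {
  const value = (env.CONVERSION_BACKEND || 'kindlegen').trim().toLowerCase();
  if (!SUPPORTED_BACKENDS.includes(value)) {
    throw new Error(
      `Unknown CONVERSION_BACKEND "${value}"; expected one of: ${SUPPORTED_BACKENDS.join(', ')}`,
    );
  }
  return value;
}
```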
The code should be relatively stable; I tested multiple permutations, such as Calibre vs. kindlegen as switched by the ARG in the Dockerfile, and sending mail with and without STARTTLS (an added bonus I included here because I couldn't connect to my mail server without it). The apt-get update and install seem to throw a Broken Pipe error a few dozen times every time they run, but it doesn't seem to actually affect anything: the installer continues, and everything works.
Once again, thank you for epub-press! This was a bit of an adventure, but I definitely enjoyed it. I can't wait to see this running on the server; apart from the chapter titles now working, the amount of content that fits in an emailed MOBI file is effectively doubled, so I can avoid splitting things into "Part 1 of n" books.