haroldtreen/epub-press

TOC broken for MOBI on Kindle devices/app - potential fix: switching from Kindlegen to Calibre's ebook-convert

Closed this issue · 6 comments

Hello there! Long-time user of epub-press here, and I absolutely love it. There's just one minor issue that's been slightly inconveniencing me for a while; a small improvement I feel epub-press could make. Finally decided to make an issue for it.

Current Behavior

The MOBI files currently created/emailed by epub-press don't have a working TOC in Kindle for PC or on my Paperwhite. The links on the first page under the heading "Table of Contents" (on the right in the image below), which epub-press generates, do work, but the built-in TOC (in the pane on the left) doesn't, and currently looks like this:

TOC in Kindle for PC

The built-in Kindle TOC (the pane on the left above) does not have listings for the different articles.

While the EPUB file produced by epub-press does have a working TOC, when I pass it through Kindlegen locally, which from what I can see is what epub-press uses, the TOC breaks, probably due to what is mentioned over in multiple threads at Mobileread. In particular:

As st_albert mentioned, you must create your own inline html TOC (marked as a toc in the guide section of the opf) for kindlegen to enable the "go to contents" feature. Kindlegen doesn't make it for you from the ncx.

Expected Behavior

The MOBI files should have a working TOC that I can use to navigate through the book on Kindle apps and devices. This is what the TOC would ideally look like:

Working TOC in Kindle for PC

The built-in Kindle TOC (the pane on the left above) does have working listings for the different articles.

Steps to reproduce

Create a MOBI file from any number of web pages/articles and then download (or email it) to a Kindle app/device. Open the TOC, and notice that the whole book is one chapter according to the Kindle app/device.

System Information

Client-side: Chrome 95.0.4638.69 on Windows 10 1909.
Server-side: I'm using the epub-press plugin from the Chrome store, which uses the server at epub.press.

Potential Fix

From what I can tell, Kindlegen has now been deprecated, and even when it was actively maintained this particular issue (among a few others) was not fixed. There's a replacement called Kindle Previewer, but it doesn't appear to have a way to run it natively on Linux. There's also a bit of bloat in the file generated with Kindlegen, since it contains the original EPUB file in addition to the converted file, which about doubles the file size and bandwidth required for downloading/emailing.

Calibre has a command-line tool, ebook-convert, that I tested and that works perfectly with the EPUB input from epub-press, and correctly builds a TOC that is recognised by the Kindle app and devices.

To get something similar to what Kindlegen currently does, this command can be used: ebook-convert "path/to/file.epub" "path/to/file.mobi" --mobi-file-type both --duplicate-links-in-toc. It requires that an output file be specified, along with the additional arguments that create a joint MOBI6/KF8 file (which is required for the Kindle Personal Document Service to process the file while leaving advanced typesetting enabled). The --duplicate-links-in-toc argument is there because some websites have the same title for all of their pages; it creates TOC entries for each of those titles, as long as they lead to different locations.
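
For anyone wiring this into Node, here is a minimal sketch of how that command might be invoked. It assumes ebook-convert is already on PATH; the helper names buildCalibreArgs and convertWithCalibre are hypothetical and not part of epub-press. It uses execFile rather than exec so the paths are passed as separate arguments and need no shell quoting.

```javascript
const path = require('path');
const { execFile } = require('child_process');

// Hypothetical helper: build the ebook-convert argument list for a given EPUB.
// The output path is the input path with its extension swapped to .mobi.
function buildCalibreArgs(epubPath) {
  const { dir, name } = path.parse(epubPath);
  const mobiPath = path.join(dir, `${name}.mobi`);
  return [epubPath, mobiPath, '--mobi-file-type', 'both', '--duplicate-links-in-toc'];
}

// Hypothetical wrapper: resolves with the MOBI path, rejects on any failure.
// Assumes the ebook-convert executable can be found on PATH.
function convertWithCalibre(epubPath) {
  const args = buildCalibreArgs(epubPath);
  return new Promise((resolve, reject) => {
    execFile('ebook-convert', args, (error, stdout, stderr) => {
      if (error) {
        reject(new Error(`ebook-convert failed: ${stderr || error.message}`));
      } else {
        resolve(args[1]);
      }
    });
  });
}

// Only the argument list is printed here; nothing is converted.
console.log(buildCalibreArgs('books/example.epub'));
```

Calling convertWithCalibre('books/example.epub') would then run the conversion itself, provided Calibre is installed.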

I don't know enough about the server setup/JavaScript to write my own pull request, or understand how significant or complex this switch would be. I'd understand it if this was too big of a change to the current setup simply for the sake of a working TOC on MOBI outputs. I also don't know if there's a particular reason for using Kindlegen here. I am creating this issue simply because this is something that has been slightly inconvenient to me as someone who uses epub-press to generate a mobi from dozens of URLs and has then found navigation to be a bit of a pain.

I feel that if this is fixed I can enjoy the convenience that is promised by epub-press, and that I really value. Right now, in order to get a working TOC, I'd have to grab the epub output from epub-press, run it through Calibre locally, and then email it myself to my Kindle.

EDIT: I see now that the Kindlegen in /bin/ is being used in this project. Calibre doesn't appear to have a single bin file, but rather an installation procedure/script, as detailed here. After running this script, ebook-convert (located at /opt/calibre) is added to PATH and can be called. I'm not sure if there are particular files that can be extracted from the default install that are sufficient for conversion, because the whole install comes to around 350MB.

Some Calibre tools are available at https://www.npmjs.com/package/node-calibre, but they require Calibre be installed as well, and I'm unsure as to if/how they can be implemented.

Wowza! Thanks for this amazingly thorough and detailed writeup @sanujar 🎉.

I haven't been doing a ton of active development on EpubPress, but I suspect this might be low effort to fix?

The current conversion step is already just an exec out to kindlegen with the path... perhaps this fix is just a matter of adding a binary for ebook-convert and replacing the command with the one you suggested?

static convertToMobi(book) {
    return new Promise((resolve, reject) => {
        exec(`${Config.KINDLEGEN} "${book.getEpubPath()}"`, (error) => {
            if (error && error.code > 1) {
                log.exception('BookServices.convertToMobi')(error);
                reject(error);
            } else {
                resolve(book);
            }
        });
    });
}

Not sure if/when I'll be able to get around to this - but I'll try to give that a go 👍.

Thanks for taking a look @haroldtreen!

Decided to take a look at this myself. This is my first time working with Docker and with node.js, so while I have managed to get a few things working, quite a lot is still broken, and I'd appreciate any insight you could give.

Unfortunately, ebook-convert doesn't seem to have a single executable, but rather requires that Calibre be installed, as far as I can tell. In order to do this, I added the following line to the Dockerfile, which runs the whole Calibre set-up script on initialisation of the container if the host architecture is Linux (ideally with a plan to be able to fall back to Kindlegen on Darwin or where Calibre hasn't been installed).

https://github.com/sanujar/epub-press/blob/b69368171f6246a91c3e7d9e286874853153a518/Dockerfile#L8

While this does successfully install Calibre and add it to PATH, I've had less luck with getting ebook-convert working through book-services.js. Here's my first attempt at this change:

https://github.com/sanujar/epub-press/blob/b69368171f6246a91c3e7d9e286874853153a518/lib/book-services.js#L280

This exact command works perfectly locally, and, on the server, if I somehow mess the command up I get an error printed out into the terminal. However, when sent correctly on the server, it appears to fail silently and let the rest of the script continue, resulting in a 404 when there is an attempt to finally download the mobi file:

Logs

server_1         | POST /api/v1/books 202 632.393 ms - 18
server_1         | GET /api/v1/books/TTgh6pXD6/status 200 5.088 ms - 49
server_1         | GET /api/v1/books/TTgh6pXD6/status 200 1.244 ms - 46
server_1         | Reached here //this is the start of the mobi conversion block
server_1         | GET /api/v1/books/TTgh6pXD6/status 200 2.177 ms - 44
server_1         | Executing (default): INSERT INTO `Books` (`title`,`sections`,`uid`,`createdAt`,`updatedAt`) VALUES ($1,$2,$3,$4,$5);
server_1         | verbose: Book Published id=TTgh6pXD6
server_1         | GET /api/v1/books/TTgh6pXD6/status 200 0.995 ms - 34
server_1         | Executing (default): SELECT `id`, `title`, `sections`, `uid`, `createdAt`, `updatedAt` FROM `Books` AS `Book` WHERE `Book`.`uid` = 'TTgh6pXD6' LIMIT 1;
server_1         | GET /api/v1/books/TTgh6pXD6/download?filetype=mobi 404 12.679 ms - 72

The rest of the server still works fine, including EPUBs, and kindlegen works if I replace this line with the original. I tried debugging, and I can see that no .mobi file is ever created in the folder with the epubs, but I can't figure out what's happening.

It feels to me sometimes that the amount of time between completing the epub creation and getting "Book Published" is way too short - maybe the script isn't waiting for the output from Calibre for some reason?

Ah - all the investigation! Thanks for getting started on it 🎉.

Some thoughts...

  1. I think the exec command you are doing would be equivalent to this.
    ebook-convert "epub/path" "mobi/path" "--mobi-file-type=both" "--duplicate-links-in-toc"
    vs. 
    ebook-convert epub/path mobi/path --mobi-file-type=both --duplicate-links-in-toc
    
    Not an expert on how the command line processes args - but maybe something that would confuse ebook-convert?
  2. The logs are definitely a pain... I noticed a while ago that things don't log as I might expect... ideally you could do console.log and see what's going on. There's also a Logger module that should be getting called with any exception hit when running that command, but it's disabled when NODE_ENV is test...
    if (process.env.NODE_ENV === 'test' && !this._options.overrideMock) {

    Under the hood that uses winston and maybe a backwards incompatible change was installed at some point? Could try pinning winston to 2.3.0 in the package.json and see if that fixes anything? Should definitely be seeing stuff logged if there's a failure.
  3. The fact that the book isn't created definitely makes it seem like the command isn't running. But it's weird because if that were to happen the whole pipeline should be considered a failure and I'd expect the body of the status to be an error and the JS module to not download 🤔. One weird thing I'm noticing is that an error.code of exactly 1 is not treated as a failure... maybe this check should be >= 1?
    if (error && error.code > 1) {

    It could be that ebook-convert returns an exit code of 1 when the convert fails and that just happens to be the one exit code we don't fail on 😅.
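
On point 1, a quick way to check how the quotes are handled is to print the arguments a program would actually receive. This is only a sketch and assumes a POSIX shell (which is what exec uses on Linux); printf stands in for ebook-convert, and argsSeenByProgram is a made-up helper name.

```javascript
const { execSync } = require('child_process');

// Run printf through the shell and return one entry per argument it received.
// printf is a stand-in for ebook-convert here; nothing is converted.
function argsSeenByProgram(argString) {
  const output = execSync(`printf '%s\\n' ${argString}`, { encoding: 'utf8' });
  return output.trim().split('\n');
}

const quoted = argsSeenByProgram(
  '"epub/path" "mobi/path" "--mobi-file-type=both" "--duplicate-links-in-toc"'
);
const unquoted = argsSeenByProgram(
  'epub/path mobi/path --mobi-file-type=both --duplicate-links-in-toc'
);

console.log(quoted);
console.log('same arguments:', JSON.stringify(quoted) === JSON.stringify(unquoted));
```

The shell strips the double quotes before the program starts, so both forms should deliver the same four arguments. The quotes only matter when a path contains spaces or other characters the shell treats specially.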

In summary:

  • See if removing the "" from the exec fixes anything.
  • See if rolling winston back to 2.3.0 in package.json fixes any logging (might also need to run npm install to update the lockfile...).
  • See if you can add console.log(error) or console.error(error) in that convert step to see anything?
  • See if updating the error check to be error && error.code !== 0 fixes anything.
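
To make the last point concrete, here is a small sketch comparing the two checks against a child process that exits with code 1. The child is just node exiting on purpose, as a stand-in for a failing converter; the function names are made up for illustration.

```javascript
const { execFile } = require('child_process');

// The check currently in convertToMobi: an exit code of 1 is not a failure.
function failsLenientCheck(error) {
  return Boolean(error && error.code > 1);
}

// The stricter check: any non-zero exit code is a failure.
function failsStrictCheck(error) {
  return Boolean(error && error.code !== 0);
}

// Stand-in for a failing converter: a child process that exits with code 1.
execFile(process.execPath, ['-e', 'process.exit(1)'], (error, stdout, stderr) => {
  console.log('exit code:', error.code);                          // exit code: 1
  console.log('lenient check fails:', failsLenientCheck(error));  // false
  console.log('strict check fails:', failsStrictCheck(error));    // true
  // Printing stderr here is also a cheap way to see why a command failed.
  if (stderr) console.error(stderr);
});
```

With the lenient check, the promise in convertToMobi would resolve even though the command failed, which would match the silent failure followed by a 404 on download.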

Hope that helps!

Thank you for the help! Saved me quite a few hours of debugging: I'd completely missed the fact that it was error.code > 1 and not >=. Yep, ebook-convert was exiting with an error code of 1 and that was just ignored. Thanks to that, I was able to find this post, and it turns out that Calibre depends on libGL, which appears to be a graphics library, even if you're only using the command line tools. I managed to get a bash shell running in epub-press's Docker container and apt-get'd libgl1-mesa-glx. The generation/conversion process now goes flawlessly, but I'm unsure as to whether/how this package can be included by default using Docker, without needing to depend on apt-get and manual intervention.

Maybe, ideally, if Calibre throws an error or is not present (such as on Darwin), book-services.js could automatically fall back to Kindlegen? I don't know enough about JS/node.js, so is there a way of chaining the commands together in such a way?
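
For reference, one way such chaining could look with promises is sketched below. This is not code from epub-press: run and runWithFallback are hypothetical helpers, and the commands in the demo are shell exit statements standing in for the real converters.

```javascript
const { exec } = require('child_process');

// Promise wrapper around exec: rejects if the command reports any error.
// execImpl is injectable so the helper can be exercised without real commands.
function run(command, execImpl = exec) {
  return new Promise((resolve, reject) => {
    execImpl(command, (error) => {
      if (error) {
        reject(error);
      } else {
        resolve(command);
      }
    });
  });
}

// Try each command in order. Resolves with the first command that succeeds,
// or rejects with the last error if every command fails.
function runWithFallback(commands, execImpl = exec) {
  return commands.reduce(
    (attempt, command) => attempt.catch(() => run(command, execImpl)),
    Promise.reject(new Error('no commands given'))
  );
}

// Demo: the first command fails (like a missing Calibre), the second succeeds.
runWithFallback(['exit 1', 'exit 0'])
  .then((used) => console.log(`succeeded with: ${used}`))
  .catch((error) => console.error('all commands failed:', error.message));
```

In convertToMobi the list would hold the Calibre command first and the Kindlegen command second. Note that this sketch rejects on any non-zero exit code, so the Kindlegen step would need the existing tolerance for exit code 1 if that is still wanted.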

I really appreciate all the help you've given so far; it has definitely made it possible for me to work on this despite knowing close to nothing about Docker/node.

Dang! Can't believe that was the issue - what an obscure bug to hit 😅. Glad my wiser eyes were able to catch that!

At this point it sounds like you're pretty close to there - solid job for knowing no node or Docker 😄 !

Maybe a few changes from here:

  • The docker image should always be linux based - so you can drop the if check and just add the command for installing calibre on every run.
  • Alternatively - you might be able to commit the installer and just run the sh /dev/stdin with the pre-downloaded file? Might save doing a network request when building the image?
  • Extra alternative! You could probably pass an env variable to the docker image - something like CONVERSION_BACKEND and set that to calibre/kindlegen. Then instead of your if check in the image looking at whether we're on linux - it would check if we wanted calibre as our conversion backend and install it if so. Once the docker image is built - we could reuse the environment variable in the book pipeline to decide which command to exec - no fallback. We can choose one or the other and expect it to work every time...

...Or maybe just always install calibre in the docker image but then use an environment variable to toggle... thinking something like this...

Dockerfile

# ...
RUN wget -nv -O- https://download.calibre-ebook.com/linux-installer.sh | sh /dev/stdin
RUN apt-get update && apt-get install -y libgl1-mesa-glx
# ...

.env

CONVERSION_BACKEND=calibre

book-services.js

static convertToMobi(book) {
    return new Promise((resolve, reject) => {
        const useCalibre = process.env.CONVERSION_BACKEND === 'calibre';
        const kindlegenCommand = `${Config.KINDLEGEN} "${book.getEpubPath()}"`;
        const calibreCommand = `ebook-convert ....`;
        const conversionCommand = useCalibre ? calibreCommand : kindlegenCommand;
        exec(conversionCommand, (error) => {
            // ebook-convert exited with code 1 on a real failure, so any error fails for Calibre;
            // the existing "> 1" tolerance is kept for Kindlegen only.
            const failed = error && (useCalibre || error.code > 1);
            if (failed) {
                log.exception('BookServices.convertToMobi')(error);
                reject(error);
            } else {
                resolve(book);
            }
        });
    });
}

Could also do a fallback... but this seems like it might be simpler / encourage a more consistent experience.

Thanks @haroldtreen, especially for all that code for book-services.js! Everything seems to work now. Created a PR at #78 with all of the changes I made.

I went with your last alternative, but slightly modified it: I set up a build ARG so that I could use that to decide whether or not to install Calibre, which then creates an ENV_VAR for book-services.js. I decided against committing the installer (not the install script) since the whole Calibre tgz file is over 100MB, which would bloat the repo a bit too much. For stability, there is the possibility of pinning the version of Calibre; please let me know if you would like me to do that.

The code should be relatively stable; I tested multiple permutations, such as Calibre vs kindlegen as switched by the ARG in the Dockerfile, and sending mail with and without STARTTLS (which is an added bonus I included here because I couldn't connect to my mail server without that). The apt-get update and install seems to throw a Broken Pipe error a few dozen times every time it runs, but it doesn't seem to actually affect anything: the installer continues, and everything seems to work.

Once again, thank you for epub-press! This was a bit of an adventure, but I definitely enjoyed it. I can't wait to see this running on the server; apart from the chapter titles now working, the content that can fit in MOBI files to be emailed is effectively doubled, so I can save on creating Part 1 of n books.