Exploded EPUBs and OPDS
rkwright opened this issue · 4 comments
One of the gotchas of a browser-based solution, especially when the content is on a different server, is that if the content is staged as a zipped EPUB, the EPUB must be fetched before its resources can be used. Because the resources are compressed and packaged in the EPUB, they cannot be streamed: the browser engine doesn’t know how to extract content from a zip, de-obfuscate fonts, etc.
A Readium document detailing this can be found here.
The solution to this problem is to explode the EPUB on the server, de-obfuscate the fonts and so on, so that the content can be streamed using the browser engine’s native capabilities. However, OPDS does not currently support “items” which are unpackaged EPUBs. Readium therefore implemented an extension to its OPDS support that recognizes a new mimetype, application/epub (no +zip).
For example here are two items from one of our feeds. The first is zipped, the second is not:
<entry>
<title>Tiny Three.js Loader - zipped</title>
<author>
<name>Ric Wright</name>
</author>
<link type="image/jpeg" href="epub_threejs_logo.jpg" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/epub+zip" href="Tiny-Loader.epub" rel="http://opds-spec.org/acquisition"/>
<updated>2017-11-01T00:00:00Z</updated>
<id>READIUM_OPDS_0123456789_20</id>
</entry>
<entry>
<title>Tiny Three.js Loader - unzipped</title>
<author>
<name>Ric Wright</name>
</author>
<link type="image/jpeg" href="epub_threejs_logo.jpg" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/epub" href="Tiny-Loader" rel="http://opds-spec.org/acquisition"/>
<updated>2017-11-01T00:00:00Z</updated>
<id>READIUM_OPDS_0123456789_21</id>
</entry>
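As a rough illustration of how a client might tell the two kinds of entry apart, here is a minimal Python sketch. It assumes the entries sit inside an Atom feed element (the namespace declaration is not shown in the snippet above), and the function name classify_entries and the labels "packed" and "exploded" are made up for this example. It is not Readium's implementation.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ACQUISITION_REL = "http://opds-spec.org/acquisition"

# Media types from the feed above. "application/epub" is the
# non-standard extension described in this issue, not a registered type.
PACKED = "application/epub+zip"
EXPLODED = "application/epub"

# Trimmed copy of the two entries, wrapped in an assumed Atom <feed>.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Tiny Three.js Loader - zipped</title>
    <link type="application/epub+zip" href="Tiny-Loader.epub" rel="http://opds-spec.org/acquisition"/>
  </entry>
  <entry>
    <title>Tiny Three.js Loader - unzipped</title>
    <link type="application/epub" href="Tiny-Loader" rel="http://opds-spec.org/acquisition"/>
  </entry>
</feed>"""


def classify_entries(feed_xml):
    """Return (title, href, kind) for each acquisition link in the feed."""
    results = []
    root = ET.fromstring(feed_xml)
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") != ACQUISITION_REL:
                continue  # skip thumbnails and other non-acquisition links
            media_type = link.get("type")
            if media_type == PACKED:
                kind = "packed"      # must be downloaded and unzipped first
            elif media_type == EXPLODED:
                kind = "exploded"    # resources can be fetched individually
            else:
                kind = "unknown"
            results.append((title, link.get("href"), kind))
    return results


if __name__ == "__main__":
    for title, href, kind in classify_entries(FEED):
        print(f"{kind:8} {href:18} {title}")
```

The only thing that distinguishes the two entries is the type attribute on the acquisition link, which is why the choice of media type matters so much here.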
We can of course simply continue to support our non-standard extension to OPDS/mimetype, but it seems like it might make sense to socialize the idea of getting an “unpacked” EPUB mimetype supported. Seems like something similar is going to be needed for WP in any case.
> ... but it seems like it might make sense to socialize the idea of getting an “unpacked” EPUB mimetype supported. Seems like something similar is going to be needed for WP in any case.
I don't think that's going to be the case; for WP it's not even clear if the WP manifest will get its own media type.
We could push this idea forward in the EPUB CG, but I have very little trust that it'll be accepted.
IMO, instead of using a hack (a tweaked version of the EPUB media type), we should directly reference the Readium Web Publication Manifest, which has its own media type (application/webpub+json).
In your example, what does the link to application/epub actually point to? Do you get an OPF in return? Another file? Using the media type for such files would be more appropriate than minting a new media type.
Yes, I believe it returns the OPF. The link points to the root of the exploded EPUB. I understand your argument and don't disagree with it, but are you suggesting that the website maintainer has to then generate a WPM for each book they wish to deploy? That might get some resistance from the less-adept site owners - unless we provided a tool that would reliably scan an EPUB and generate a WPM. Entirely doable, in fact, we could easily write a tool that would unpack the EPUB, generate the WPM and de-obfuscate the fonts.
@danielweck - thoughts?
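To make the "unpack" part of such a tool concrete, here is a minimal Python sketch. It only extracts the archive and locates the OPF through META-INF/container.xml; font de-obfuscation and WPM generation are deliberately left out. The function name explode_epub is hypothetical, and this is not how any existing Readium tool works.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# Namespace used by META-INF/container.xml in the EPUB container format.
CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"


def explode_epub(epub_file, out_dir):
    """Unpack an EPUB into out_dir and return the OPF path relative to out_dir.

    epub_file may be a filesystem path or a binary file-like object.
    Fonts are NOT de-obfuscated and no manifest is generated here.
    """
    out = Path(out_dir)
    with zipfile.ZipFile(epub_file) as zf:
        zf.extractall(out)
        container = ET.fromstring(zf.read("META-INF/container.xml"))

    # The first <rootfile> points at the package document (the OPF).
    rootfile = container.find(f"{CONTAINER_NS}rootfiles/{CONTAINER_NS}rootfile")
    if rootfile is None:
        raise ValueError("container.xml has no rootfile entry")
    return rootfile.get("full-path")
```

Calling explode_epub("Tiny-Loader.epub", "Tiny-Loader") would leave the resources under Tiny-Loader/ and return the relative path of the OPF, which is the file an OPDS link to the exploded book would presumably need to resolve to.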
If the OPDS entry references an OPF, it's better to reference the OPF media type as well: application/oebps-package+xml.
> I understand your argument and don't disagree with it, but are you suggesting that the website maintainer has to then generate a WPM for each book they wish to deploy? That might get some resistance from the less-adept site owners - unless we provided a tool that would reliably scan an EPUB and generate a WPM. Entirely doable, in fact, we could easily write a tool that would unpack the EPUB, generate the WPM and de-obfuscate the fonts.
That's exactly what NYPL did. They used the Go streamer code and created a script that generates a static manifest with its associated resources: https://github.com/NYPL-Simplified/webpub-exporter
> Entirely doable, in fact, we could easily write a tool that would unpack the EPUB, generate the WPM and de-obfuscate the fonts.
> @danielweck - thoughts?
r2-shared-js contains a command line utility for this use-case:
https://github.com/readium/r2-shared-js/blob/develop/src/_utils/cli.ts
Instructions:
https://github.com/readium/r2-shared-js/blob/develop/README.md#developer-quick-start