aerkalov/ebooklib

Issue when editing a book into another one

fciannella opened this issue · 5 comments

My objective is to edit the xhtml files inside an epub book. I am doing the following just as a test (no editing involved):

  1. Create a new empty ebook, dest_book
  2. For each item in the source_book, add it to the dest_book
  3. Write the dest_book to an epub file

I assumed this would produce an exact copy of the source book, but it does not. For example, the stylesheet information is not carried over to the dest_book. Here is the same section from the two books:

Section 29 of the source_book

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css" />

  <title>Die unendliche Geschichte</title>
</head>

<body>
  <p class="text green">Beratungen, die das Wohl und Wehe ganz Phantásiens betrafen, wurden für gewöhnlich im großen Thronsaal des Elfenbeinturms abgehalten, der innerhalb des eigentlichen Palastbezirks nur wenige Stockwerke unter dem Magnolienpavillon lag.</p>

Section 29 of the dest_book

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/#" lang="de" xml:lang="de">
  <head/>
  <body><p class="text green">Beratungen, die das Wohl und Wehe ganz Phantásiens betrafen, wurden für gewöhnlich im großen Thronsaal des Elfenbeinturms abgehalten, der innerhalb des eigentlichen Palastbezirks nur wenige Stockwerke unter dem Magnolienpavillon lag.</p>&#13;
&#13;

Why is the style information removed in the destination book? I simply copied the items! Is there a way to retain that information in the dest_book?

Here is the list of commands I am using:

from ebooklib import epub

# Read the source book
book = epub.read_epub("/workspace/src/epubimport/epubs/german/test_book.epub")

# Create the destination book and set its basic metadata
new_book = epub.EpubBook()
new_book.set_identifier('MichaelEnde01')
new_book.set_title('Die Unendliche Geschichte')
new_book.set_language('de')

new_book.add_author('Francesco Ciannella')
new_book.add_metadata('DC', 'description', 'This is description for my book')
new_book.add_metadata(None, 'meta', '', {'name': 'key', 'content': 'value'})

# Copy every item (XHTML, CSS, images, ...) from the source book unchanged
for _item in book.get_items():
    new_book.add_item(_item)

# Reuse the table of contents and reading order of the source book
new_book.toc = book.toc
new_book.spine = book.spine

epub.write_epub('/workspace/src/epubimport/epubs/german/test.epub', new_book)

If I compare the section content in Python by getting the item by ID, everything is the same. But when I save the file and open the XHTML file, the content is different: the style information has been removed.

Hi Francesco!

The idea behind the library was not to do in-place manipulation but rather to do read + modify + cleanup + write new file. This was mainly because we always had to produce 100% valid output and we often had to deal with invalid input.

In a way, what you need is a clone function. I started working on one as an example for someone who asked, but never finished it. I will pick it up again and put it in the samples directory.

In short, what you did was a good start, but there are some issues. The new XHTML file is built on top of a new template and not the original XHTML source file. That can be changed. This also includes everything from <head>. That has to be rebuilt (including the title). Mostly it is just a matter of linking to the correct CSS files.

Aco
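
For illustration, the "linking to correct CSS files" step might be sketched roughly like this. It is only a sketch under assumptions, not the promised clone sample: the helper name `relink_stylesheets` is hypothetical, and it assumes each document object exposes `get_name()` and `add_link(**attrs)` the way ebooklib's `EpubHtml` items appear to.

```python
import posixpath


def relink_stylesheets(documents, stylesheet_names):
    """Register a stylesheet link on each document for each CSS file.

    Hypothetical helper. `documents` are objects exposing get_name() and
    add_link(**attrs) (assumed to match ebooklib's EpubHtml);
    `stylesheet_names` are paths such as "Styles/stylesheet.css".
    """
    for doc in documents:
        # Directory of the XHTML file inside the book, e.g. "Text"
        doc_dir = posixpath.dirname(doc.get_name()) or "."
        for css_name in stylesheet_names:
            # The href has to be relative to the document, not to the book root
            href = posixpath.relpath(css_name, start=doc_dir)
            doc.add_link(href=href, rel="stylesheet", type="text/css")
```

With ebooklib, this could presumably be called before `epub.write_epub`, passing `book.get_items_of_type(ebooklib.ITEM_DOCUMENT)` as the documents and the `get_name()` of every `ebooklib.ITEM_STYLE` item as the stylesheet names. Note that it links every stylesheet to every document, which may be more than the source book did.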

Hi Aleksandar!

Thanks so much for looking into this and for the explanation! Let me know once you have the time to create the sample! It's much appreciated.

My use case is not exactly cloning. I need to process the XHTML source, adding some metadata to some parts of the book, and then create the new epub with the modifications incorporated, while all the rest of the book's style stays the same.

Thanks a lot!
Francesco.
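
For this kind of "edit the XHTML, keep everything else" use case, one possible workaround is to bypass ebooklib entirely and treat the EPUB as the ZIP archive it is, using Python's standard `zipfile` module. The sketch below is an assumption-laden illustration, not part of ebooklib: the function name `rewrite_epub` and the `transform` callback are hypothetical, and it assumes the XHTML files are UTF-8 encoded. Because the original files are passed through as-is, the `<head>` and its stylesheet links survive unless `transform` changes them.

```python
import zipfile


def rewrite_epub(src_path, dest_path, transform):
    """Copy an EPUB, passing each (X)HTML entry through transform(name, text).

    Hypothetical helper: every other entry (CSS, images, OPF, ...) is copied
    byte for byte. Assumes the (X)HTML files are UTF-8 encoded.
    """
    with zipfile.ZipFile(src_path) as src:
        with zipfile.ZipFile(dest_path, "w", zipfile.ZIP_DEFLATED) as dest:
            # The EPUB container format expects "mimetype" to be the first
            # entry in the archive and to be stored uncompressed.
            if "mimetype" in src.namelist():
                dest.writestr("mimetype", src.read("mimetype"),
                              compress_type=zipfile.ZIP_STORED)
            for info in src.infolist():
                if info.filename == "mimetype":
                    continue
                data = src.read(info.filename)
                if info.filename.lower().endswith((".xhtml", ".html")):
                    text = data.decode("utf-8")
                    data = transform(info.filename, text).encode("utf-8")
                dest.writestr(info.filename, data)
```

A call such as `rewrite_epub("test_book.epub", "test.epub", lambda name, text: text)` should then yield a book whose content files match the source, and a real `transform` would apply the actual edits. No validation or cleanup of the input is done, which is exactly the trade-off against the read + modify + cleanup + write approach described above.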

Hi guys,

I'm facing the same problem. I wanted to try out Python on a real scenario: in my case, adding things like a description and series to a few of my ebooks that are missing them. That part was easy, but when saving the epub I noticed that the style information is lost.

I also noticed in the documentation that the library is not meant for editing an existing ebook. I'm doing something similar to fciannella (BTW thanks, I see what else I can add to my code). I've been trying to find a way to link the existing styles to my book, but I must admit that it's way too advanced for me yet.

Has there been any update?