CLI to transform/reformat ePub (or any HTML) content.
Tasks are defined in a JSON file and passed to the epub-clean command line using the required -c/--config option.
Example config JSON:
[
  {
    "name": "Remove span tags",
    "task": "remove-elements",
    "selector": "span",
    "args": [
      "keep-content"
    ]
  }
]
See test-config.json for more examples.
Example usage:
node epub-clean ./OPS/epub-chapter1.xhtml -c clean-config.json
name: a unique name for the task being configured
task: the name of the task type (see options below)
selector: a CSS selector string that selects which nodes to perform the task on. Anything supported by document.querySelectorAll is supported (see querySelectorAll (MDN))
args: an array of arguments required by the task type
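The four fields above can be checked before any task runs. The following is a minimal sketch, not part of epub-clean itself: validateTasks is a hypothetical helper, and it assumes every task object carries all four fields and that name must be unique across the file.

```javascript
// Hypothetical helper (not part of epub-clean): returns a list of
// human-readable problems found in a parsed config array.
function validateTasks(tasks) {
  if (!Array.isArray(tasks)) {
    return ["config must be a JSON array of task objects"];
  }
  const errors = [];
  const seenNames = new Set();
  tasks.forEach((task, index) => {
    const t = task ?? {};
    // name, task and selector are assumed to be non-empty strings.
    for (const key of ["name", "task", "selector"]) {
      if (typeof t[key] !== "string" || t[key] === "") {
        errors.push(`task ${index}: "${key}" must be a non-empty string`);
      }
    }
    // args is described as an array of arguments.
    if (!Array.isArray(t.args)) {
      errors.push(`task ${index}: "args" must be an array`);
    }
    // name is described as unique, so flag repeats.
    if (typeof t.name === "string" && seenNames.has(t.name)) {
      errors.push(`task ${index}: duplicate name "${t.name}"`);
    }
    seenNames.add(t.name);
  });
  return errors;
}
```

An empty array from validateTasks(JSON.parse(configText)) means the config passed these basic checks.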
Args:
- Array of attribute modification objects:
  op: one of (add|remove|replace|regex)
  attribute: any attribute whose value can be obtained by calling element.getAttribute (see MDN Docs)
  value:
  - for add: <new attribute value>
  - for replace: <new attribute value>
  - for regex: [ <regex string>, <replacement string> ]
  - for remove: no argument supplied
Example:
{
  "name": "Remove calibre classes",
  "task": "amend-attrs",
  "selector": "[class^=\"calibre\"]",
  "args": [
    {
      "op": "regex",
      "attribute": "class",
      "value": [ "\\s?calibre[\\d]\\s?", "" ]
    }
  ]
}
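To illustrate how the four op values might act on a single attribute value, here is a rough sketch. applyAttrOp is a hypothetical function, not the tool's implementation. It assumes that add and replace both set the value (the difference between them is not specified here) and that regex patterns are applied globally.

```javascript
// Hypothetical helper (not part of epub-clean).
// Returns the new attribute value, or null when the attribute should be removed.
function applyAttrOp(current, { op, value }) {
  switch (op) {
    case "add":
    case "replace":
      // Assumption: both ops simply set the attribute to the given value.
      return value;
    case "regex": {
      const [pattern, replacement] = value;
      // Assumption: the pattern is applied to every match ("g" flag).
      return (current ?? "").replace(new RegExp(pattern, "g"), replacement);
    }
    case "remove":
      return null;
    default:
      throw new Error(`Unknown op: ${op}`);
  }
}

// The regex from the example config, applied to a sample class value.
const cleaned = applyAttrOp("calibre1 foo", {
  op: "regex",
  value: ["\\s?calibre[\\d]\\s?", ""],
});
console.log(cleaned);
```

In a DOM environment the result could then be written back with element.setAttribute, or element.removeAttribute when the result is null.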
Args:
- one of (title-case|lower-case|upper-case)
Example:
[
  {
    "name": "Title-case headings",
    "task": "change-case",
    "selector": "h1, h2, h3, h4",
    "args": [
      "title-case"
    ]
  }
]
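The three case options could be sketched as a small text transform. changeCase below is hypothetical, and its title-case rule (capitalize the first letter of every whitespace-separated word, lower-case the rest) is an assumption; the tool's real title-casing may treat small words or punctuation differently.

```javascript
// Hypothetical helper (not part of epub-clean).
function changeCase(text, mode) {
  switch (mode) {
    case "lower-case":
      return text.toLowerCase();
    case "upper-case":
      return text.toUpperCase();
    case "title-case":
      // Naive rule: upper-case the first character of each word.
      return text
        .toLowerCase()
        .replace(/(^|\s)(\S)/g, (match, space, ch) => space + ch.toUpperCase());
    default:
      throw new Error(`Unknown case mode: ${mode}`);
  }
}

console.log(changeCase("the QUICK brown fox", "title-case"));
```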
Converts elements/classes from one type to another.
By default all CSS classes are removed.
Preserve the non-matching CSS classes by adding the namespace other (written as |other after the target).
Example:
[
  { "div.chp": "section.chapter|other" }
]
Produces:
<div class="chp ch1 ch-title">
<!-- becomes -->
<section class="chapter ch1 ch-title">
Preserve all original CSS classes by adding the namespace all (written as |all after the target).
Example:
[
  { "div.chp": "section.chapter|all" }
]
Produces:
<div class="chp ch1 ch-title">
<!-- becomes -->
<section class="chapter chp ch1 ch-title">
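The class-handling rules described above (no namespace, other, all) can be summarized in a short sketch. mapClasses and parseTarget are hypothetical names, and the sketch only understands simple tag.class selectors; attribute selectors are out of scope here.

```javascript
// Hypothetical helpers (not part of epub-clean).
// Splits "tag.class1.class2|mode" into its parts; mode defaults to "none".
function parseTarget(target) {
  const [selector, mode = "none"] = target.split("|");
  const [tag, ...classes] = selector.split(".");
  return { tag, classes, mode };
}

// Computes the new tag name and class list for one element.
function mapClasses(source, target, originalClasses) {
  // Classes named in the source selector count as "matching".
  const matched = source.split(".").slice(1);
  const { tag, classes, mode } = parseTarget(target);
  let kept = [];
  if (mode === "all") {
    kept = originalClasses;
  } else if (mode === "other") {
    kept = originalClasses.filter((c) => !matched.includes(c));
  }
  // Target classes come first; duplicates are dropped.
  return { tag, classes: [...new Set([...classes, ...kept])] };
}

const original = ["chp", "ch1", "ch-title"];
console.log(mapClasses("div.chp", "section.chapter", original));
console.log(mapClasses("div.chp", "section.chapter|other", original));
console.log(mapClasses("div.chp", "section.chapter|all", original));
```

The three calls correspond to the default behavior and the two documented examples.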
Example:
[
  {
    "div.chapter": "section[epub:type=\"chapter\"]",
    "p.h2": "h2.chapter-title",
    "p.h3": "h3",
    "p.tx": "p",
    "span.small": "small"
  }
]
Produces:
<div class="chapter"> → <section epub:type="chapter"> <!-- todo: -->
<p class="h2"> → <h2 class="chapter-title">
<p class="h3"> → <h3>
<p class="tx"> → <p>
<span class="small"> → <small>
Args:
- one of (keep-content|discard-content)
keep-content will copy the element's innerHTML to its parent node.
Example:
{
  "name": "Remove extra <span>",
  "task": "remove-elements",
  "selector": "span:not([class]):not([id])",
  "args": [ "keep-content" ]
}
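The difference between keep-content and discard-content can be shown on a simplified tree of plain objects, so the sketch runs without a DOM. removeChild here is a hypothetical helper, and the node shape ({ tag, children }, with strings as text nodes) is an assumption made only for illustration.

```javascript
// Hypothetical helper (not part of epub-clean), working on plain objects
// of the form { tag, children }, where text nodes are plain strings.
function removeChild(parent, child, mode) {
  const index = parent.children.indexOf(child);
  if (index === -1) return;
  // keep-content: splice the child's own children into its place.
  // discard-content: drop the child and everything inside it.
  const replacement = mode === "keep-content" ? child.children : [];
  parent.children.splice(index, 1, ...replacement);
}

const span = { tag: "span", children: ["b"] };
const paragraph = { tag: "p", children: ["a ", span, " c"] };
removeChild(paragraph, span, "keep-content");
console.log(paragraph.children);
```

With a real DOM, a comparable effect for keep-content could be achieved with element.replaceWith(...element.childNodes), and for discard-content with element.remove().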