Run (simple) manipulations on HTML files, like extracting parts. Use CSS selectors to define which parts of the HTML to operator on, use different commands in pipes to perform the desired operations.
The basic syntax is:
COMMAND{ SELECTOR } | COMMAND{ SELECTOR }
Some COMMAND
use sub-pipelines. There are two kind of COMMANDS
with this:
-
"iterate"/"forEach": For each (sub) node matching the inner selector the sub-pipeline is processed, but the elements themselves are not changed
COMMAND{ SELECTOR ↦ COMMAND{ SELECTOR } | COMMAND { SELECTOR } }
COMMAND{ SELECTOR => COMMAND{ SELECTOR } | COMMAND { SELECTOR } }
-
"map"/"replace": For each (sub) node matching the inner selector the sub-pipeline is processed, and the pipelines result is used instead of the element
COMMAND{ SELECTOR ↤ COMMAND{ SELECTOR } | COMMAND { SELECTOR } }
COMMAND{ SELECTOR <= COMMAND{ SELECTOR } | COMMAND { SELECTOR } }
The SELECTOR
is a CSS selector.
There are three types of pipelines:
- element processing pipeline: The default. You have some input HTML which you run through the pipeline
- element creating sub-pipeline: special sub-pipeline wherever a commands adds one or more elements into the HTML tree (or into a different place of said tree)
- string value creating sub-pipeline: special sub-pipeline wherever a commands set a string value (text content, comment, attribute value)
Currently supported element processing commands:
EXTRACT-ELEMENT
: remove everything not matching the CSS selector (alias:ONLY
)REMOVE-ELEMENT
: remove everything matching the CSS selector (alias:WITHOUT
)FOR-EACH
: run a sub-pipeline on all sub-elements matching a CSS selector but return the previously selected elements (alias:WITH
)REPLACE-ELEMENT
: replace all elements matching a CSS selector with new elements (alias:MAP
)CLEAR-ATTR
: removes a given attribute from the previously selected elementsSET-ATTR
: Sets a given attribute to a specified valueCLEAR-CONTENT
: clears all children from the previously selected elementsSET-TEXT-CONTENT
: removes previous children and replaces it with exactly one given text childAPPEND-TEXT-CONTENT
: appends a new text childPREPEND-TEXT-CONTENT
: prepends a new text childAPPEND-COMMENT
: appends a new comment childPREPEND-COMMENT
: prepends a new comment childAPPEND-ELEMENT
: appends a new tag/element childPREPEND-ELEMENT
: prepends a new tag/element child
Currently supported element creating commands:
CREATE-ELEMENT
: creates a new, empty element, mainly in combination withAPPEND-ELEMENT
,PREPEND-ELEMENT
orREPLACE-ELEMENT
(alias:NEW
)LOAD-FILE
: reads a DOM from a different file, mainly in combination withAPPEND-ELEMENT
,PREPEND-ELEMENT
orREPLACE-ELEMENT
(alias:SOURCE
)QUERY-REPLACED
: returns children matching the CSS selector of those elements meant to be replaced, only combination with orREPLACE-ELEMENT
(alias:KEEP
)
Currently supported string-value creating commands:
USE-ELEMENT
: returns the currently selected element for a sub-pipeline, mainly in combination with "string value producing pipelines" (alias:THIS
)USE-PARENT
: returns the parent of the currently selected element for a sub-pipeline, mainly in combination with "string value producing pipelines" (alias:PARENT
)QUERY-ELEMENT
: runs a query on the currently selected element for a sub-pipeline, without detaching target element from HTML tree unlikeEXTRACT-ELEMENT
QUERY-PARENT
: runs a query on the parent of the currently selected element for a sub-pipelineQUERY-ROOT
: runs a query on the root of the currently selected element for a sub-pipelineGET-ATTR
: returns the value of an attribute of the currently selected element for a string-value producing pipelinesGET-TEXT-CONTENT
: returns the text content of the currently selected element for a string-value producing pipelinesREGEX-REPLACE
: runs a RegEx-based value replacements on the current string value of the pipelineTO-LOWER
: all-lower the current string value of the pipelineTO-UPPER
: all-caps the current string value of the pipelineADD-PREFIX
: add a given string to the beginning of the string value of the pipelineADD-SUFFIX
: add a given string to the end of the string value of the pipeline
The binary is called hse
and supports following options:
USAGE:
hse [OPTIONS] <PIPELINE>
ARGS:
<PIPELINE> Single string with the command pipeline to perform. If it starts with an @ the rest is treated as file name to read the pipeline definition from
OPTIONS:
-h, --help Print help information
-i, --input <input> File name of the Input. `-` for stdin (default)
-o, --output <output> File name of the Output. `-` for stdout (default)
-V, --version Print version information
# fetches all elements with CSS class "content" inside a <main> element
hse -i index.html 'ONLY{main .content}'
# fetches the `<main>` or element with CSS class `main`, but without any `<script>` defined inside
hse -i index.html 'ONLY{main, .main} | WITHOUT{script}'
# replaces all elements with `placeholder` class with the <div class="content"> from a second HTML file
hse -i index.html 'MAP{.placeholder ↤ SOURCE{"other.html"} | ONLY{div.content} }'
# add a new <meta name="version" value=""> element to <head> with git version info
hse -i index.html "WITH{head ↦ APPEND-ELEMENT{ NEW{meta} | SET-ATTR{name ↤ 'version'} | SET-ATTR{content ↤ '`git describe --tags`'} } }"
# add a new comment to <body> with git version info
hse -i index.html "WITH{body ↦ APPEND-COMMENT{'`git describe --tags`'}}"
# add an RDF <meta name="dc:title"> with same content as <title>
hse -i input.html "WITH{head ↦ APPEND-ELEMENT{ NEW{meta} | SET-ATTR{name ↤ 'dc:title' } } | WITH{meta[name='dc:title'] ↦ SET-ATTR{content ↤ QUERY-PARENT{title} | GET-TEXT-CONTENT } } }"
# replace non-word characters with an underscore in an attribute
hse -i index.html "EXTRACT-ELEMENT{#target} | SET-ATTR{data-test ↤ USE-ELEMENT | GET-ATTR{data-test} | REGEX-REPLACE{'\\W' ↤ '_'} }"
# run the pipeline defined in file `file.hsp` on content of `index.html`
hse -i index.html @file.hsp