A command-line utility that replaces appropriate spaces in an HTML document with instances of the character entity &nbsp;.
The main target language is Czech.
- HTML-aware: ignores everything enclosed in angle brackets (namely HTML tags)
- conservative: only modifies the necessary characters
- configurable: choose the transducers or transducer groups you like
While I have tried my best to make the tool's results correct and predictable, I still recommend checking the output by hand, as some of the transducers may produce false positives.
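The HTML-aware behaviour listed above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the function name add_nbsp and the single rule it applies (a one-letter Czech preposition or conjunction followed by a space) are assumptions chosen for illustration, and the real transducers may differ.

```python
import re

# Anything enclosed in angle brackets is treated as a tag. The capturing
# group makes re.split() keep the tags in the result list.
TAG = re.compile(r"(<[^>]*>)")

# Hypothetical rule (an assumption, not taken from nbspacer itself):
# a one-letter Czech preposition or conjunction followed by a single space
# and then another word.
ONE_LETTER = re.compile(r"\b([KkSsVvZzOoUuAaIi]) (?=\w)")


def add_nbsp(html: str) -> str:
    """Replace the space after one-letter words with &nbsp;, outside tags only."""
    parts = TAG.split(html)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd indices hold the captured tags: copy them verbatim.
            out.append(part)
        else:
            # Even indices hold text between tags: apply the rule.
            out.append(ONE_LETTER.sub(r"\1&nbsp;", part))
    return "".join(out)


print(add_nbsp('<a title="v lese">Byl v lese a k domu.</a>'))
```

Attribute values stay untouched because they sit inside the angle brackets, and only the matched spaces are rewritten, which mirrors the conservative approach. Like the tool, this sketch does not treat the content of any particular tag specially.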
- requires Python 3.5
- the input and output file encoding is hardcoded to UTF‑8
- does not skip the content of the <pre> tag
Run python3 nbspacer.py --help to list the available options.
To enable the Czech translation, follow these steps:
- git submodule update --init (pull the Czech translation of the argparse module)
- ./nbspacer-cs-msgfmt.sh (compile the Czech translation of nbspacer and argparse)
To enable the Czech translation at runtime, set the environment variable LANGUAGE to the value cs.
Most of the Czech-language transducers are inspired by the relevant article in the Internet Language Reference Book, created by the Institute of the Czech Language of the Academy of Sciences of the Czech Republic. I would like to thank the authors of the reference book for compiling a clear and reasonable set of guidelines.
- Automatic NBSP (HTML, WordPress plugin)
- &Nbsp; replacer (HTML, web interface)
- vlna (LaTeX)