glyphIgo is a Swiss Army knife for dealing with fonts and EPUB eBooks.
- Version: 3.0.3
- Date: 2015-06-07
- Developer: Alberto Pettarin (contact)
- License: the MIT License (MIT), see LICENSE.md
There are seven main usage scenarios:
- check whether a given font file contains all the glyphs needed to properly display the given EPUB or plain text file,
- convert a font file from/to TTF/OTF/WOFF format,
- count the number of characters in an EPUB file or a plain text UTF-8 file,
- list all Unicode characters used in an EPUB file or a plain text UTF-8 file or all Unicode glyphs present in a TTF/OTF/WOFF font file,
- look up information about a given Unicode character, including heuristic name matching,
- (de)obfuscate a font, with either the IDPF or the Adobe algorithm, and
- subset a given font file, that is, create a new font file containing only those glyphs of the given font that appear in an EPUB or plain text file.
Optionally, you can export a list of Unicode glyphs/characters, produced by the above commands, as an EPUB file for quick testing on an eReader.
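The first scenario (checking a font against a text) is essentially a set difference between the characters used and the codepoints the font provides. The following is a minimal sketch of that idea, not glyphIgo's actual code: it is written for Python 3, the function names `parse_glyph_list` and `missing_characters` are hypothetical, and the glyph list is assumed to be in the "one decimal codepoint per line" format accepted by the -g option.

```python
def parse_glyph_list(text):
    """Parse decimal Unicode codepoints, one per line, into a set of ints."""
    codepoints = set()
    for line in text.splitlines():
        line = line.strip()
        if line:  # ignore blank lines
            codepoints.add(int(line))
    return codepoints


def missing_characters(book_text, font_codepoints):
    """Return the sorted codepoints used in book_text but absent from the font."""
    used = {ord(ch) for ch in book_text}
    return sorted(used - font_codepoints)


# Hypothetical font that only covers "H", "i", and "!".
glyphs = parse_glyph_list("72\n105\n33\n")
missing = missing_characters("Hi\u203d!", glyphs)
for cp in missing:
    print("missing U+%04X %s" % (cp, chr(cp)))
```

Here the only character reported as missing is the interrobang (U+203D), since the hypothetical glyph list does not contain codepoint 8253.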
- a couple of miscellaneous changes added by YSYoon (for personal use)
$ ./glyphIgo.py check|convert|count|list|lookup|obfuscate|subset [options]
optional arguments:
-h, --help show this help message and exit
--version print version and exit
-c CHARACTER, --character CHARACTER
lookup CHARACTER, specified as name, partial name,
dec/hex codepoint, or Unicode character
-d DECODE, --decode DECODE
use DECODE encoding to decode the input EBOOK or PLAIN
file
-e EBOOK, --ebook EBOOK
ebook file, in EPUB/ZIP format
-f FONT, --font FONT font file, in TTF/OTF/WOFF format
-g GLYPHS, --glyphs GLYPHS
font file, specified as a list of decimal Unicode
codepoints contained in plain text file GLYPHS, one
codepoint per line
-i ID, --id ID (de)obfuscate FONT using ID to compute the obfuscation
key
-o OUTPUT, --output OUTPUT
create OUTPUT file
-p PLAIN, --plain PLAIN
ebook file, in plain text format
-r RANGE, --range RANGE
range, in '0x????-0x????' or '????-????' format
-q, --quiet quiet output
-s, --sort sort output by character count instead of character
codepoint
-u, --epub output an EPUB file containing the Unicode characters
in the input file(s)
-v, --verbose verbose output
-w, --nohumanreadable
verbose output without human readable messages
-x, --glyphsonly list output with glyphs only (added by YSYoon)
--adobe use Adobe obfuscation algorithm
--blocks print range and name of Unicode blocks
--compact compact lookup output (Unicode character, name, and
codepoint only)
--exact use exact Unicode lookup (default)
--exclude exclude the characters in EBOOK or PLAIN from the
output
--full full lookup output (default)
--heuristic use heuristic Unicode lookup
--idpf use IDPF obfuscation algorithm (default)
--preserve preserve X(HT)ML tags instead of stripping them away
exit codes:
0 = no error
1 = RESERVED
2 = invalid command line argument(s)
4 = missing glyphs in the font file to correctly display the given ebook or file
8 = failure while executing the requested command
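When glyphIgo is driven from another script, the exit codes listed above can be mapped back to their meaning. The sketch below is one possible wrapper, written for Python 3: the names `describe_exit_code` and `run_check` are hypothetical, and the codes are treated as discrete values because the usage message does not say whether they can be combined.

```python
import subprocess

# Exit codes as documented in the usage message.
EXIT_CODES = {
    0: "no error",
    1: "reserved",
    2: "invalid command line argument(s)",
    4: "missing glyphs in the font file",
    8: "failure while executing the requested command",
}


def describe_exit_code(code):
    """Map a glyphIgo exit code to its documented meaning."""
    return EXIT_CODES.get(code, "unknown exit code %d" % code)


def run_check(font, ebook, script="./glyphIgo.py"):
    """Run `glyphIgo.py check -f FONT -e EBOOK` and return (code, meaning)."""
    result = subprocess.run([script, "check", "-f", font, "-e", ebook])
    return result.returncode, describe_exit_code(result.returncode)
```

A call such as `run_check("font.ttf", "ebook.epub")` would then return, for example, `(4, "missing glyphs in the font file")` when the font cannot display the whole book.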
1. Print this usage message
$ ./glyphIgo.py -h
2. Check whether all the characters in ebook.epub can be displayed by font.ttf
$ ./glyphIgo.py check -f font.ttf -e ebook.epub
3. As above, but use font_glyph_list.txt containing a list of decimal codepoints for the font glyphs
$ ./glyphIgo.py check -g font_glyph_list.txt -e ebook.epub
4. As above, but sort missing characters (if any) by their count (in ebook.epub) instead of by Unicode codepoint
$ ./glyphIgo.py check -f font.ttf -e ebook.epub -s
5. As above, but also create missing.epub containing the list of missing Unicode characters
$ ./glyphIgo.py check -f font.ttf -e ebook.epub -u -o missing.epub
6. Convert font.ttf (TTF) into font.otf (OTF)
$ ./glyphIgo.py convert -f font.ttf -o font.otf
7. Count the number of characters in ebook.epub
$ ./glyphIgo.py count -e ebook.epub
8. As above, but preserve tags
$ ./glyphIgo.py count -e ebook.epub --preserve
9. Print the list of glyphs in font.ttf
$ ./glyphIgo.py list -f font.ttf
10. As above, but just output the decimal codepoints
$ ./glyphIgo.py list -f font.ttf -q
11. Print the list of characters in ebook.epub
$ ./glyphIgo.py list -e ebook.epub
12. As above, but also create list.epub containing the list of Unicode characters
$ ./glyphIgo.py list -e ebook.epub -u -o list.epub
13. Print the list of characters in page.xhtml
$ ./glyphIgo.py list -p page.xhtml
14. Print the list of characters in the range 0x2200-0x22ff (Mathematical Operators)
$ ./glyphIgo.py list -r 0x2200-0x22ff
$ ./glyphIgo.py list -r "Mathematical Operators"
15. Print the range and name of Unicode blocks
$ ./glyphIgo.py list --blocks
16. Look up information about a Unicode character
$ ./glyphIgo.py lookup -c 8253
$ ./glyphIgo.py lookup -c 0x203d
$ ./glyphIgo.py lookup -c ‽
$ ./glyphIgo.py lookup -c "INTERROBANG"
17. As above, but print compact output
$ ./glyphIgo.py lookup --compact -c 8253
$ ./glyphIgo.py lookup --compact -c 0x203d
$ ./glyphIgo.py lookup --compact -c ‽
$ ./glyphIgo.py lookup --compact -c "INTERROBANG"
18. Heuristic lookup of the Unicode characters that are Greek omega letters with oxia
$ ./glyphIgo.py lookup --heuristic -c "GREEK OMEGA OXIA"
19. (De)obfuscate font.otf into obf.font.otf using the given id and the IDPF algorithm
$ ./glyphIgo.py obfuscate -f font.otf -i "urn:uuid:9a0ca9ab-9e33-4181-b2a3-e7f2ceb8e9bd" -o obf.font.otf
20. As above, but use Adobe algorithm
$ ./glyphIgo.py obfuscate -f font.otf -i "urn:uuid:9a0ca9ab-9e33-4181-b2a3-e7f2ceb8e9bd" -o obf.font.otf --adobe
21. Subset font.ttf into min.font.otf by copying only the glyphs appearing in ebook.epub
$ ./glyphIgo.py subset -f font.ttf -e ebook.epub -o min.font.otf
22. Subset font.ttf into rem.font.ttf by removing the glyphs appearing in list.txt
$ ./glyphIgo.py subset -f font.ttf -p list.txt -o rem.font.ttf --exclude
23. Create a file containing the list of glyph names, and nothing else (added by YSYoon)
$ ./glyphIgo.py list -f font.ttf -x > list.txt
Please see OUTPUT.md for usage examples with their actual output.
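Examples 19 and 20 above use the same command for obfuscating and deobfuscating because the operation is a symmetric XOR. As an illustration, here is a minimal Python 3 sketch of the IDPF algorithm as it is commonly described in the EPUB Open Container Format specification: the key is the SHA-1 digest of the publication identifier with whitespace removed, and only the first 1040 bytes of the font are XORed with it. This is an assumption-laden sketch, not glyphIgo's own implementation; the function names are hypothetical, and the Adobe variant is not covered.

```python
import hashlib


def idpf_key(identifier):
    """Derive the 20-byte key: SHA-1 of the identifier with whitespace removed."""
    for ws in (" ", "\t", "\r", "\n"):
        identifier = identifier.replace(ws, "")
    return hashlib.sha1(identifier.encode("utf-8")).digest()


def idpf_obfuscate(font_bytes, identifier):
    """XOR the first 1040 bytes with the repeating key; the rest is unchanged."""
    key = idpf_key(identifier)
    head = bytes(b ^ key[i % len(key)] for i, b in enumerate(font_bytes[:1040]))
    return head + font_bytes[1040:]


uid = "urn:uuid:9a0ca9ab-9e33-4181-b2a3-e7f2ceb8e9bd"
data = bytes(range(256)) * 8               # stand-in for real font data
obfuscated = idpf_obfuscate(data, uid)
restored = idpf_obfuscate(obfuscated, uid)  # applying it twice restores the input
```

Because XOR is its own inverse, running the function a second time with the same identifier yields the original bytes, which is why a single obfuscate command covers both directions.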
glyphIgo is released under the MIT License since version 2.0.0 (2014-03-07).
Previous versions, hosted in a Google Code repo, were released under the GNU GPL 3 License.
glyphIgo uses argcomplete for autocompleting options/filenames.
Please refer to the argcomplete documentation for directions on how to enable it.
glyphIgo requires Python 2.7 (or later Python 2.x), and Python modules: python-fontforge, python-htmlentitydefs, and python-unicodedata.
For the sake of speed and code clarity, the given EPUB is not "fully parsed". In particular:
- the list of Unicode characters is extracted by inspecting all files inside the ZIP archive whose lowercased name ends in xhtml, html, or xml (except those in META-INF/, which are skipped), and
- the book pages are not parsed (e.g., a Unicode character appearing inside a comment will be accounted for).
Please observe that these approximations err on the "conservative" side, possibly generating false positives but never generating false negatives.
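The extraction strategy described above can be sketched in a few lines of Python 3. This is only an approximation of the idea, not glyphIgo's code: the function name `characters_in_epub`, its `preserve_tags` parameter (mirroring --preserve), and the tag-stripping regular expression are all assumptions made for illustration.

```python
import io
import re
import zipfile

PAGE_SUFFIXES = ("xhtml", "html", "xml")
# Assumption: strip only element tags; comments are left in place, so the
# characters inside them are still counted (the conservative behaviour).
TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>")


def characters_in_epub(epub, preserve_tags=False, encoding="utf-8"):
    """Return the set of characters found in the page-like files of an EPUB/ZIP."""
    found = set()
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if name.startswith("META-INF/"):
                continue  # files in META-INF/ are skipped
            if not name.lower().endswith(PAGE_SUFFIXES):
                continue  # only xhtml/html/xml files are inspected
            text = archive.read(name).decode(encoding)
            if not preserve_tags:
                text = TAG_PATTERN.sub("", text)
            found.update(text)
    return found


# Build a tiny in-memory archive to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("OEBPS/page.xhtml", "<p>Hi<!-- \u203d --></p>")
    archive.writestr("META-INF/container.xml", "<x>Z</x>")
buffer.seek(0)
chars = characters_in_epub(buffer)
print(sorted(chars))
```

In this sample the interrobang inside the comment is reported even though no reader would display it, which is the kind of false positive mentioned above, while the content of META-INF/container.xml is ignored.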
You can also pass a ZIP archive, containing several XHTML/HTML/XML pages, using the -e switch.
By default, glyphIgo assumes that all files are encoded in UTF-8.
You can change the encoding used while decoding plain text files by specifying the -d (or --decode) parameter.
Conversion from entity (named or not) to Unicode codepoint is supported.
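To illustrate what decoding followed by entity conversion amounts to, here is a small Python 3 sketch. It relies on the standard library's `html.unescape` rather than on the htmlentitydefs module that glyphIgo itself depends on, and the function name `decode_page` and its `encoding` parameter (standing in for the -d value) are hypothetical.

```python
import html


def decode_page(raw_bytes, encoding="utf-8"):
    """Decode raw bytes, then replace named and numeric entities with characters."""
    text = raw_bytes.decode(encoding)
    return html.unescape(text)


# A Latin-1 encoded snippet with a named, a decimal, and a hexadecimal entity.
sample = "caf&eacute; &#8253; &#x203D;".encode("latin-1")
decoded = decode_page(sample, encoding="latin-1")
print(decoded)
```

All three entity forms end up as plain Unicode characters, so a later glyph check sees the actual codepoints (here U+00E9 and U+203D) rather than the entity markup.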
Unfortunately, there is no python-fontforge module for Python 3 in the stable Debian repo (as of 2014-03-07), so you must use Python 2.7 (or later Python 2.x) to run glyphIgo.
To use the -u (or --epub) switch, you also need to download genEPUB.py and put it in the same directory as glyphIgo.py.
- Support for Unicode modifiers
- Full EPUB parsing
- Font obfuscation: parse the uid directly from a given EPUB
- Support for autocompleting via argcomplete
- Shortcuts (e.g., "-C" == "count -e")
Most people think that glyphIgo = "glyph I go".
Instead, the name comes from glyph and figo (Italian slang for cool).
I needed to perform the "font checking" on nearly 100,000 EPUB files at once, for a large project. Then, I felt bad having this little piece of code sitting idly, so I decided to publish it on Google Code. In March 2014, I moved it to GitHub.