Unicode to 8-bit charset transliteration codec.

This package contains codecs for transliterating ISO 10646 texts into
best-effort representations using smaller coded character sets (ASCII,
ISO 8859, etc.). The translation tables used by the codecs are from the
``transtab`` collection by Markus Kuhn.

Three types of transliterating codecs are provided:

* "long": uses as many characters as needed to make a natural replacement.
  For example, U+00E4 LATIN SMALL LETTER A WITH DIAERESIS (``ä``) is
  replaced with ``ae``.

* "short": uses the minimum number of characters to make a replacement.
  For example, U+00E4 LATIN SMALL LETTER A WITH DIAERESIS (``ä``) is
  replaced with ``a``.

* "one": performs only single-character replacements. Characters that
  cannot be transliterated with a single character are passed through
  unchanged. For example, U+2639 WHITE FROWNING FACE (``☹``) is passed
  through unchanged.

Using the codecs is simple::

    >>> import translitcodec
    >>> import codecs
    >>> codecs.encode('fácil € ☺', 'translit/long')
    'facil EUR :-)'
    >>> codecs.encode('fácil € ☺', 'translit/short')
    'facil E :-)'

The codecs return Unicode text (``str``) by default. To receive a
bytestring back, either chain the output of ``encode()`` to another codec,
or append the name of the desired byte encoding to the codec name::

    >>> codecs.encode('fácil € ☺', 'translit/one').encode('ascii', 'replace')
    b'facil E ?'
    >>> 'fácil € ☺'.encode('translit/one/ascii', 'replace')
    b'facil E ?'

The package also supplies a ``transliterate`` codec, an alias for
``translit/long``.

Another way to use the library is through its error handlers. The
following error handlers are available:

* ``strict/translit/long``, ``strict/translit/short``,
  ``strict/translit/one``: similar to ``strict``
* ``ignore/translit/long``, ``ignore/translit/short``,
  ``ignore/translit/one``: similar to ``ignore``
* ``replace/translit/long``, ``replace/translit/short``,
  ``replace/translit/one``: similar to ``replace``

These error handlers work like Python's built-in ones.
The difference is that transliteration is attempted first; only characters
that cannot be transliterated are then handled the way the corresponding
built-in handler would handle them::

    >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'replace/translit/long').decode('ISO-8859-2')
    'Zażółć gęślą jaźń EUR :-)?!@#'
    >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'replace/translit/short').decode('ISO-8859-2')
    'Zażółć gęślą jaźń E :-)?!@#'
    >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'replace/translit/one').decode('ISO-8859-2')
    'Zażółć gęślą jaźń E ??!@#'
    >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'ignore/translit/long').decode('ISO-8859-2')
    'Zażółć gęślą jaźń EUR :-)!@#'
    >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'ignore/translit/short').decode('ISO-8859-2')
    'Zażółć gęślą jaźń E :-)!@#'
    >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'ignore/translit/one').decode('ISO-8859-2')
    'Zażółć gęślą jaźń E !@#'
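To make the "transliterate first, then fall back" behavior concrete, here
is a stdlib-only sketch of how such error handlers can be built with
``codecs.register_error``. It is not translitcodec's implementation: the
three-entry ``DEMO_TABLE``, the ``demo-<fallback>-<mode>`` handler names,
and the rule that "one" reuses the "short" form only when it is a single
character are all assumptions made for illustration.

```python
import codecs
from typing import Callable, Optional, Tuple

# Hypothetical three-entry table standing in for the real transtab data.
# Each character maps to its ("long", "short") replacement.
DEMO_TABLE = {
    "\u00e4": ("ae", "a"),     # LATIN SMALL LETTER A WITH DIAERESIS
    "\u20ac": ("EUR", "E"),    # EURO SIGN
    "\u263a": (":-)", ":-)"),  # WHITE SMILING FACE
}


def transliterate_char(ch: str, mode: str) -> Optional[str]:
    """Return the replacement for ch in the given mode, or None if there is none."""
    entry = DEMO_TABLE.get(ch)
    if entry is None:
        return None
    long_form, short_form = entry
    if mode == "long":
        return long_form
    if mode == "short":
        return short_form
    # "one" (assumption): accept the short form only if it is a single character.
    return short_form if len(short_form) == 1 else None


def make_handler(mode: str, fallback: str) -> Callable[[UnicodeError], Tuple[str, int]]:
    """Build an encode error handler that transliterates first, then falls back."""

    def handler(exc: UnicodeError) -> Tuple[str, int]:
        if not isinstance(exc, UnicodeEncodeError):
            raise exc
        pieces = []
        for ch in exc.object[exc.start:exc.end]:
            rep = transliterate_char(ch, mode)
            if rep is not None:
                pieces.append(rep)
            elif fallback == "strict":
                raise exc  # behave like the built-in 'strict' handler
            elif fallback == "replace":
                pieces.append("?")  # behave like the built-in 'replace' handler
            # fallback == "ignore": drop the character entirely
        # Replacement text plus the position where encoding should resume.
        return "".join(pieces), exc.end

    return handler


# Registers demo-strict-long, demo-ignore-one, demo-replace-short, and so on.
for fallback in ("strict", "ignore", "replace"):
    for mode in ("long", "short", "one"):
        codecs.register_error(f"demo-{fallback}-{mode}", make_handler(mode, fallback))

text = "Zażółć gęślą jaźń € ☺另!@#"
result = text.encode("iso-8859-2", "demo-replace-long").decode("iso-8859-2")
# result == "Zażółć gęślą jaźń EUR :-)?!@#"
```

The handler receives the run of characters the target codec could not
encode and returns replacement text plus the index at which encoding
resumes. Because the replacement strings in this toy table are plain
ASCII, the ISO-8859-2 codec can always encode them. With this table the
demo handlers reproduce the six ISO-8859-2 results shown above; for
example, ``demo-ignore-one`` yields ``Zażółć gęślą jaźń E !@#``.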