Python Unicode Data Localization

This project aims to generate a Python module which provides translations for the Unicode descriptions found in the unicodedata module. The source of the translations is unicode-table.com, whose source code is hosted on GitHub. From this source, the project generates PO and MO files.

Note: these files are also useful for other programming languages. An overview of supported languages can be found here.

This localization has been discussed in:

Prerequisites

Install the following packages:

sudo apt-get install wget unzip python3 gettext

Generating

In order to generate the files needed for a Python module with translations of Unicode descriptions, run

./1-clean.sh

which will remove previously generated files. Then run

./2-download.sh

to download the translations in master.zip. These are unzipped with

./3-extract.sh

into the directory unicode-table-data-master. The Python script

./4-generate.py

will generate PO files in a tree in the directory locale.

This script also writes informational, warning and error log messages to the command line. Note that a language is skipped if less than 1% of it has been translated or if 10% of its translations are identical to the original text.
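The skip rule can be sketched as a small predicate. This is an illustration only, not the code in 4-generate.py: the function name should_skip, its parameters, the choice of denominator for each percentage and the use of "at least 10%" for the second threshold are all assumptions.

```python
def should_skip(total, translated, identical,
                min_translated=0.01, max_identical=0.10):
    """Return True if a language should be skipped (hypothetical helper).

    total      -- number of source descriptions
    translated -- number of descriptions that have a translation
    identical  -- number of translations equal to the source text
    """
    # Nothing to work with: skip.
    if total == 0 or translated == 0:
        return True
    # Assumption: "less than 1%" is measured against all source descriptions.
    if translated / total < min_translated:
        return True
    # Assumption: "10% identical" is measured against the translated entries
    # and the threshold is inclusive.
    if identical / translated >= max_identical:
        return True
    return False


if __name__ == "__main__":
    print(should_skip(total=1000, translated=5, identical=0))
    print(should_skip(total=1000, translated=500, identical=10))
```

Keeping the thresholds as keyword arguments makes it easy to experiment with stricter or looser limits.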

Also, warnings are shown when source texts are identical. This happens for <Control> and many ideographs and needs further investigation, because source texts must be unique within a PO file.
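A minimal sketch of how such duplicates could be detected is shown here. The function find_duplicate_sources and the (code point, source text) input shape are hypothetical and not taken from 4-generate.py.

```python
from collections import defaultdict


def find_duplicate_sources(entries):
    """Group code points by source text and return only the duplicates.

    entries -- iterable of (code_point, source_text) pairs (assumed shape)
    """
    seen = defaultdict(list)
    for code_point, text in entries:
        seen[text].append(code_point)
    # Keep only source texts shared by more than one code point.
    return {text: cps for text, cps in seen.items() if len(cps) > 1}


if __name__ == "__main__":
    sample = [
        (0x0000, "<Control>"),
        (0x0001, "<Control>"),
        (0x0041, "Latin Capital Letter A"),
    ]
    for text, cps in find_duplicate_sources(sample).items():
        print(text, [f"U+{cp:04X}" for cp in cps])
```

One possible way to resolve such clashes, not used by this project as far as the text states, is the msgctxt field of the PO format, which lets identical source texts coexist when their contexts differ.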

The PO files can be converted to MO files by running

./5-convert.sh

This results in the following files in the directory locale

  • cn
    • LC_MESSAGES
      • symbols.po
      • symbols.mo
  • de
    • LC_MESSAGES
      • symbols.po
      • symbols.mo
  • fr
    • LC_MESSAGES
      • symbols.po
      • symbols.mo
  • ...
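With this layout, the MO files can be loaded using Python's standard gettext module, with symbols as the domain and locale as the locale directory. The sketch below is hedged: the helper localized_name is hypothetical, and it assumes that the message IDs in symbols.po match the names returned by unicodedata.name exactly, which may not hold (for instance if the source texts use different capitalization).

```python
import gettext
import unicodedata


def localized_name(char, language, localedir="locale", domain="symbols"):
    """Return a translated Unicode name for char (hypothetical helper)."""
    # fallback=True yields a NullTranslations object when no MO file is
    # found, so the untranslated name is returned instead of raising.
    translation = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )
    name = unicodedata.name(char, "")
    # Avoid looking up the empty string, which maps to the catalog header.
    return translation.gettext(name) if name else ""


if __name__ == "__main__":
    print(localized_name("A", "de"))
```

If no matching translation exists, the call returns the original English name unchanged.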

Distribution

The files in locale can be packaged and distributed via e.g. PyPI or eventually become part of the Python distribution. Note that this localization can also be used for other programming languages.

Copyright

The copyright of the translated strings can be found at unicode-table.com. The scripts here are in the public domain.