Add Unicode UCDXML data source
behnam commented
Source
- https://www.unicode.org/Public/12.0.0/ucdxml/
- https://www.unicode.org/Public/11.0.0/ucdxml/
- https://www.unicode.org/Public//ucdxml/
License
UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE
https://www.unicode.org/license.html
Open Questions
- Should we set up only one repo with the `all` (complete UCD) set, or set up an additional one or two for the `nounihan` and/or `unihan` ones?
- Do we need to include both `grouped` and `flat` files, or is one enough in the repo? If both, maybe they belong in two separate repos?
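To make the scope of these questions concrete, here is a minimal sketch that enumerates the six archives per version (three data sets times two layouts). The `ucd.<set>.<layout>.zip` naming pattern and the `archive_urls` helper are assumptions for illustration; the directory listing should be checked before relying on them.

```python
from itertools import product

# Base URL pattern taken from the Source list above.
BASE = "https://www.unicode.org/Public/{version}/ucdxml/"
DATA_SETS = ("all", "nounihan", "unihan")
LAYOUTS = ("flat", "grouped")


def archive_urls(version):
    """Return {(data_set, layout): url} for one Unicode version.

    Assumption: archives are named ucd.<set>.<layout>.zip.
    """
    base = BASE.format(version=version)
    return {
        (data_set, layout): f"{base}ucd.{data_set}.{layout}.zip"
        for data_set, layout in product(DATA_SETS, LAYOUTS)
    }


if __name__ == "__main__":
    for key, url in archive_urls("12.0.0").items():
        print(key, url)
```

A single repo holding only the `all` set would track two of these six archives per version; splitting by data set or by layout multiplies the number of repos accordingly.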
Other Notes
From https://www.unicode.org/Public/12.0.0/ucdxml/ucdxml.readme.txt:
```
While every effort has been made to ensure consistency of the
XML representation with the UCD files, there may be some errors;
the UCD files are authoritative.

There are six files, available in zip/jar format; the size is that of
the archive:

                         flat       grouped
   no Unihan data        897 KB     556 KB
   Unihan data only    5,855 KB   5,862 KB
   complete UCD        7,657 KB   6,420 KB

The flat versions do not use the group mechanism. The grouped versions
use the group mechanism, with groups corresponding approximately to
the blocks (a few blocks have been subdivided).

The "no Unihan data" files do not contain the properties expressed only
in the Unihan database. The "Unihan data only" files contain only
the properties and code points expressed in the Unihan database.

The "complete UCD" files reflect the complete UCD data.
```
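
For the `grouped` vs `flat` question, it may help to see how small the difference is for a consumer. The sketch below expands a grouped document into flat per-character records by treating group attributes as defaults that a character can override. The element names and namespace follow my reading of UAX #42, the sample XML is hand-written rather than taken from the real files, and `flatten` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace used by UCDXML elements (assumption based on UAX #42).
NS = "{http://www.unicode.org/ns/2003/ucd/1.0}"

# Hand-written sample, not real UCD data.
GROUPED_SAMPLE = """\
<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <repertoire>
    <group gc="Lu" sc="Latn" age="1.1">
      <char cp="0041" na="LATIN CAPITAL LETTER A"/>
      <char cp="0061" na="LATIN SMALL LETTER A" gc="Ll"/>
    </group>
  </repertoire>
</ucd>
"""


def flatten(xml_text):
    """Yield one attribute dict per entry, with group defaults applied."""
    root = ET.fromstring(xml_text)
    repertoire = root.find(NS + "repertoire")
    for node in repertoire:
        if node.tag == NS + "group":
            # Group attributes act as defaults for every member.
            defaults, members = node.attrib, list(node)
        else:
            # Flat files list entries directly under <repertoire>.
            defaults, members = {}, [node]
        for member in members:
            # Attributes on the entry itself override the group defaults.
            yield {**defaults, **member.attrib}


if __name__ == "__main__":
    for record in flatten(GROUPED_SAMPLE):
        print(record["cp"], record["gc"], record["sc"])
```

Because the same function handles both layouts, shipping only the grouped files (the smaller archives for two of the three sets in the table above) and flattening on read is one possible answer, at the cost of a small amount of work for consumers.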