A parser for files in the Unicode database. Produces a giant array of codepoint objects for every character represented by Unicode, with many properties derived from files in the Unicode database.
BUILD SCRIPTS ONLY: Using this module in production is not recommended, because the parsers are not optimized for speed, the text files are huge, and the resulting array uses a large amount of memory. To access this data in real-world applications, use modules that have precompiled the data into a compressed form.
Install using npm:
npm install codepoints
Basic usage:
const codepoints = require('codepoints');
The parser generates data by reading the text files contained in the Unicode Character Database (UCD). By default, it uses the database bundled with this package. To use a custom version of the UCD, use codepoints/parser instead, which accepts an optional path to a directory containing the uncompressed UCD data:
const parser = require('codepoints/parser');
const codepoints = parser('/path/to/UCD');
Each element in the generated array is either undefined (for unassigned code points) or an object containing the following properties:
code - the code point index
name - character name
unicode1Name - legacy name used by Unicode 1
category - Unicode category
block - the block name this character is a part of
script - the script this character belongs to
eastAsianWidth - the East Asian width for this character
combiningClass - numeric combining class value
combiningClassName - a string name for the combining class
bidiClass - class for the Unicode bidirectional algorithm
bidiMirrored - whether the character is mirrored in the bidi algorithm
numeric - the numeric value for this character
uppercase - an array of code points mapping this character to upper case, if any
lowercase - an array of code points mapping this character to lower case, if any
titlecase - an array of code points mapping this character to title case, if any
folded - an array of code points mapping this character to a folded equivalent, if any
caseConditions - conditions used during case mapping for this character
decomposition - an array of code points that this character decomposes into. Used by the Unicode normalization algorithm.
compositions - a dictionary mapping of compositions for this character
isCompat - whether the decomposition is a compatibility one
isExcluded - whether the character is excluded from composition
NFC_QC - quickcheck value for NFC (0 = YES, 1 = NO, 2 = MAYBE)
NFKC_QC - quickcheck value for NFKC (0 = YES, 1 = NO, 2 = MAYBE)
NFD_QC - quickcheck value for NFD (0 = YES, 1 = NO)
NFKD_QC - quickcheck value for NFKD (0 = YES, 1 = NO)
joiningType - Arabic joining type
joiningGroup - Arabic joining group
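As a rough illustration of how a build script might consume these properties, the sketch below looks up a character, applies its upper-case mapping, and collapses the category property into contiguous ranges (one possible way to shrink the data before shipping it). The helpers describe, toUpper and categoryRuns are hypothetical names, not part of this package, and the sample array is a tiny hand-written stand-in rather than real parser output. The sketch also assumes that uppercase holds numeric code points and is absent when there is no mapping.

```javascript
// Hypothetical helpers; they accept any array shaped like the one described above.

// One-line summary of a character, or null for an unassigned code point.
function describe(codepoints, ch) {
  const code = ch.codePointAt(0);
  const cp = codepoints[code];
  if (cp === undefined) return null;
  const hex = code.toString(16).toUpperCase().padStart(4, '0');
  return `U+${hex} ${cp.name} [${cp.category}]`;
}

// Upper-case one character via the `uppercase` array
// (assumption: an array of numeric code points, missing if no mapping exists).
function toUpper(codepoints, ch) {
  const cp = codepoints[ch.codePointAt(0)];
  if (!cp || !cp.uppercase || cp.uppercase.length === 0) return ch;
  return String.fromCodePoint(...cp.uppercase);
}

// Collapse the array into runs of consecutive code points sharing a category.
// Unassigned code points are grouped under category null.
function categoryRuns(codepoints) {
  const runs = [];
  let current = null;
  for (let code = 0; code < codepoints.length; code++) {
    const cp = codepoints[code];
    const category = cp ? cp.category : null;
    if (current && current.category === category) {
      current.end = code;
    } else {
      current = { start: code, end: code, category };
      runs.push(current);
    }
  }
  return runs;
}

// Hand-written sample data for illustration only (not real parser output).
const sample = [];
sample[0x61] = { code: 0x61, name: 'LATIN SMALL LETTER A', category: 'Ll', uppercase: [0x41] };
sample[0x62] = { code: 0x62, name: 'LATIN SMALL LETTER B', category: 'Ll', uppercase: [0x42] };
sample[0xDF] = { code: 0xDF, name: 'LATIN SMALL LETTER SHARP S', category: 'Ll', uppercase: [0x53, 0x53] };

console.log(describe(sample, 'a'));      // U+0061 LATIN SMALL LETTER A [Ll]
console.log(toUpper(sample, 'ß'));       // SS
console.log(categoryRuns(sample).length); // 4
```

With the real package installed, the same helpers would presumably be called with the array returned by require('codepoints') in place of sample.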
License: MIT