unicodeUtil is a toolkit for working with Unicode code point properties. It provides functions for testing code point attributes (for example, whether a character is a number, a letter, or a punctuation mark) and for converting case. It covers characters from many Unicode scripts, which makes it suitable for internationalization, text processing, and character analysis.
• Unicode Number Detection: Supports multiple Unicode number representations.
• ⚡ Case Conversion: Supports case conversion for various Unicode characters.
• Punctuation Detection: Supports detection of various Unicode punctuation marks, including common punctuation (e.g., periods, commas), special symbols (e.g., em dash, ellipsis), and full-width punctuation.
• Easy to Use: Provides a simple API for quick integration.
• ✅ Extensively Tested: Tested with a wide range of Unicode characters.
• Open Source: Actively maintained by the Moonbit community.
moon add kesmeey/unicodeUtil
unicodeUtil offers a variety of functions to handle Unicode character properties. Below are examples of some commonly used features:
The is_number function checks if a character is a number, supporting various Unicode number representations (e.g., Arabic numerals, Thai numerals, Chinese numerals).
fn main {
  println(@lib.is_number('0')) // true (ASCII digit)
  println(@lib.is_number('一')) // true (Chinese numeral)
  println(@lib.is_number('٩')) // true (Arabic-Indic digit nine)
  println(@lib.is_number('あ')) // false (Japanese kana)
}
The is_letter function checks if a character is a letter, supporting various Unicode letter representations.
fn main {
  println(@lib.is_letter('A')) // true (English letter)
  println(@lib.is_letter('0')) // false (ASCII digit, not a letter)
  println(@lib.is_letter('Я')) // true (Cyrillic letter)
  println(@lib.is_letter('字')) // true (Chinese character)
}
The to_lower function converts uppercase characters to lowercase, supporting various Unicode characters.
fn main {
  println(@lib.to_lower('A')) // 'a' (ASCII uppercase letter)
  println(@lib.to_lower('É')) // 'é' (Latin-1 uppercase letter)
  println(@lib.to_lower('Α')) // 'α' (Greek letter)
  println(@lib.to_lower('İ')) // 'i' (Turkish character)
  println(@lib.to_lower('☺')) // '☺' (non-letter character, unchanged)
}
The to_upper function converts lowercase characters to uppercase, supporting various Unicode characters.
fn main {
  println(@lib.to_upper('a')) // 'A' (ASCII lowercase letter)
  println(@lib.to_upper('é')) // 'É' (Latin-1 lowercase letter)
  println(@lib.to_upper('α')) // 'Α' (Greek letter)
  println(@lib.to_upper('ı')) // 'I' (Turkish dotless i)
  println(@lib.to_upper('ß')) // 'ẞ' (German sharp s)
  println(@lib.to_upper('漢')) // '漢' (character without case, unchanged)
}
The is_punct function checks if a character is punctuation, supporting various Unicode punctuation marks.
fn main {
  println(@lib.is_punct('\u2014')) // true (em dash)
  println(@lib.is_punct('≠')) // true (not-equal sign)
  println(@lib.is_punct('é')) // false (non-punctuation character)
  println(@lib.is_punct('．')) // true (full-width period)
  println(@lib.is_punct('？')) // true (full-width question mark)
}
The is_mark function checks if a character is a mark, supporting various Unicode marks.
fn main {
  println(@lib.is_mark('\u0300')) // true (combining grave accent)
  println(@lib.is_mark('\u0591')) // true (Hebrew mark)
  println(@lib.is_mark('1')) // false (number)
  println(@lib.is_mark('ä')) // false (precomposed character)
}
The is_control function checks if a character is a control character, supporting various Unicode control characters.
fn main {
  println(@lib.is_control('\u0008')) // true (BACKSPACE)
  println(@lib.is_control('\u200D')) // true (ZERO WIDTH JOINER)
  println(@lib.is_control('1')) // false (number)
  println(@lib.is_control('\u2022')) // false (bullet symbol)
}
The is_space function checks if a character is whitespace, supporting various Unicode whitespace characters.
fn main {
  println(@lib.is_space(' ')) // true (space)
  println(@lib.is_space('\u2000')) // true (EN QUAD)
  println(@lib.is_space('6')) // false (number)
  println(@lib.is_space('.')) // false (punctuation)
}
The is_symbol function checks if a character is a symbol (e.g., mathematical symbols, currency symbols), supporting various Unicode symbols.
fn main {
  println(@lib.is_symbol('$')) // true (dollar sign)
  println(@lib.is_symbol('¢')) // true (cent sign)
  println(@lib.is_symbol('\u25B2')) // true (geometric symbol, BLACK UP-POINTING TRIANGLE)
  println(@lib.is_symbol('字')) // false (Chinese character)
}
The is_print function checks if a character is printable (e.g., letters, punctuation, Chinese characters), supporting a wide range of Unicode characters.
fn main {
  println(@lib.is_print('a')) // true (letter 'a')
  println(@lib.is_print('.')) // true (punctuation '.')
  println(@lib.is_print('梦')) // true (Chinese character '梦')
  println(@lib.is_print('\n')) // false (newline is not printable)
}
The is_graphic function checks if a character is graphic (e.g., letters, combining marks, symbols), supporting a wide range of Unicode characters.
fn main {
  println(@lib.is_graphic('a')) // true (letter 'a')
  println(@lib.is_graphic('\u0300')) // true (combining accent mark)
  println(@lib.is_graphic('$')) // true (dollar sign)
  println(@lib.is_graphic('\u0000')) // false (NULL)
}
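The predicates above can also be combined. The following is only a rough sketch: classify is a hypothetical helper written for illustration and is not part of unicodeUtil. It assumes, as in the examples above, that the package is available under the alias @lib and that each predicate takes a Char and returns a Bool.

```moonbit
// Hypothetical helper (not provided by unicodeUtil): returns the name of the
// first category that matches. The order of the checks is an arbitrary choice
// made for this sketch.
fn classify(c : Char) -> String {
  if @lib.is_letter(c) {
    "letter"
  } else if @lib.is_number(c) {
    "number"
  } else if @lib.is_space(c) {
    "space"
  } else if @lib.is_punct(c) {
    "punctuation"
  } else if @lib.is_symbol(c) {
    "symbol"
  } else {
    "other"
  }
}

fn main {
  // Print one category name per sample character.
  for c in ['A', '0', ' ', '.', '$'] {
    println(classify(c))
  }
}
```

Because the checks run in a fixed order, a character that satisfied more than one predicate would be reported under the first match only; reorder the branches if a different priority is needed.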
This project is licensed under the Apache-2.0 License. See LICENSE for details.
• Moonbit Community: moonbit-community
• GitHub Issues: Report an Issue
If you like this project, give it a ⭐! Happy coding!