🐱 unicodeUtil: Unicode Code Point Property Toolkit

English | 简体中文

unicodeUtil is a toolkit for handling Unicode code point properties. It provides functions that test code point attributes (for example, whether a character is a number, a letter, or a punctuation mark) and functions that convert case. It covers characters from many scripts and is suitable for internationalization, text processing, and character analysis.


🚀 Key Features

• 🔍 Unicode Number Detection – Supports multiple Unicode number representations.
• ⚡ Case Conversion – Supports case conversion for various Unicode characters.
• 📝 Punctuation Detection – Detects a wide range of Unicode punctuation marks, including common punctuation (e.g., periods, commas), special marks (e.g., em dash, ellipsis), and full-width punctuation.
• 🛠 Easy to Use – Provides a simple API for quick integration.
• ✅ Extensively Tested – Tested with a wide range of Unicode characters.
• 🔄 Open Source – Actively maintained by the MoonBit community.


📥 Installation

moon add kesmeey/unicodeUtil
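
After the dependency is added, the package that calls the functions also has to import them before `@lib` resolves. A minimal `moon.pkg.json` sketch, assuming the functions live in a package at `kesmeey/unicodeUtil/lib` (this path is inferred from the `@lib` alias used in the examples in this README and is not confirmed by the project, so check the repository layout before copying it):

```json
{
  "is-main": true,
  "import": [
    { "path": "kesmeey/unicodeUtil/lib", "alias": "lib" }
  ]
}
```

The explicit `alias` field is only there to make the link to `@lib` visible; if the assumed path is correct, the last path segment would normally give the same alias by default.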

🚀 Usage Guide

unicodeUtil offers a variety of functions to handle Unicode character properties. Below are examples of some commonly used features:


🔍 Check if a Character is a Number

The is_number function checks if a character is a number, supporting various Unicode number representations (e.g., Arabic numerals, Thai numerals, Chinese numerals).

fn main {
  println(@lib.is_number('0')) // true  (ASCII number)
  println(@lib.is_number('一')) // true  (Chinese numeral)
  println(@lib.is_number('٩')) // true  (Arabic-Indic digit)
  println(@lib.is_number('か')) // false (Japanese kana)
}

🔍 Check if a Character is a Letter

The is_letter function checks if a character is a letter, supporting various Unicode letter representations.

fn main {
  println(@lib.is_letter('A')) // true  (English letter)
  println(@lib.is_letter('0')) // false (ASCII digit, not a letter)
  println(@lib.is_letter('Я')) // true  (Cyrillic letter)
  println(@lib.is_letter('字')) // true  (Chinese character)
}

⚡ Convert to Lowercase

The to_lower function converts uppercase characters to lowercase, supporting various Unicode characters.

fn main {
  println(@lib.to_lower('A')) // 'a'  (ASCII uppercase letter)
  println(@lib.to_lower('É')) // 'é'  (Latin-1 uppercase letter)
  println(@lib.to_lower('Α')) // 'α'  (Greek letter)
  println(@lib.to_lower('İ')) // 'i'  (Turkish character)
  println(@lib.to_lower('☺')) // '☺' (non-letter character, unchanged)
}

⚡ Convert to Uppercase

The to_upper function converts lowercase characters to uppercase, supporting various Unicode characters.

fn main {
  println(@lib.to_upper('a')) // 'A'  (ASCII lowercase letter)
  println(@lib.to_upper('é')) // 'É'  (Latin-1 lowercase letter)
  println(@lib.to_upper('α')) // 'Α'  (Greek letter)
  println(@lib.to_upper('ı')) // 'I'  (Turkish character)
  println(@lib.to_upper('ß')) // 'ẞ'  (German character)
  println(@lib.to_upper('漢')) // '漢' (caseless character, unchanged)
}

📝 Check if a Character is Punctuation

The is_punct function checks if a character is punctuation, supporting various Unicode punctuation marks.

fn main {
  println(@lib.is_punct('—')) // true  (em dash)
  println(@lib.is_punct('≠')) // true  (not-equal sign)
  println(@lib.is_punct('é')) // false (non-punctuation character)
  println(@lib.is_punct('．')) // true  (full-width period)
  println(@lib.is_punct('？')) // true  (full-width question mark)
}

📝 Check if a Character is a Mark

The is_mark function checks if a character is a mark, supporting various Unicode marks.

fn main {
  println(@lib.is_mark('\u0300')) // true  (accent mark)
  println(@lib.is_mark('\u0591')) // true  (Hebrew mark)
  println(@lib.is_mark('1'))      // false (number)
  println(@lib.is_mark('ä'))      // false (precomposed character)
}

📝 Check if a Character is a Control Character

The is_control function checks if a character is a control character, supporting various Unicode control characters.

fn main {
  println(@lib.is_control('\u0008')) // true  (BACKSPACE)
  println(@lib.is_control('\u200D')) // true  (ZERO WIDTH JOINER)
  println(@lib.is_control('1'))      // false (number)
  println(@lib.is_control('\u2022')) // false (bullet symbol)
}

📝 Check if a Character is Whitespace

The is_space function checks if a character is whitespace, supporting various Unicode whitespace characters.

fn main {
  println(@lib.is_space(' '))       // true  (space)
  println(@lib.is_space('\u2000'))  // true  (EN QUAD, a fixed-width space)
  println(@lib.is_space('6'))       // false (number)
  println(@lib.is_space('.'))       // false (punctuation)
}

📝 Check if a Character is a Symbol

The is_symbol function checks if a character is a symbol (e.g., mathematical symbols, currency symbols), supporting various Unicode symbols.

fn main {
  println(@lib.is_symbol('$')) // true  (dollar sign)
  println(@lib.is_symbol('¢')) // true  (cent sign)
  println(@lib.is_symbol('\u25A0')) // true  (geometric symbol, BLACK SQUARE)
  println(@lib.is_symbol('字')) // false (Chinese character)
}

📝 Check if a Character is Printable

The is_print function checks if a character is printable, such as letters, punctuation, and Chinese characters. Control characters such as newline are not printable.

fn main {
  println(@lib.is_print('a')) // true  (letter 'a')
  println(@lib.is_print('.')) // true  (punctuation '.')
  println(@lib.is_print('梦')) // true  (Chinese character '梦')
  println(@lib.is_print('\n')) // false (newline is not printable)
}

📝 Check if a Character is Graphic

The is_graphic function checks if a character is graphic, such as letters, combining marks, and symbols. Control characters such as NULL are not graphic.

fn main {
  println(@lib.is_graphic('a')) // true  (letter 'a')
  println(@lib.is_graphic('\u0300')) // true  (combining accent mark)
  println(@lib.is_graphic('$')) // true  (dollar sign)
  println(@lib.is_graphic('\u0000')) // false (NULL)
}
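
The predicates shown in this guide can be combined to classify the characters of a whole string. The following is a minimal sketch rather than part of the library: it assumes the package is imported under the alias `lib`, and that each predicate accepts a `Char` and returns a `Bool`, which is what the examples above suggest. The sample text and the counter names are made up for illustration.

```moonbit
// Sketch only. Assumptions: the library is imported as `lib`, and
// is_letter / is_number / is_punct / is_space each take a Char and return Bool.
fn main {
  let text = "Hi, 世界 42!" // hypothetical sample input
  let mut letters = 0
  let mut numbers = 0
  let mut puncts = 0
  let mut spaces = 0
  let mut others = 0
  // A for-in loop over a String visits its characters one at a time.
  for c in text {
    if @lib.is_letter(c) {
      letters += 1
    } else if @lib.is_number(c) {
      numbers += 1
    } else if @lib.is_punct(c) {
      puncts += 1
    } else if @lib.is_space(c) {
      spaces += 1
    } else {
      others += 1
    }
  }
  println("letters: \{letters}")
  println("numbers: \{numbers}")
  println("punctuation: \{puncts}")
  println("whitespace: \{spaces}")
  println("other: \{others}")
}
```

The checks are ordered so that each character is counted once. Anything that matches none of the four predicates, such as a symbol or a control character, falls into the last bucket.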

📜 License

This project is licensed under the Apache-2.0 License. See LICENSE for details.


📢 Contact & Support

• MoonBit Community: moonbit-community
• GitHub Issues: Report an Issue

👋 If you like this project, give it a ⭐! Happy coding! 🚀