icupy

Python bindings for ICU4C using pybind11.

Changes from ICU4C

  • Naming Conventions

    Renamed functions, methods, and C++ enumerators to conform to PEP 8.

    • Function Names: Use lower_case_with_underscores style.
    • Method Names: Use lower_case_with_underscores style. Also, use one leading underscore only for protected methods.
    • C++ Enumerators: Use UPPER_CASE_WITH_UNDERSCORES style without a leading "k". (e.g., kDateOffset → DATE_OFFSET)
    • APIs whose names match Python reserved words: Append a trailing underscore. e.g.,
      • with() → with_()
  • Error Handling

    • Unlike the C/C++ APIs, which report failures through a UErrorCode value, icupy raises the icupy.icu.ICUError exception when an error code indicates a failure.

      You can access the icu::ErrorCode object from ICUError.args[0]. For example:

      from icupy import icu
      try:
          ...
      except icu.ICUError as e:
          print(e.args[0])  # → icupy.icu.ErrorCode
          print(e.args[0].get())  # → icupy.icu.UErrorCode
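
The renaming rules listed under Naming Conventions can be sketched in plain Python. This is only an illustration of the rules as stated above, not icupy's actual implementation; the helper names `to_snake_case`, `to_python_name`, and `to_enumerator_name` are hypothetical, and real icupy names may differ in edge cases.

```python
import keyword
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase ICU4C name to lower_case_with_underscores."""
    # Insert "_" wherever a lowercase letter or digit is followed by an uppercase letter.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def to_python_name(name: str) -> str:
    """Function or method name: snake_case, plus "_" if it is a Python reserved word."""
    snake = to_snake_case(name)
    return snake + "_" if keyword.iskeyword(snake) else snake


def to_enumerator_name(name: str) -> str:
    """C++ enumerator: drop a leading "k", then use UPPER_CASE_WITH_UNDERSCORES."""
    if len(name) > 1 and name[0] == "k" and name[1].isupper():
        name = name[1:]
    return to_snake_case(name).upper()


print(to_python_name("createTimeZone"))   # create_time_zone
print(to_python_name("with"))             # with_
print(to_enumerator_name("kDateOffset"))  # DATE_OFFSET
```

The same rule reproduces names used in the examples below, such as `ucnv_setToUCallBack` becoming `ucnv_set_to_ucall_back`.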

Examples

  • icu::UnicodeString with error callback

    from icupy import icu
    cnv = icu.ucnv_open('utf-8')
    action = icu.UCNV_TO_U_CALLBACK_ESCAPE
    context = icu.ConstVoidPtr(icu.UCNV_ESCAPE_C)
    icu.ucnv_set_to_ucall_back(cnv, action, context)
    utf8 = b'\x61\xfe\x62'  # Impossible bytes
    s = icu.UnicodeString(utf8, -1, cnv)
    str(s)  # → 'a\\xFEb'
    
    action = icu.UCNV_TO_U_CALLBACK_ESCAPE
    context = icu.ConstVoidPtr(icu.UCNV_ESCAPE_XML_DEC)
    icu.ucnv_set_to_ucall_back(cnv, action, context)
    s = icu.UnicodeString(utf8, -1, cnv)
    str(s)  # → 'a&#254;b'
  • icu::UnicodeString with user callback

    from icupy import icu
    def _to_callback(
        _context: object,
        _args: icu.UConverterToUnicodeArgs,
        _code_units: bytes,
        _length: int,
        _reason: icu.UConverterCallbackReason,
        _error_code: icu.UErrorCode,
    ) -> icu.UErrorCode:
        if _reason == icu.UCNV_ILLEGAL:
            _source = ''.join(['%{:02X}'.format(x) for x in _code_units])
            icu.ucnv_cb_to_uwrite_uchars(_args, _source, len(_source), 0)
            _error_code = icu.U_ZERO_ERROR
        return _error_code
    
    cnv = icu.ucnv_open('utf-8')
    action = icu.UConverterToUCallbackPtr(_to_callback)
    context = icu.ConstVoidPtr(None)
    icu.ucnv_set_to_ucall_back(cnv, action, context)
    utf8 = b'\x61\xfe\x62'  # Impossible bytes
    s = icu.UnicodeString(utf8, -1, cnv)
    str(s)  # → 'a%FEb'
  • icu::DateFormat

    from icupy import icu
    tz = icu.TimeZone.create_time_zone('America/Los_Angeles')
    fmt = icu.DateFormat.create_instance_for_skeleton('yMMMMd', icu.Locale.get_english())
    fmt.set_time_zone(tz)
    dest = icu.UnicodeString()
    s = fmt.format(0, dest)
    str(s)  # → 'December 31, 1969'
  • icu::MessageFormat

    from icupy import icu
    fmt = icu.MessageFormat(
        "At {1,time,::jmm} on {1,date,::dMMMM}, "
        "there was {2} on planet {0,number}.",
        icu.Locale.get_us(),
    )
    tz = icu.TimeZone.get_gmt()
    subfmts = fmt.get_formats()
    subfmts[0].set_time_zone(tz)
    subfmts[1].set_time_zone(tz)
    date = 1637685775000.0  # 2021-11-23T16:42:55Z
    obj = icu.Formattable(
        [
            icu.Formattable(7),
            icu.Formattable(date, icu.Formattable.IS_DATE),
            icu.Formattable(icu.UnicodeString('a disturbance in the Force')),
        ]
    )
    dest = icu.UnicodeString()
    s = fmt.format(obj, dest)
    str(s)  # → 'At 4:42 PM on November 23, there was a disturbance in the Force on planet 7.'
  • icu::number::NumberFormatter

    from icupy import icu
    fmt = icu.number.NumberFormatter.with_().unit(icu.MeasureUnit.get_meter()).per_unit(icu.MeasureUnit.get_second())
    print(fmt.locale(icu.Locale.get_us()).format_double(3000).to_string())  # → '3,000 m/s'
    print(fmt.locale(icu.Locale.get_france()).format_double(3000).to_string())  # → '3 000 m/s'
    print(fmt.locale('ar').format_double(3000).to_string())  # → '٣٬٠٠٠ م/ث'
  • icu::BreakIterator

    from icupy import icu
    text = icu.UnicodeString('In the meantime Mr. Weston arrived with his small ship.')
    bi = icu.BreakIterator.create_sentence_instance(icu.Locale('en'))
    bi.set_text(text)
    list(bi)  # → [20, 55]
    # filter based on common English language abbreviations
    bi = icu.BreakIterator.create_sentence_instance(icu.Locale('en@ss=standard'))
    bi.set_text(text)
    list(bi)  # → [55]
  • icu::IDNA (UTS #46)

    from icupy import icu
    uts46 = icu.IDNA.create_uts46_instance(icu.UIDNA_NONTRANSITIONAL_TO_ASCII)
    dest = icu.UnicodeString()
    info = icu.IDNAInfo()
    uts46.name_to_ascii(icu.UnicodeString('faß.ExAmPlE'), dest, info)
    info.get_errors()  # → 0
    str(dest)  # → 'xn--fa-hia.example'
  • For more examples, see tests.
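
The DateFormat and MessageFormat examples above pass dates as plain numbers (`0` and `1637685775000.0`). ICU represents a date (UDate) as milliseconds since 1970-01-01T00:00:00Z, so a standard-library sketch like the following may help when converting to and from Python `datetime` objects. The helper names `to_udate` and `from_udate` are hypothetical and not part of icupy; the fixed UTC-8 offset stands in for America/Los_Angeles at the epoch.

```python
from datetime import datetime, timedelta, timezone


def to_udate(dt: datetime) -> float:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return dt.timestamp() * 1000.0


def from_udate(udate: float, tz: timezone = timezone.utc) -> datetime:
    """Convert milliseconds since the Unix epoch to an aware datetime in tz."""
    return datetime.fromtimestamp(udate / 1000.0, tz=tz)


# The date used in the MessageFormat example.
print(to_udate(datetime(2021, 11, 23, 16, 42, 55, tzinfo=timezone.utc)))  # 1637685775000.0

# Why formatting 0 in Los Angeles gives December 31, 1969:
# the epoch falls on the previous calendar day at UTC-8.
pst = timezone(timedelta(hours=-8))
print(from_udate(0, pst).date())  # 1969-12-31
```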

Installation

Prerequisites

Installing prerequisites

  • Windows

    Install the following dependencies.

  • Linux

    To install dependencies, run the following command:

    • Ubuntu/Debian:

      sudo apt install g++ cmake libicu-dev python3-dev python3-pip
    • Fedora:

      sudo dnf install gcc-c++ cmake icu libicu-devel python3-devel

    If your system's ICU is out of date, consider building ICU4C from source or installing a pre-built ICU4C binary package.
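
After installing the prerequisites, a quick check that the build tools are on PATH can save a failed build. The sketch below uses only the standard library; the default tool list (`cmake`, `pkg-config`) is an assumption based on the commands mentioned in this README, and `missing_tools` is a hypothetical helper, not part of icupy.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("cmake", "pkg-config")) -> List[str]:
    """Return the names of the given tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All checked build tools were found.")
```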

Building icupy from source

  1. Configuring environment variables:

    • Windows:

      • Set the ICU_ROOT environment variable to the root of the ICU installation (default is C:\icu). For example, if ICU is installed in C:\icu4c:

        set ICU_ROOT=C:\icu4c

        or in PowerShell:

        $env:ICU_ROOT = "C:\icu4c"
      • To verify the setting with icuinfo (64-bit):

        %ICU_ROOT%\bin64\icuinfo

        or in PowerShell:

        & $env:ICU_ROOT\bin64\icuinfo
    • Linux:

      • If ICU is installed in a non-standard location, set the PKG_CONFIG_PATH and LD_LIBRARY_PATH environment variables. For example, if ICU is installed in /usr/local:

        export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
        export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
      • To verify the settings with pkg-config:

        $ pkg-config --cflags --libs icu-uc
        -I/usr/local/include -L/usr/local/lib -licuuc -licudata
  2. Installing from PyPI:

    pip install icupy

    Optionally, CMake environment variables can be used to configure the build. For example, to use the Ninja build system and Clang:

    CMAKE_GENERATOR=Ninja CXX=clang++ pip install icupy

    Alternatively, install the development version from the Git repository:

    pip install git+https://github.com/miute/icupy.git
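
To confirm that one of the pip commands above actually installed the package, the installed distribution can be queried without importing it (so the check works even when the ICU libraries are not yet on the library path). This is a minimal standard-library sketch; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "icupy") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print("icupy is not installed" if version is None else f"icupy {version} is installed")
```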

Usage

  1. Configuring environment variables:

    • Windows:

      • Set the ICU_ROOT environment variable to the root of the ICU installation (default is C:\icu). For example, if ICU is installed in C:\icu4c:

        set ICU_ROOT=C:\icu4c

        or in PowerShell:

        $env:ICU_ROOT = "C:\icu4c"
    • Linux:

      • If ICU is installed in a non-standard location, set the LD_LIBRARY_PATH environment variable. For example, if ICU is installed in /usr/local:

        export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
  2. Using icupy:

    import icupy.icu as icu
    # or
    from icupy import icu
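
On Windows, it can be useful to check which ICU root applies before importing icupy. The sketch below mirrors the rule stated in step 1 (ICU_ROOT if set, otherwise C:\icu); how icupy itself resolves the directory is an assumption here, and `effective_icu_root` is a hypothetical helper, not part of icupy.

```python
import os
from typing import Mapping

# Default location named in this README; assumed to apply when ICU_ROOT is unset.
DEFAULT_ICU_ROOT = r"C:\icu"


def effective_icu_root(env: Mapping[str, str] = os.environ) -> str:
    """Return ICU_ROOT from the environment, or the documented default."""
    return env.get("ICU_ROOT", DEFAULT_ICU_ROOT)


root = effective_icu_root()
status = "exists" if os.path.isdir(root) else "does not exist"
print(f"ICU root: {root} ({status})")
```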

License

This project is licensed under the MIT License.