Python bindings for ICU4C using pybind11.
-
Naming Conventions
Renamed functions, methods, and C++ enumerators to conform to PEP 8.
- Function Names:
Use
lower_case_with_underscores
style. - Method Names:
Use
lower_case_with_underscores
style. Also, use one leading underscore only for protected methods. - C++ Enumerators: Use
UPPER_CASE_WITH_UNDERSCORES
style without a leading "k". (e.g.,kDateOffset
→DATE_OFFSET
) - APIs that match Python reserved words: e.g.,
with()
→with_()
- Function Names:
Use
-
Error Handling
-
Unlike the C/C++ APIs,
icupy
raises theicupy.icu.ICUError
exception if an error code indicates a failure instead of receiving an error codeUErrorCode
.You can access the icu::ErrorCode object from
ICUError.args[0]
. For example:from icupy import icu try: ... except icu.ICUError as e: print(e.args[0]) # → icupy.icu.ErrorCode print(e.args[0].get()) # → icupy.icu.UErrorCode
-
-
icu::UnicodeString with error callback
from icupy import icu cnv = icu.ucnv_open('utf-8') action = icu.UCNV_TO_U_CALLBACK_ESCAPE context = icu.ConstVoidPtr(icu.UCNV_ESCAPE_C) icu.ucnv_set_to_ucall_back(cnv, action, context) utf8 = b'\x61\xfe\x62' # Impossible bytes s = icu.UnicodeString(utf8, -1, cnv) str(s) # → 'a\\xFEb' action = icu.UCNV_TO_U_CALLBACK_ESCAPE context = icu.ConstVoidPtr(icu.UCNV_ESCAPE_XML_DEC) icu.ucnv_set_to_ucall_back(cnv, action, context) s = icu.UnicodeString(utf8, -1, cnv) str(s) # → 'aþb'
-
icu::UnicodeString with user callback
from icupy import icu def _to_callback( _context: object, _args: icu.UConverterToUnicodeArgs, _code_units: bytes, _length: int, _reason: icu.UConverterCallbackReason, _error_code: icu.UErrorCode, ) -> icu.UErrorCode: if _reason == icu.UCNV_ILLEGAL: _source = ''.join(['%{:02X}'.format(x) for x in _code_units]) icu.ucnv_cb_to_uwrite_uchars(_args, _source, len(_source), 0) _error_code = icu.U_ZERO_ERROR return _error_code cnv = icu.ucnv_open('utf-8') action = icu.UConverterToUCallbackPtr(_to_callback) context = icu.ConstVoidPtr(None) icu.ucnv_set_to_ucall_back(cnv, action, context) utf8 = b'\x61\xfe\x62' # Impossible bytes s = icu.UnicodeString(utf8, -1, cnv) str(s) # → 'a%FEb'
-
from icupy import icu tz = icu.TimeZone.create_time_zone('America/Los_Angeles') fmt = icu.DateFormat.create_instance_for_skeleton('yMMMMd', icu.Locale.get_english()) fmt.set_time_zone(tz) dest = icu.UnicodeString() s = fmt.format(0, dest) str(s) # → 'December 31, 1969'
-
from icupy import icu fmt = icu.MessageFormat( "At {1,time,::jmm} on {1,date,::dMMMM}, " "there was {2} on planet {0,number}.", icu.Locale.get_us(), ) tz = icu.TimeZone.get_gmt() subfmts = fmt.get_formats() subfmts[0].set_time_zone(tz) subfmts[1].set_time_zone(tz) date = 1637685775000.0 # 2021-11-23T16:42:55Z obj = icu.Formattable( [ icu.Formattable(7), icu.Formattable(date, icu.Formattable.IS_DATE), icu.Formattable(icu.UnicodeString('a disturbance in the Force')), ] ) dest = icu.UnicodeString() s = fmt.format(obj, dest) str(s) # → 'At 4:42 PM on November 23, there was a disturbance in the Force on planet 7.'
-
from icupy import icu fmt = icu.number.NumberFormatter.with_().unit(icu.MeasureUnit.get_meter()).per_unit(icu.MeasureUnit.get_second()) print(fmt.locale(icu.Locale.get_us()).format_double(3000).to_string()) # → '3,000 m/s' print(fmt.locale(icu.Locale.get_france()).format_double(3000).to_string()) # → '3 000 m/s' print(fmt.locale('ar').format_double(3000).to_string()) # → '٣٬٠٠٠ م/ث'
-
from icupy import icu text = icu.UnicodeString('In the meantime Mr. Weston arrived with his small ship.') bi = icu.BreakIterator.create_sentence_instance(icu.Locale('en')) bi.set_text(text) list(bi) # → [20, 55] # filter based on common English language abbreviations bi = icu.BreakIterator.create_sentence_instance(icu.Locale('en@ss=standard')) bi.set_text(text) list(bi) # → [55]
-
icu::IDNA (UTS #46)
from icupy import icu uts46 = icu.IDNA.create_uts46_instance(icu.UIDNA_NONTRANSITIONAL_TO_ASCII) dest = icu.UnicodeString() info = icu.IDNAInfo() uts46.name_to_ascii(icu.UnicodeString('faß.ExAmPlE'), dest, info) info.get_errors() # → 0 str(dest) # → 'xn--fa-hia.example'
-
For more examples, see tests.
- Python >=3.8
- ICU4C (ICU - The International Components for Unicode) (>=70 recommended)
- C++17 compatible compiler (see supported compilers)
- CMake >=3.7
-
Windows
Install the following dependencies.
- Python >=3.8
- Pre-built ICU4C binary package (>=70 recommended)
- Visual Studio 2015 Update 3 or newer. Visual Studio 2019 or newer recommended
- CMake >=3.7
- Note: Add CMake to the system PATH.
-
Linux
To install dependencies, run the following command:
-
Ubuntu/Debian:
sudo apt install g++ cmake libicu-dev python3-dev python3-pip
-
Fedora:
sudo dnf install gcc-c++ cmake icu libicu-devel python3-devel
If your system's ICU is out of date, consider building ICU4C from source or installing pre-built ICU4C binary package.
-
-
Configuring environment variables:
-
Windows:
-
Set the
ICU_ROOT
environment variable to the root of the ICU installation (default isC:\icu
). For example, if the ICU is located inC:\icu4c
:set ICU_ROOT=C:\icu4c
or in PowerShell:
$env:ICU_ROOT = "C:\icu4c"
-
To verify settings using
icuinfo
(64-bit):%ICU_ROOT%\bin64\icuinfo
or in PowerShell:
& $env:ICU_ROOT\bin64\icuinfo
-
-
Linux:
-
If the ICU is located in a non-regular place, set the
PKG_CONFIG_PATH
andLD_LIBRARY_PATH
environment variables. For example, if the ICU is located in/usr/local
:export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
-
To verify settings using
pkg-config
:$ pkg-config --cflags --libs icu-uc -I/usr/local/include -L/usr/local/lib -licuuc -licudata
-
-
-
Installing from PyPI:
pip install icupy
Optionally, CMake environment variables are available. For example, using the Ninja build system and Clang:
CMAKE_GENERATOR=Ninja CXX=clang++ pip install icupy
Alternatively, installing development version from the git repository:
pip install git+https://github.com/miute/icupy.git
-
Configuring environment variables:
-
Windows:
-
Set the
ICU_ROOT
environment variable to the root of the ICU installation (default isC:\icu
). For example, if the ICU is located inC:\icu4c
:set ICU_ROOT=C:\icu4c
or in PowerShell:
$env:ICU_ROOT = "C:\icu4c"
-
-
Linux:
-
If the ICU is located in a non-regular place, set the
LD_LIBRARY_PATH
environment variables. For example, if the ICU is located in/usr/local
:export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
-
-
-
Using
icupy
:import icupy.icu as icu # or from icupy import icu
This project is licensed under the MIT License.