ovalhub/pyicu

Feature request: Edits

tavianator opened this issue · 13 comments

I'd like to use CaseMap and record the positions of the modifications.

Wow that was fast, thanks! Trying it now. I needed this patch to get it to build:

diff --git a/numberformat.cpp b/numberformat.cpp
index 8e9ee8e..f3ed0a5 100644
--- a/numberformat.cpp
+++ b/numberformat.cpp
@@ -788,7 +788,7 @@ static int t_decimalformatsymbols_init(t_decimalformatsymbols *self,
 {
     Locale *locale;
     DecimalFormatSymbols *dfs;
-#if U_ICU_VERSION_HEX >= VERSION_HEX(63, 0, 0)
+#if U_ICU_VERSION_HEX >= VERSION_HEX(60, 0, 0)
     NumberingSystem *system;
 #endif
 

The named properties don't seem to work, but the tuples do:

>>> import icu
>>> e = icu.Edits()
>>> icu.CaseMap.fold(0, "abcßDeF", e)
'abcssdef'
>>> i = e.getFineIterator()
>>> next(i)
(False, 3, 3, 0, 0, 0)
>>> i.hasChange
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'icu.EditsIterator' object has no attribute 'hasChange'

From looking here, the tuple order seems to be (..., replacementIndex, destinationIndex), not the other way around.
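Until the named properties work, a small wrapper can give the tuple fields readable names. This is only a sketch: `Edit` is a hypothetical helper, not part of PyICU, and the field order is an assumption based on the accessors of ICU's C++ `Edits::Iterator` and the order suggested above.

```python
from collections import namedtuple

# Hypothetical helper, not part of PyICU. The field order is an assumption:
# it mirrors the accessor names on ICU's C++ Edits::Iterator, with
# replacementIndex placed before destinationIndex.
Edit = namedtuple(
    "Edit",
    [
        "hasChange",
        "oldLength",
        "newLength",
        "sourceIndex",
        "replacementIndex",
        "destinationIndex",
    ],
)

# The tuple returned by next(i) in the session above.
first = Edit(*(False, 3, 3, 0, 0, 0))
print(first.hasChange, first.oldLength, first.destinationIndex)
```

With a real iterator, something like `Edit(*next(i))` should work the same way, provided the tuples keep this shape.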

One final note: it might be nice to support some overloads that take in UnicodeString as a destination, since the offsets in Edits will be UnicodeString offsets (i.e. UTF-16) not Python string offsets.
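In the meantime, offsets can be converted on the Python side. The following is a minimal sketch, assuming the offsets reported by Edits count UTF-16 code units; `utf16_to_codepoint_offset` is a hypothetical helper, not a PyICU function.

```python
def utf16_to_codepoint_offset(text: str, utf16_offset: int) -> int:
    """Map an offset in UTF-16 code units to an index into a Python str.

    Hypothetical helper, not part of PyICU. An offset that falls in the
    middle of a surrogate pair resolves to the following code point.
    """
    units = 0
    for index, ch in enumerate(text):
        if units >= utf16_offset:
            return index
        # Characters outside the BMP take two UTF-16 code units.
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


sample = "a\U0001F600b"  # 'a', an emoji outside the BMP, 'b'
print([utf16_to_codepoint_offset(sample, n) for n in range(5)])
```

The scan is linear per lookup; converting many offsets on a long string would be better served by a single pass that builds a lookup table.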

Seems to work, but maybe you're thinking of another API?

Ah, I was talking about things like this:

>>> string = UnicodeString()
>>> name = locale.getDisplayName(string)
>>> name
<UnicodeString: Portuguese (Brazil)>
>>> name is string
True

I didn't realise there's an actual ICU API that looks like that; I thought it was just a pyicu convention for getting the output as an icu.UnicodeString instead of str.

Anyway, everything works great, thanks!