Feature request: Edits
tavianator opened this issue · 13 comments
I'd like to use CaseMap and record the positions of the modifications.
Wow that was fast, thanks! Trying it now. I needed this patch to get it to build:
diff --git a/numberformat.cpp b/numberformat.cpp
index 8e9ee8e..f3ed0a5 100644
--- a/numberformat.cpp
+++ b/numberformat.cpp
@@ -788,7 +788,7 @@ static int t_decimalformatsymbols_init(t_decimalformatsymbols *self,
{
Locale *locale;
DecimalFormatSymbols *dfs;
-#if U_ICU_VERSION_HEX >= VERSION_HEX(63, 0, 0)
+#if U_ICU_VERSION_HEX >= VERSION_HEX(60, 0, 0)
NumberingSystem *system;
#endif
The named properties don't seem to work, but the tuples do:
>>> import icu
>>> e = icu.Edits()
>>> icu.CaseMap.fold(0, "abcßDeF", e)
'abcssdef'
>>> i = e.getFineIterator()
>>> next(i)
(False, 3, 3, 0, 0, 0)
>>> i.hasChange
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'icu.EditsIterator' object has no attribute 'hasChange'
From looking here it seems to be (..., replacementIndex, destinationIndex), not the other way around.
One final note: it might be nice to support some overloads that take in UnicodeString as a destination, since the offsets in Edits will be UnicodeString offsets (i.e. UTF-16) not Python string offsets.
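To illustrate the offset mismatch: a minimal sketch, in plain Python, of mapping a UTF-16 code unit offset back to a Python str index. The helper name utf16_to_str_index is hypothetical and not part of pyicu; it assumes the offsets count UTF-16 code units, as described above.

```python
def utf16_to_str_index(text: str, utf16_offset: int) -> int:
    """Map an offset counted in UTF-16 code units to a Python str index.

    Hypothetical helper for illustration only; not a pyicu API.
    An offset that falls inside a surrogate pair is rounded up to the
    index just past that character.
    """
    units = 0  # UTF-16 code units consumed so far
    for index, ch in enumerate(text):
        if units >= utf16_offset:
            return index
        # Characters outside the BMP take two UTF-16 code units.
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)
```

For text in the Basic Multilingual Plane, such as "abcßDeF", the two kinds of offset coincide. They diverge only after a character above U+FFFF: in "a😀b", the "b" sits at UTF-16 offset 3 but at Python index 2.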
Seems to work, but maybe you're thinking of another API?
Ah, I was talking about things like this:
>>> string = UnicodeString()
>>> name = locale.getDisplayName(string)
>>> name
<UnicodeString: Portuguese (Brazil)>
>>> name is string
True
I didn't realise there's an actual ICU API that looks like that, I thought that was just a pyicu convention for getting the output as an icu.UnicodeString instead of str.
Anyway, everything works great, thanks!