[Feature Request] Add bidi algorithm
SafaAlfulaij opened this issue · 10 comments
SafaAlfulaij commented
https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html
It's massive, but maybe?
ovalhub commented
On Jul 22, 2020, at 22:24, Safa Alfulaij ***@***.***> wrote:
https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html
It's massive, but maybe?
Indeed, but it could be done. If you had python wrappers around this API, how would you use it ?
In particular, it would help me understand how this
API is used. Could you please propose some python pseudo-code using this API that I could then reuse for writing tests or an example ?
Thanks !
Andi..
… —
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
SafaAlfulaij commented
Sorry for the late reply.
I believe we can use something similar to what node-icu-bidi does.
layout = BidiLayout() # ubidi_open and init
layout.setPara("input text goes here") # ubidi_setPara, options at a later stage
level = layout.getParaLevel() # ubidi_getParaLevel
direction = layout.getDirection() # ubidi_getDirection
charLevels = []
for x in range(layout.getProcessedLength()): # ubidi_getProcessedLength
charLevels.append( layout.getLevelAt(x) ) # ubidi_getLevelAt
# Or charLevels = layout.getProcessedArray()
runs = []
for x in range(layout.countRuns()): # ubidi_countRuns
runs.append(layout.getVisualRun()) # ubidi_getVisualRun
This is the core functionality.
ovalhub commented
Great, thank you, this is useful !
… On Jul 25, 2020, at 22:11, Safa Alfulaij ***@***.***> wrote:
Sorry for the late reply.
I believe we can use something similar to what node-icu-bidi does.
layout = BidiLayout() # ubidi_open and init
layout.setPara("input text goes here") # ubidi_setPara, options at a later stage
level = layout.getParaLevel() # ubidi_getParaLevel
direction = layout.getDirection() # ubidi_getDirection
charLevels = []
for x in range(layout.getProcessedLength()): # ubidi_getProcessedLength
charLevels.append( layout.getLevelAt(x) ) # ubidi_getLevelAt
# Or charLevels = layout.getProcessedArray()
runs = []
for x in range(layout.countRuns()): # ubidi_countRuns
runs.append(layout.getVisualRun()) # ubidi_getVisualRun
This is the core functionality.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or unsubscribe.
ovalhub commented
I just checked in a first pass at this that covers the entire API except
for the subclassing support.
Here are the differences between what you proposed and what I ended up
implementing:
layout = BidiLayout() # ubidi_open and init
layout.setPara("input text goes here") # ubidi_setPara, options at a
later stage
layout = Bidi()
layout.setPara(UnicodeString("input text goes here"))
// I made using UnicodeString a requirement here because the Bidi object
// needs it for its entire lifespan and I didn't want to make repeated
// copies of the para text. The layout Bidi instance maintains a reference
// to it. You can access also it via layout.text.
// Note that line = layout.setLine(start, limit) also has a text property
// that shares the characters of layout's text. A line Bidi instance keeps
// track of its parent and maintains a reference to it. You can access it
// via line.parent.
level = layout.getParaLevel() # ubidi_getParaLevel
direction = layout.getDirection() # ubidi_getDirection
// Unchanged.
charLevels = []
for x in range(layout.getProcessedLength()): # ubidi_getProcessedLength
charLevels.append( layout.getLevelAt(x) ) # ubidi_getLevelAt
# Or charLevels = layout.getProcessedArray()
levels = layout.getLevels()
// levels is a tuple of 'processed length' level values.
// You can also run the for loop but it's going to be vastly slower.
runs = []
for x in range(layout.countRuns()): # ubidi_countRuns
runs.append(layout.getVisualRun()) # ubidi_getVisualRun
```
logicalStart, length, direction = layout.getVisualRun(x)
// runs are returns as triples
// You should be able to use layout.text[logicalStart:logicalStart+length]
// to get the implied text.
It would be nice if you could propose some unit tests for this new code and
the functionality covered by the API. There are a bunch of interesting
functions in there. Please, take a look at the new bidi.cpp source file for
all the API possibilities.
Thanks !
Andi..
SafaAlfulaij commented
You are awsome man! You are the C++-tar!
I will provide you with some unit tests once I take a good look on (and check) the API and the whole class.
Thanks a lot! You are truly awsome :)
ovalhub commented
On Tue, 28 Jul 2020, Safa Alfulaij wrote:
You are awsome man! You are the
[C++-tar](https://youtu.be/OCtmh4_Z0F8?t=179)!
I will provide you with some unit tests once I take a good look on (and
check) the API and the whole class.
Thanks a lot! You are truly awsome :)
You're welcome !
I'm hoping you can propose some tests that use some "real" text that has
several kinds of characters in them, not just plain placeholder ascii text.
Thanks !
Andi..
…
--
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#134 (comment)
SafaAlfulaij commented
I believe this is a good start (for now).
# ====================================================================
# Copyright (c) 2020 Open Source Applications Foundation.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
# ====================================================================
#
import sys, os
from unittest import TestCase, main
from icu import *
input_text = "\u0643\u0627\u0646 \u0057\u0069\u006e\u0064\u006f\u0077\u0073 \u0031\u0030 \u0623\u0648\u0644 \u0646\u0638\u0627\u0645 \u062a\u0634\u063a\u064a\u0644 \u0628\u062a\u0642\u0646\u064a\u0629 \u0034\u004b\u002e"
input_text_2 = "\u0643\u0627\u0646\u062a \u004d\u0069\u0063\u0072\u006f\u0073\u006f\u0066\u0074 \u0057\u0069\u006e\u0064\u006f\u0077\u0073 \u0627\u0644\u0645\u0627\u0644\u0643 \u0627\u0644\u0623\u0648\u062d\u062f \u0644\u0639\u0644\u0627\u0645\u0629 \u0058\u0050\u002e"
class TestBidi(TestCase):
def testDefault(self):
Locale.setDefault(Locale.getUS())
layout = Bidi()
layout.setPara(UnicodeString(input_text))
self.assertTrue( layout.getParaLevel() == 1 )
self.assertTrue( layout.getDirection() == UBiDiDirection.MIXED )
self.assertTrue( layout.getLevels() == (1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1) )
self.assertTrue( layout.getVisualRun(0) == (39, 1, 1) )
self.assertTrue( layout.getVisualRun(3) == (4, 10, 0) )
def testForcedDirection(self):
Locale.setDefault(Locale.getUS())
layout = Bidi()
layout.setPara(UnicodeString(input_text), 0)
self.assertTrue( layout.getParaLevel() == 0 )
self.assertTrue( layout.getDirection() == UBiDiDirection.MIXED )
self.assertTrue( layout.getLevels() == (1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 0, 0) )
self.assertTrue( layout.getVisualRun(0) == (0, 3, 1) )
self.assertTrue( layout.getVisualRun(3) == (15, 22, 1) )
def testLines(self):
Locale.setDefault(Locale.getUS())
layout = Bidi()
layout.setPara(UnicodeString(input_text_2))
line_layout = layout.setLine(0, 15)
self.assertTrue( line_layout.getLevels() == (1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1) )
self.assertTrue( line_layout.getVisualRun(0) == (14, 1, 1) )
self.assertTrue( line_layout.getVisualRun(2) == (0, 5, 1) )
line_layout_2 = layout.setLine(15, 47)
self.assertTrue( line_layout_2.getLevels() == (2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1) )
self.assertTrue( line_layout_2.getVisualRun(0) == (31, 1, 1) )
self.assertTrue( line_layout_2.getVisualRun(2) == (7, 22, 1) )
if __name__ == "__main__":
main()
And I tried using writeReverse
but couldn't. Checking the code I need to pass a str argument that will be discarded?
I tried this after setPara
at testLines
:
print(layout.writeReverse('', Bidi.DO_MIRRORING))
with no result...
ovalhub commented
On Thu, 30 Jul 2020, Safa Alfulaij wrote:
I believe this is a good start (for now).
```python
... snip ...
```
Thank you, I integrated it into PyICU (not yet checked in).
And I tried using `writeReverse` but couldn't. Checking the code I need to pass a str argument that will be discarded?
writeReverse is a type method on the Bidi type, you can call it like that:
>> from icu import *
>> Bidi.writeReverse("this is text")
<UnicodeString: 'txet si siht'>
>>
As a class method it doesn't need an instance or a para.
It takes any kind of string, UnicodeString, python string, whatever that can
be converted into a UnicodeString (like most PyICU wrappers taking a
string)
I tried this after `setPara` at `testLines`:
```python
print(layout.writeReverse('', Bidi.DO_MIRRORING))
```
with no result...
The inverse of an empty string might be just that.
Andi..
…
--
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#134 (comment)
ovalhub commented
I now added and checked in your unit tests for the new Bidi wrappers.
Thank you for the contribution !
…On Fri, 31 Jul 2020, Safa Alfulaij wrote:
Closed #134.
--
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#134 (comment)
ovalhub commented
On Fri, 31 Jul 2020, Safa Alfulaij wrote:
Closed #134.
I also now added wrappers for the UBiDiTransform API.
…
--
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#134 (comment)