isUni and isZg string check feature

Question · 2015-05-10 · Nyan Lynn Htut (issue #10)

Rabbit converter should add a check for whether the given data is Unicode or Zawgyi.
Example:

if ( Rabbit::isZawgyi(data)) {
    data = Rabbit::zg2uni(data);
}

Answer 1 · 2015-05-10T14:31:49.000Z

Argh, you mean a font detector?

Answer 2 · 2015-05-10T14:36:43.000Z

@yelinaung yes

Answer 3 · 2015-05-11T01:30:23.000Z

@nyanlynnhtut, font detection can't be 100% correct because some code points conflict between the two encodings.

Example: \u103A or \u103D

In Zawgyi, က followed by \u103A is ကျ, but in Unicode it's က်.

So is this one Zawgyi or Unicode?
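To make the conflict concrete, here is a small JavaScript sketch that lists the code points that are valid in both encodings but mean different things. The table follows the Zawgyi layout as it is commonly documented (Zawgyi's U+1039 through U+103D correspond to Unicode's U+103A through U+103E); treat that mapping as an assumption. CONFLICTS and conflictingReadings are hypothetical names, not part of Rabbit.

```javascript
// Code points valid in both encodings but with different meanings.
// Mapping assumed from the commonly documented Zawgyi layout.
const CONFLICTS = {
  0x1039: { zawgyi: "asat", unicode: "virama (stacker)" },
  0x103a: { zawgyi: "medial ya", unicode: "asat" },
  0x103b: { zawgyi: "medial ra", unicode: "medial ya" },
  0x103c: { zawgyi: "medial wa", unicode: "medial ra" },
  0x103d: { zawgyi: "medial ha", unicode: "medial wa" },
};

// Report every conflicting code point in `text` together with both readings.
function conflictingReadings(text) {
  const result = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const entry = CONFLICTS[cp];
    if (entry) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      result.push({ codePoint: "U+" + hex, ...entry });
    }
  }
  return result;
}

// က + U+103A: ကျ under a Zawgyi font, က် under a Unicode font.
console.log(conflictingReadings("\u1000\u103A"));
```

Because the string itself is identical in both cases, no rule that only looks at these code points can decide. A detector has to rely on context, such as which sequences are plausible in each encoding.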

In Tagu, I use something like this:

var regexUni = new RegExp("[ဃငဆဇဈဉညဋဌဍဎဏဒဓနဘရဝဟဠအ]်|ျ[က-အ]ါ|ျ[ါ-း]|\u103e|\u103f|\u1031[^\u1000-\u1021\u103b\u1040\u106a\u106b\u107e-\u1084\u108f\u1090]|\u1031$|\u1031[က-အ]\u1032|\u1025\u102f|\u103c\u103d[\u1000-\u1001]|ည်း|ျင်း|င်|န်း|ျာ|င့်");
// "\\s" must be double-escaped inside a string literal; a bare "\s" becomes just "s".
var regexZG = new RegExp("\\s\u1031| ေ[က-အ]်|[က-အ]း");

It's about 80% OK for long strings, but it has problems with short strings.
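Here is a minimal sketch of how the two regexes above might be combined into one decision. It adds an explicit "unknown" result for the short-string case, where neither pattern fires (or both do). The detectFont function and its three-way result are assumptions for illustration, not Tagu's actual code. The test strings are written as \u escapes so their exact code points are unambiguous.

```javascript
// Regexes as posted for Tagu; "\\s" is escaped so the RegExp sees the \s class.
const regexUni = new RegExp("[ဃငဆဇဈဉညဋဌဍဎဏဒဓနဘရဝဟဠအ]်|ျ[က-အ]ါ|ျ[ါ-း]|\u103e|\u103f|\u1031[^\u1000-\u1021\u103b\u1040\u106a\u106b\u107e-\u1084\u108f\u1090]|\u1031$|\u1031[က-အ]\u1032|\u1025\u102f|\u103c\u103d[\u1000-\u1001]|ည်း|ျင်း|င်|န်း|ျာ|င့်");
const regexZG = new RegExp("\\s\u1031| ေ[က-အ]်|[က-အ]း");

// Hypothetical three-way classifier: answer only when exactly one side matches.
function detectFont(text) {
  const uni = regexUni.test(text);
  const zg = regexZG.test(text);
  if (uni && !zg) return "unicode";
  if (zg && !uni) return "zawgyi";
  return "unknown"; // no signal, or both patterns fired
}

console.log(detectFont("\u1019\u103C\u1014\u103A\u1019\u102C")); // Unicode မြန်မာ → unicode
console.log(detectFont("\u1000\u102D\u102F \u1031\u1019\u102C\u1004\u1039")); // Zawgyi order → zawgyi
console.log(detectFont("\u1000\u102C")); // too short to tell → unknown
```

Returning "unknown" instead of guessing is one way to handle the short-string problem. The caller can then leave such text unconverted.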

Answer 4 · 2015-05-11T03:03:18.000Z

Another concern, and a fear I've always had, is that people would take the chance to detect Unicode and convert everything to Zawgyi (hmm, @ravichhabra's fear as well).
That would go totally wrong, and nobody can say for sure that no one will do that with the "free" software available out there, because nowadays people don't really care about open licensing. (That's another story.)

All the font detection rules you will ever find on the Internet are also based on Ko @ravichhabra's font buster script.

Technically, things will never be perfect, but it will be "okay"-ish for most things. We've done this in closed-source apps like PyawKyi and it works well.

Answer 5 · 2015-05-11T03:12:51.000Z

@yelinaung, the font detection code used in Tagu is based on Thant Thet's MMFontTagger code; you can check it on my blog: http://blog.saturngod.net/knowledgebase/tagu-firefox-addon

But I'm not sure about the license of this font detection code. I need to confirm with Thant Thet, or we need to write our own. If it's not WTFPL-licensed, I don't want to use it in Rabbit and would prefer to write my own.

Answer 6 · 2015-05-11T03:15:21.000Z

It is indeed a complicated matter. I propose we postpone it for now. Any thoughts, @thantthet @ravichhabra?

Answer 7 · 2015-05-11T03:17:27.000Z

@yelinaung agree

Answer 8 · 2015-05-15T09:06:04.000Z

MyanmarFontTagger uses a modified version of Ko @ravichhabra's regex, so it's best to ask him.

Answer 9 · 2015-05-15T20:10:05.000Z

I detect Zawgyi or Unicode using a few patterns: ကွ, strings that start with ေ, and the fact that most Zawgyi text contains \u107E to \u1084. It's not reliable, but it's useful 😜
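Here is a rough JavaScript sketch of two of those signals. It assumes "starts with ေ" means U+1031 at the start of the text or right after whitespace. It also assumes U+107E to U+1084 are Zawgyi's extra medial-ra (ya-yit) shapes. The ကွ pattern is left out because the exact code points meant are not stated, and looksLikeZawgyi is a hypothetical name.

```javascript
// Hypothetical heuristic: true if the text shows either assumed Zawgyi signal.
function looksLikeZawgyi(text) {
  const leadingVowelE = /(^|\s)\u1031/; // ေ at the start of the text or after whitespace
  const zawgyiMedialRa = /[\u107E-\u1084]/; // assumed Zawgyi-only medial-ra shapes
  return leadingVowelE.test(text) || zawgyiMedialRa.test(text);
}

console.log(looksLikeZawgyi("\u1031\u1019\u102C\u1004\u1039")); // ေမာင္ in Zawgyi order → true
console.log(looksLikeZawgyi("\u1019\u103C\u1014\u103A\u1019\u102C")); // Unicode မြန်မာ → false
```

As the comment says, this is a hint rather than proof. Unicode text with a stray leading U+1031, for example, would still be flagged.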

Answer 10 · 2015-05-16T00:55:37.000Z

ေ is U+1031 in both charsets, so I'm curious: how do you detect it?
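The code point is the same, but its position is not. Unicode stores ေ in logical order, after the consonant it belongs to. Zawgyi stores it in visual order, before the consonant. That is why a word beginning with U+1031 is a useful Zawgyi hint. The sketch below uses a hypothetical helper, reorderVowelE, to show one simplified reordering step from Zawgyi order to Unicode order. It only covers a plain consonant; a real converter also has to handle medials and many other cases.

```javascript
// Zawgyi stores ေ (U+1031) before the consonant; Unicode stores it after.
// Hypothetical, simplified step: move U+1031 past a following consonant (U+1000..U+1021).
// Medials such as ya-yit (ြ) between ေ and the consonant are not handled here.
function reorderVowelE(zawgyiText) {
  return zawgyiText.replace(/\u1031([\u1000-\u1021])/g, "$1\u1031");
}

const zawgyiKe = "\u1031\u1000"; // shows as ေက with a Zawgyi font
const unicodeKe = reorderVowelE(zawgyiKe); // shows as ေက with a Unicode font
console.log(Array.from(unicodeKe, (c) => c.codePointAt(0).toString(16))); // [ '1000', '1031' ]
```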

Answer 11 · 2018-06-12T16:50:02.000Z

Please check https://github.com/googlei18n/myanmar-tools