ausi/slug-generator

Correction in Hindi Phrases

bazingarj opened this issue · 1 comment

मिठाई - mithai (coming up as mitha-i)
खुशबू - khushbu (coming up as khasaba)
लेना - lena (coming up as lana)
पैसे - paise (coming up as pasa)
अब - ab (coming up as aba)

ausi commented

The transliteration between scripts, like Devanagari to Latin in this case, is performed by the ICU library, which uses data from the Unicode CLDR.

The Devanagari-Latin transform internally transforms to InterIndic first and afterwards from InterIndic to Latin.

Taking “अब” for example, you can see that “अ” gets transformed to \uE005 in Devanagari-InterIndic.xml:20 and “ब” to \uE02C in Devanagari-InterIndic.xml:59.
The code points \uE005 and \uE02C get assigned to $wa in InterIndic-Latin.xml:21 and $ba in InterIndic-Latin.xml:60.
Finally, $wa is mapped to “a” in InterIndic-Latin.xml:446 and $ba to “ba” in InterIndic-Latin.xml:298.

In short:

अ -> \uE005 -> $wa -> a
ब -> \uE02C -> $ba -> ba
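
To make the two-stage idea concrete, here is a minimal sketch in plain PHP that chains two lookup tables. The constant and function names are hypothetical, and the tables hold only the two entries quoted above; the real CLDR rule files contain many more rules plus context handling, so this is an illustration of the pipeline, not a reimplementation of it.

```php
<?php
// Hypothetical stage 1: Devanagari -> InterIndic (private-use code points).
// Only the two mappings cited above are included.
const DEVANAGARI_TO_INTERINDIC = [
    'अ' => "\u{E005}",
    'ब' => "\u{E02C}",
];

// Hypothetical stage 2: InterIndic -> Latin, with the intermediate
// variables ($wa, $ba) collapsed into the final output strings.
const INTERINDIC_TO_LATIN = [
    "\u{E005}" => 'a',  // $wa
    "\u{E02C}" => 'ba', // $ba
];

function transliterateViaInterIndic(string $text): string
{
    // strtr() with an array replaces each key once and does not
    // re-scan its own output, so the two stages stay separate.
    $interIndic = strtr($text, DEVANAGARI_TO_INTERINDIC);

    return strtr($interIndic, INTERINDIC_TO_LATIN);
}

echo transliterateViaInterIndic('अब'), PHP_EOL; // aba
```

With only these two rules, the trailing “a” of “ba” is always emitted, which matches the “aba” seen in the report.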

As I have no knowledge of Devanagari, I can’t spot at which point the transformations go wrong.
It would be great if you could file a ticket directly with the CLDR: http://cldr.unicode.org/index/bug-reports

You can reproduce the issue with a single line of PHP code:

echo \Transliterator::create('Deva-Latn')->transliterate('अब');
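
For a CLDR ticket it may help to show all reported phrases at once. The following is a rough sketch that assumes the intl extension is loaded; the function name compareTransliterations is made up for this example, and the second transform ID ('Deva-Latn; Latin-ASCII') is my assumption for stripping diacritics, not necessarily the exact chain the slug generator applies. I have deliberately not listed expected output, since it depends on the installed ICU version.

```php
<?php
// Phrases from the report: Devanagari input => romanization the reporter expects.
$phrases = [
    'मिठाई' => 'mithai',
    'खुशबू' => 'khushbu',
    'लेना' => 'lena',
    'पैसे' => 'paise',
    'अब' => 'ab',
];

/**
 * Hypothetical helper: returns one row per phrase with the expected value,
 * the raw Deva-Latn result, and the result after Latin-ASCII folding.
 */
function compareTransliterations(array $phrases): array
{
    $raw = \Transliterator::create('Deva-Latn');
    // Compound ID; assumed here only to remove diacritics from the output.
    $ascii = \Transliterator::create('Deva-Latn; Latin-ASCII');

    if ($raw === null || $ascii === null) {
        throw new \RuntimeException(intl_get_error_message());
    }

    $rows = [];
    foreach ($phrases as $input => $expected) {
        $rows[] = [
            'input' => $input,
            'expected' => $expected,
            'raw' => $raw->transliterate($input),
            'ascii' => $ascii->transliterate($input),
        ];
    }

    return $rows;
}

// Only run when the intl extension is actually available.
if (extension_loaded('intl')) {
    foreach (compareTransliterations($phrases) as $row) {
        printf(
            "%s: expected %s, Deva-Latn gives %s, with Latin-ASCII %s\n",
            $row['input'],
            $row['expected'],
            var_export($row['raw'], true),
            var_export($row['ascii'], true)
        );
    }
} else {
    echo "The intl extension is not loaded.\n";
}
```

The printed lines can be pasted into the ticket as they are, together with the ICU version reported by INTL_ICU_VERSION.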