Correction in Hindi Phrases
bazingarj opened this issue · 1 comments
मिठाई - mithai (coming up as mitha-i)
खुशबू - khushbu (coming up as khasaba)
लेना - lena (coming up as lana)
पैसे - paise (coming up as pasa)
अब - ab (coming up as aba)
Transliteration between scripts, such as Devanagari to Latin in this case, is performed by the ICU library, which uses data from the Unicode CLDR.
The Devanagari-Latin transform works in two steps internally: it first transforms Devanagari to InterIndic, and then InterIndic to Latin.
Taking “अब” as an example, you can see that “अ” gets transformed to \uE005 in Devanagari-InterIndic.xml:20 and “ब” to \uE02C in Devanagari-InterIndic.xml:59.
The code points \uE005 and \uE02C get assigned to $wa in InterIndic-Latin.xml:21 and $ba in InterIndic-Latin.xml:60.
And finally, $wa is mapped to “a” in InterIndic-Latin.xml:446 and $ba to “ba” in InterIndic-Latin.xml:298.
In short:
अ -> \uE005 -> $wa -> a
ब -> \uE02C -> $ba -> ba
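To see the intermediate InterIndic code points yourself, the two stages can be run one after the other. This is only a rough sketch: the IDs 'Devanagari-InterIndic' and 'InterIndic-Latin' are taken from the CLDR file names above, and it is an assumption that your ICU build lets you instantiate these internal transforms directly. The helper names toCodepoints and traceStages are made up for this example. The real Devanagari-Latin transform may also apply normalization steps that this sketch leaves out.

```php
<?php
// Requires PHP's intl and mbstring extensions (PHP 7.4+ for mb_str_split).

// Render a string as a list of code points, so private-use
// characters such as U+E005 become visible.
function toCodepoints(string $s): string
{
    $out = [];
    foreach (mb_str_split($s, 1, 'UTF-8') as $ch) {
        $out[] = sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }
    return implode(' ', $out);
}

// Run the input through each stage in turn and record the result.
// A stage that cannot be created or applied is recorded as null and
// the trace stops there.
// ASSUMPTION: the stage IDs below are accepted by Transliterator::create().
function traceStages(string $input): array
{
    $stages = ['Devanagari-InterIndic', 'InterIndic-Latin'];
    $trace = [];
    $current = $input;
    foreach ($stages as $id) {
        $t = Transliterator::create($id);
        $result = ($t === null) ? false : $t->transliterate($current);
        if ($result === false) {
            $trace[$id] = null;
            break;
        }
        $current = $result;
        $trace[$id] = $current;
    }
    return $trace;
}

foreach (traceStages('अब') as $id => $result) {
    echo $id, ': ';
    echo $result === null ? '(stage not available)' : toCodepoints($result);
    echo "\n";
}
```

If the first stage is available, its output should be comparable with the \uE005 and \uE02C values found in the XML files.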
As I have no knowledge of Devanagari, I can’t tell at which point the transformation goes wrong.
It would be great if you could file a ticket directly with the CLDR project: http://cldr.unicode.org/index/bug-reports
You can reproduce the issue with a single line of PHP code:
echo \Transliterator::create('Deva-Latn')->transliterate('अब');
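For the CLDR ticket it may help to check all reported words in one go. The following is a minimal sketch that reuses the 'Deva-Latn' ID from the line above. The expected values are the ones given in this issue, and the function name compareTransliterations is made up for this example. What it prints depends on the ICU version installed on your system.

```php
<?php
// Requires PHP's intl extension (ICU).

// Transliterate every word and compare the result with the expected value.
// Returns one row per word: word, expected, actual, matches.
function compareTransliterations(array $expected, string $id = 'Deva-Latn'): array
{
    $t = Transliterator::create($id);
    if ($t === null) {
        throw new RuntimeException(
            "Could not create transliterator '$id': " . intl_get_error_message()
        );
    }

    $rows = [];
    foreach ($expected as $word => $want) {
        $got = $t->transliterate($word);
        $rows[] = [
            'word'     => $word,
            'expected' => $want,
            'actual'   => $got,
            'matches'  => $got === $want,
        ];
    }
    return $rows;
}

// Words and expected romanizations as reported in this issue.
$expected = [
    'मिठाई' => 'mithai',
    'खुशबू' => 'khushbu',
    'लेना'  => 'lena',
    'पैसे'  => 'paise',
    'अब'   => 'ab',
];

foreach (compareTransliterations($expected) as $row) {
    printf(
        "%s: expected %s, got %s (%s)\n",
        $row['word'],
        $row['expected'],
        var_export($row['actual'], true),
        $row['matches'] ? 'ok' : 'differs'
    );
}
```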