ausi/slug-generator

Wrong transforms for Cyrillic letters

serggi opened this issue · 2 comments

Hi, thanks for the bundle.
Can you help me understand why Cyrillic letters (I've checked Ukrainian and Russian) aren't transliterated according to the rules defined here?
https://github.com/unicode-org/cldr/blob/master/common/transforms/Ukrainian-Latin-BGN.xml
$slugGenerator->generate('щ', ['locale' => 'uk']) gives me s instead of shch

########################################################################
#
# BGN Page 94 Rule 3.6
#
# шч becomes sh·ch
#
########################################################################
#
ШЧ → SH·CH ; # CYRILLIC CAPITAL LETTER SHA
Шч → Sh·ch ; # CYRILLIC CAPITAL LETTER SHA
шч → sh·ch ; # CYRILLIC SMALL LETTER SHA
Ш} $lower → Sh ; # CYRILLIC CAPITAL LETTER SHA
Ш → SH ; # CYRILLIC CAPITAL LETTER SHA
ш → sh ; # CYRILLIC SMALL LETTER SHA
Щ} $lower → Shch ; # CYRILLIC CAPITAL LETTER SHCHA
Щ → SHCH ; # CYRILLIC CAPITAL LETTER SHCHA
щ → shch ; # CYRILLIC SMALL LETTER SHCHA

ausi commented

Hi @serggi, congrats on your first GitHub issue 🎉🚀

It looks like the SlugGenerator doesn’t find the correct transform for the uk locale. It uses the rule uk-Latn instead of uk-uk_Latn/BGN.

Probably we should add

$locale.'-'.$locale.'_'.$rule,
\Locale::getPrimaryLanguage($locale).'-'.\Locale::getPrimaryLanguage($locale).'_'.$rule,

to

$candidates,
$locale.'-'.$rule,
\Locale::getPrimaryLanguage($locale).'-'.$rule
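
To make the proposal above concrete, here is a minimal sketch of how the extended candidate list could be built and probed. The function names `buildTransformCandidates` and `firstAvailableTransform` are hypothetical and not part of the library, and the ordering (most specific ID first) is an assumption, not something decided in this thread.

```php
<?php
// Hypothetical helpers for illustration; not part of ausi/slug-generator.

function buildTransformCandidates(string $locale, string $rule): array
{
    // Use Intl when available; otherwise take the part before the first '_' or '-'.
    $language = class_exists('Locale')
        ? (\Locale::getPrimaryLanguage($locale) ?? $locale)
        : strtolower(preg_split('/[-_]/', $locale)[0]);

    // Assumed order: the proposed "<locale>-<locale>_<rule>" IDs first, then the
    // existing "<locale>-<rule>" IDs. array_unique drops duplicates that appear
    // when $locale already equals its primary language.
    return array_values(array_unique([
        $locale.'-'.$locale.'_'.$rule,
        $language.'-'.$language.'_'.$rule,
        $locale.'-'.$rule,
        $language.'-'.$rule,
    ]));
}

// Returns the first ID that ICU can instantiate, or null if none can (requires ext-intl).
function firstAvailableTransform(array $candidates): ?string
{
    foreach ($candidates as $id) {
        if (\Transliterator::create($id) !== null) {
            return $id;
        }
    }

    return null;
}
```

Calling `buildTransformCandidates('uk', 'Latn/BGN')` puts `uk-uk_Latn/BGN` at the front of the list. Which IDs actually resolve depends on the ICU version installed, so the result of `firstAvailableTransform` would need to be checked per environment.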

Or it might be better to handle all script rules like Latn correctly by parsing the locale and setting the script via Intl’s Locale class.
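
A rough sketch of that alternative, assuming ext-intl is installed: parse the locale with `\Locale::parseLocale()`, keep only the language, set the target script, and recompose it with `\Locale::composeLocale()`. The helper name `scriptTransformId` is hypothetical, and whether a variant such as /BGN still has to be appended to the resulting ID is left open here.

```php
<?php
// Hypothetical helper for illustration; not part of ausi/slug-generator.
// Requires ext-intl.

function scriptTransformId(string $locale, string $script): string
{
    // parseLocale('uk_UA') yields ['language' => 'uk', 'region' => 'UA'].
    $parts = \Locale::parseLocale($locale) ?: [];
    $language = $parts['language'] ?? $locale;

    // Recompose with only language + script, dropping region and variants.
    $target = \Locale::composeLocale([
        'language' => $language,
        'script' => $script,
    ]);

    // Fall back to manual concatenation if Intl could not compose the locale.
    if (!is_string($target)) {
        $target = $language.'_'.$script;
    }

    // Source is the bare language, target is the language in the requested script.
    return $language.'-'.$target;
}
```

With this, both `uk` and `uk_UA` map to the same script-level ID, which avoids hard-coding `Latn` into string concatenation.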

I’m not sure whether we can use the /BGN variant for all languages; we would have to test the impact of adding it first.

Until this issue is fixed in the library itself, you can use the preTransforms option in your project like so:

$slugGenerator->generate('щ', ['locale' => 'uk', 'preTransforms' => ['uk-uk_Latn/BGN']]);

Thanks, Martin, for your help and advice. It works well.