ruby-i18n/i18n

[BUG] I18n.l returns incorrectly encoded strings

What I tried to do

I18n.locale = 'nb-NO'

I18n.t('date.day_names')
#=> ["søndag", "mandag", "tirsdag", "onsdag", "torsdag", "fredag", "lørdag"]

I18n.t('date.day_names')[0].codepoints
#=> [115, 248, 110, 100, 97, 103] # 248 is the correct codepoint for "ø"

date_time = DateTime.parse('2024-06-15T15:00:00Z')
#=> Sat, 15 Jun 2024 15:00:00 +0000

time_with_zone = date_time.in_time_zone('Europe/Oslo')
#=> Sat, 15 Jun 2024 17:00:00.000000000 CEST +02:00

time_with_zone.class
#=> ActiveSupport::TimeWithZone

localized_with_date_time = I18n.l(date_time, format: :calendar_list_weekday)
#=> "lørdag"

localized_with_date_time.codepoints
#=> [108, 248, 114, 100, 97, 103] #=> correct

localized_with_time_with_zone = I18n.l(time_with_zone, format: :calendar_list_weekday)
#=> "lørdag"

localized_with_time_with_zone.codepoints
#=> [108, 195, 184, 114, 100, 97, 103] # incorrect; 195 and 184 are the codepoints of "Ã" and "¸", and also the two UTF-8 bytes of "ø"

localized_with_time_with_zone.encoding
#=> #<Encoding:UTF-8>

# workaround:
localized_with_time_with_zone.force_encoding('UTF-8').codepoints
#=> [108, 248, 114, 100, 97, 103]

What I expected to happen

I18n.l, when called with an ActiveSupport::TimeWithZone, should return a string with exactly the same codepoints as in the locale file, just as it does for a DateTime.

What actually happened

I18n.l returned a string in which "ø" is represented by two codepoints, 0xC3 and 0xB8 (the two bytes of its UTF-8 encoding), instead of the single codepoint 0xF8. This can cause garbled output or failed comparisons when the resulting text is used elsewhere.
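
To illustrate the symptom outside of Rails, here is a minimal sketch of one way a UTF-8 string can end up with these codepoints. It assumes the UTF-8 bytes are somewhere treated as ISO-8859-1 characters and then transcoded to UTF-8; I have not confirmed that this is what happens inside I18n.l. The helpers `double_encode` and `repair_double_encoded` are hypothetical names used only for this example and are not part of i18n or ActiveSupport.

```ruby
# frozen_string_literal: true

# Hypothetical helpers for illustration; not part of i18n or ActiveSupport.

# Simulates the suspected mix-up: the UTF-8 bytes are treated as ISO-8859-1
# characters and then transcoded to UTF-8, so every byte becomes a codepoint.
def double_encode(str)
  str.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Reverses the mix-up: map each codepoint back to a single byte, then
# reinterpret those bytes as UTF-8.
def repair_double_encoded(str)
  str.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
end

good = "lørdag"
bad = double_encode(good)

p good.codepoints                     # [108, 248, 114, 100, 97, 103]
p bad.codepoints                      # [108, 195, 184, 114, 100, 97, 103]
p bad.encoding                        # #<Encoding:UTF-8>
p repair_double_encoded(bad) == good  # true
```

If the string returned for ActiveSupport::TimeWithZone is double-encoded in this way, a round trip like `repair_double_encoded` would restore the expected codepoints, but that only holds under the assumption stated above.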

Versions of i18n, rails, and anything else you think is necessary

I18n::VERSION
#=> "1.14.5"