elixir-cldr/cldr_units

Localization fails when locale features patterns have no number substitution

Closed this issue · 23 comments

Hello @kipcole9 ,

Firstly, thanks for your work on this library, it's much appreciated.

I noticed that localizing an hour unit, or 2 hours, to either "he" or "ar" would result in an error. 3 hours will succeed.

iex(14)> {:ok, u} = Cldr.Unit.new(:hour, 1)
{:ok, #Cldr.Unit<:hour, 1>}
iex(15)> Cldr.Unit.to_string u, locale: "he"
** (FunctionClauseError) no function clause matching in Cldr.Substitution.substitute/2

    The following arguments were given to Cldr.Substitution.substitute/2:

        # 1
        "1"

        # 2
        ["שעה"]

    Attempted function clauses (showing 8 out of 8):

        def substitute([item], [0, string]) when is_binary(string)
        def substitute(item, [0, string]) when is_binary(string)
        def substitute([item], [string, 0]) when is_binary(string)
        def substitute(item, [string, 0]) when is_binary(string)
        def substitute(item, [string1, 0, string2]) when is_binary(string1) and is_binary(string2)
        def substitute([item_0, item_1], [0, string, 1]) when is_binary(string)
        def substitute([item_0, item_1], [1, string, 0]) when is_binary(string)
        def substitute([item_0, item_1, item_2], [0, string_1, 1, string_2, 2]) when is_binary(string_1) and is_binary(string_2)

    (ex_cldr 2.18.2) lib/cldr/substitution.ex:71: Cldr.Substitution.substitute/2
    (ex_cldr_units 3.3.1) Elixir.Cldr.Unit.erl:586: Cldr.Unit.to_iolist/3
    (ex_cldr_units 3.3.1) Elixir.Cldr.Unit.erl:482: Cldr.Unit.to_string/3
iex(15)> {:ok, u} = Cldr.Unit.new(:hour, 2)
{:ok, #Cldr.Unit<:hour, 2>}
iex(16)> Cldr.Unit.to_string u, locale: "he"
** (FunctionClauseError) no function clause matching in Cldr.Substitution.substitute/2

    The following arguments were given to Cldr.Substitution.substitute/2:

        # 1
        "2"

        # 2
        ["שעתיים"]

    Attempted function clauses (showing 8 out of 8):

        def substitute([item], [0, string]) when is_binary(string)
        def substitute(item, [0, string]) when is_binary(string)
        def substitute([item], [string, 0]) when is_binary(string)
        def substitute(item, [string, 0]) when is_binary(string)
        def substitute(item, [string1, 0, string2]) when is_binary(string1) and is_binary(string2)
        def substitute([item_0, item_1], [0, string, 1]) when is_binary(string)
        def substitute([item_0, item_1], [1, string, 0]) when is_binary(string)
        def substitute([item_0, item_1, item_2], [0, string_1, 1, string_2, 2]) when is_binary(string_1) and is_binary(string_2)

    (ex_cldr 2.18.2) lib/cldr/substitution.ex:71: Cldr.Substitution.substitute/2
    (ex_cldr_units 3.3.1) Elixir.Cldr.Unit.erl:586: Cldr.Unit.to_iolist/3
    (ex_cldr_units 3.3.1) Elixir.Cldr.Unit.erl:482: Cldr.Unit.to_string/3
iex(16)> {:ok, u} = Cldr.Unit.new(:hour, 3)
{:ok, #Cldr.Unit<:hour, 3>}
iex(17)> Cldr.Unit.to_string u, locale: "he"
{:ok, "3 שעות"}

Looking at the cldr data, there appears to be no substitution for the number:

          "duration-hour": {
            "displayName": "שעות",
            "unitPattern-count-one": "שעה",
            "unitPattern-count-two": "שעתיים",
            "unitPattern-count-many": "{0} שעות",
            "unitPattern-count-other": "{0} שעות",
            "perUnitPattern": "{0} לשעה"
          },

Yet I'm not seeing a function signature that would match such an input:

 # Takes care of a common case where there is one parameter
  def substitute([item], [0, string]) when is_binary(string) do
    [item, string]
  end

  def substitute(item, [0, string]) when is_binary(string) do
    [item, string]
  end

  def substitute([item], [string, 0]) when is_binary(string) do
    [string, item]
  end

  def substitute(item, [string, 0]) when is_binary(string) do
    [string, item]
  end

  def substitute(item, [string1, 0, string2]) when is_binary(string1) and is_binary(string2) do
    [string1, item, string2]
  end

  # Takes care of the common case where there are two parameters separated
  # by a string.
  def substitute([item_0, item_1], [0, string, 1]) when is_binary(string) do
    [item_0, string, item_1]
  end

  def substitute([item_0, item_1], [1, string, 0]) when is_binary(string) do
    [item_1, string, item_0]
  end

  # Takes care of the common case where there are three parameters separated
  # by strings.
  def substitute([item_0, item_1, item_2], [0, string_1, 1, string_2, 2])
      when is_binary(string_1) and is_binary(string_2) do
    [item_0, string_1, item_1, string_2, item_2]
  end

A naive solution would be to simply add the variants to handle a pattern with no substitution:

  # Takes care of the case where no parameters are used
  def substitute([item], [string]) when is_binary(string) do
    [string]
  end

  def substitute(item, [string]) when is_binary(string) do
    [string]
  end

... or even to percolate this detection up the call chain, such that substitute/* is never even called, but there is an inconsistency. I'm using the function with negative units, which is perfectly fine for most locales, but you'll notice with the example in Hebrew, the negative version is now the same as the positive:

iex(8)> {:ok, u} = Cldr.Unit.new(:hour, -1)
{:ok, #Cldr.Unit<:hour, -1>}
iex(9)> Cldr.Unit.to_string u, locale: "en"
{:ok, "-1 hour"}
iex(10)> Cldr.Unit.to_string u, locale: "he"
{:ok, "שעה"}
iex(11)> {:ok, u} = Cldr.Unit.new(:hour, 1)
{:ok, #Cldr.Unit<:hour, 1>}
iex(12)> Cldr.Unit.to_string u, locale: "he"
{:ok, "שעה"}

I'll probably work around this locally. If you're short on time, I could organise a PR for you, with a little guidance on how you want to deal with these problems.

Thanks for your time, and let me know.

Cheers,

Jarrod

@jarrodmoldrich, thanks for the detailed report and sorry for the inconvenience - thats a lot of code-diving you did. I suspect you are right and my very simple substitution code doesn't cater for circumstances where there is no substitution. If thats the case then its quite a simple update. I will work on this over the next 12 hours and revert - hopefully with a new release.

Awesome, thank you, that's a fast turn-around. Did you have any thoughts about handling negative units?

Here's a snippet from the Cldr documentation (emphasis added):

https://unicode-org.github.io/cldr/ldml/tr35-general.html#Unit_Elements

Note that for certain plural cases, the unit pattern may not provide for inclusion of a numeric value—that is, it may not include “{0}”. This is especially true for the explicit cases “0” and “1” (which may have patterns like “zero seconds”). In certain languages such as Arabic and Hebrew, this may also be true with certain units for the plural cases “zero”, “one”, or “two” (in these languages, such plural cases are only used for the corresponding exact numeric values, so there is no concern about loss of precision without the numeric value).

My best guess is that you use the other variant if the value doesn't match that of the explicit pattern, such that -1 doesn't match "one". Though we could look at the reference implementation (presuming there is one) to confirm that.

I have published ex_cldr version 2.19.1 and ex_cldr version 2.20.0-rc.3 that fixes the substitution bug you reported (thanks again for the report and the code for the fix which I used).

As the treatment of negative numbers in the case where the template does not have any substitutions. I believe there are two issues at play:

  1. CLDR number formatting, including unit formatting, is driven by CLDR's plural rules. These rules return the plural category for the absolute value of a number. Therefore the plural category for 1 and -1 will be the same in a given locale. You can see this for example:
ex> TestBackend.Cldr.Number.Cardinal.plural_rule -1, "he"
:one
iex> TestBackend.Cldr.Number.Cardinal.plural_rule 1, "he" 
:one

In a situation where a template has substitutions, then number formatting would insert the correct number (ie 1 or -1) but that can't happen in the particular case you are reporting.

  1. Second issue that I suspect ex_cldr_units might not be the right tool for the job if you are looking for -1 hours. I think the intent of CLDR's units, as it relates to time, is to primarily relate to "hours of the day". If you are looking for something more focused on dates, times and intervals then perhaps ex_cldr_dates_times might be more useful to you?

If you are able to share your use case then I'd be happy to suggest an appropriate localisation path.

My best guess is that you use the other variant if the value doesn't match that of the explicit pattern, such that -1 doesn't match "one". Though we could look at the reference implementation (presuming there is one) to confirm that.

Plural rules does, as you suggest, use :other if there is no matching plural rule.

Thanks for the excerpt from my old friend TR35. Let me think on how best to approach what that paragraph suggests. The reference implementation is either the ICU code; either the C or Java versions. I haven't dived into that code though.

Thanks again @kipcole9 . Really appreciate how quick you've moved on this.

As for my use case, I need to display the magnitude of a time change. Which could be forward or backwards an hour, 20 minutes, etc. The Cldr.DateTime.Relative formatting wasn't appropriate here, and also I'm unable to get Cldr.Calendar.Duration to work with negative values:

iex(8)> {_, d} = Cldr.Calendar.Duration.new ~T[00:01:00], ~T[00:00:00]
** (KeyError) key :year not found in: ~T[00:01:00]
    (ex_cldr_calendars 1.12.0) lib/cldr/calendar/duration.ex:532: Cldr.Calendar.Duration.back_one_day/3
    (ex_cldr_calendars 1.12.0) lib/cldr/calendar/duration.ex:330: Cldr.Calendar.Duration.apply_time_diff_to_duration/3

@jarrodmoldrich, thanks for sticking with me on this! I think my time is probably better invested in improving the duration code to cater for times and not just dates.

Would you mind letting me know where the formatting for Cldr.DateTime.Relative doesn't meet your requirements too?

I should be able to complete an update to duration calculations (and formatting) by the end of the weekend - is that good enough to meet your timeframe?

BTW, I don't know if the Hebrew calendar is a part of your requirements but I will eventually get to implementing it. First I need to finish up lunar calculations in my astro. And then the calendar code. Its a very complex calendar so Its not high on my priority list - unless there is unfulfilled demand and I can change priorities.

That time frame is perfectly fine for me, but don't feel like you need to rush. Bears repeating: I'm very appreciative of your time, and if you wish to delegate work to me, I'm happy to help also.

I'm currently only supporting Gregorian atm, so I won't compel you to implement the Hebrew calendar. Though when it is done, I'll probably use it.

It's possible there's some formatting options to Relative that would suit what I need. Essentially I need the equivalent of '1 hour' or '-1 hour'. So that would mean dropping the 'in' from 'in 1 hour', or the 'ago' from '1 hour ago'. I merely want to express the quantum of the time change. e.g., a day light savings change would be +/-1 hour in most territories.

Thanks @jarrodmoldrich, thats clear. Definitely Cldr.DateTime.Duration feels like the right place for that if there is no way to do it using the units rules (still need to review that again after your citation). I'll leave this issue open for now.

Since I don't speak Hebrew, may I ask you if any of the entries here would be grammatically correct for -1?

            "unitPattern-count-other": "{0} שעות",
            "perUnitPattern": "{0} לשעה"

Now I have some good news, and some less good news.

Good news

I fixed the bug that prevented creating Time-based intervals. A new version has been published to hex.pm. So now:

iex> Cldr.Calendar.Duration.new ~T[00:00:59], ~T[00:01:23]
{:ok,
 %Cldr.Calendar.Duration{
   day: 0,
   hour: 0,
   microsecond: 0,
   minute: 0,
   month: 0,
   second: 24,
   year: 0
 }}

The less good news

  1. I don't currently support "negative" durations. I've put it on the to-do list.
  2. In any case, formatting of durations delegates to ex_cldr_units so the road leads back to this library.

Continues as a work in progress.

Sorry for the late reply, but I was AFK. I know this isn't very scientific, but the result of google translate matches with "unitPattern-count-other". It's in a different order, but I assume unicode bidi characters would correct this.

image

I find it strange, but I'm already able to make time based intervals. But I'm not currently on the bleeding edge RC.

iex(8)> Cldr.Calendar.Duration.new ~T[00:00:59], ~T[00:01:23]
{:ok,
 %Cldr.Calendar.Duration{
   day: 0,
   hour: 0,
   microsecond: 0,
   minute: 0,
   month: 0,
   second: 24,
   year: 0
 }}

Thanks again for the updates.

Thanks much. I also fixed the Duration.new/2 bug and published release 1.12.1 on the current production branch so perhaps you have picked that up.

Working on a proper resolution now. Thanks for your patience.

Hi @kipcole9 , thanks for all those updates. I set all my deps to latest, but now I'm seeing this error for several/maybe all locales: Never mind, I force downloaded the locales and everything is sweet.

Thats very strange - new locale files should automatically download whenever a new release is configured. I've only seen this issue when locales are kept as part of some CI/CD caching. Is that perhaps the case here?

It is strange. This is my local build, and I haven't explicitly set up any caching.

Thanks for the feedback, I'll think on this some more. None of the code related to downloading was touched at all although the locale files have definitely changed and for this version they do indeed include a :locale_display_names key (currently unused, they will be used in a new library in the family soon).

I have pushed a commit to resolve this issue according to the except of TR35 you referenced. Would you consider testing your app with this version, directly from GitHub? In your mix.exs:

def deps() do
  {:ex_cldr_units, github: "elixir-cldr/cldr_units"},
  ...
end

If so, I will await your feedback before I publish a new version.

The draft changelog entry is:

Bug Fixes

  • Use :other plural category to format units which have a value of 0, 1 or 2 when the natural unit pattern has no substitutions. This corrects the situation in locales such as he and ar where the unit pattern for plural category :one has no substitutions. Previously this would means the formatted string for a unit with a vaue of 1 and -1 would both output the same string. Thanks to @jarrodmoldrich for the report and collaboration and to @voltone for his family's help with Hebrew grammar.

@kipcole9 That works great, thank you. It seems that you could have kept the explicit words for :zero, :one, and :two for positive integers, and just use :other for -1, -2, but I trust you've taken the better path here with your collaboration with @voltone and hopefully it generalizes to ar.

A couple of small feature request would be to allow out of order durations, and also to permit a duration from a single scalar duration, like a Time.from_seconds_after_midnight. For now I'm making a duration from the absolute value of time, and building the struct myself if I have a negative duration. It's not elegant but perfectly practical for now:

  defp duration(seconds, locale) do
    {:ok, duration} = Cldr.Calendar.Duration.new(~T[00:00:00], Time.from_seconds_after_midnight(abs(seconds)))
    duration = if seconds < 0 do
      # new for negative durations are not properly supported yet
      %Cldr.Calendar.Duration{
        day: 0, hour: -duration.hour, microsecond: 0, minute: 0, month: 0, second: -duration.second, year: 0
      }
    else
      duration
    end
    {:ok, duration} = Cldr.Calendar.Duration.to_string duration, style: :narrow, locale: locale
    duration
  end

Thanks again for turning this around so quick. Impressed and grateful.

@jarrodmoldrich, the code path still uses :zero, :one and :two plural categories but only if the number matches. My tests are passing but in this area are not exhaustive Are you seeing unexpected results? TR35 does seem to suggest should only use used for "languages like hebrew and arabic" but i'm still not certain of the boundary conditions. I will add some tests for ar to confirm.

Will take a look at what I can do to improve durations as you suggest. I'll move this part of the issue to ex_cldr_calendars

This is what I see with the current master and I think it matches the spec. Do let me know if you are seeing results which are not expected.

iex> Cldr.Unit.to_string Cldr.Unit.new!(1, :hour), locale: "he" 
{:ok, "שעה"}
iex> Cldr.Unit.to_string Cldr.Unit.new!(-1, :hour), locale: "he"
{:ok, "‎-1 שעות"}

iex> Cldr.Unit.to_string Cldr.Unit.new!(1, :hour), locale: "ar" 
{:ok, "ساعة"}
iex> Cldr.Unit.to_string Cldr.Unit.new!(-1, :hour), locale: "ar"
{:ok, "؜-١ ساعة"}

Ah yes, you have implemented it this way, I misunderstood what you said. I've verified this behaviour is reflecting on my build.

Ah, good to hear, thanks much. For any future curious people the relevant code is at https://github.com/elixir-cldr/cldr_units/blob/master/lib/cldr/unit/format.ex#L641-L676

I will add some more tests and publish this in the next 12 hours or so.