ruby-i18n/ruby-cldr

Lateral inheritance fallback


CLDR has a concept of "Lateral Inheritance", where a missing value falls back to another value in the same locale before falling back to ancestor locales.

Example

ia.units.unitLength.short.length-centimeter defines only a single "other" key, even though ia has a plural rule that requires a "one" value.

When resolving the value of ia.units.unitLength.short.length-centimeter.one, it should fall back to ia.units.unitLength.short.length-centimeter.other first.

This can get very complicated if there are multiple levels of lateral inheritance.
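A single level of lateral fallback for plural counts can be sketched as follows. This is a minimal sketch, assuming the exported data is loaded into nested string-keyed hashes (as YAML.load would return). IA_DATA, lateral_candidates and lookup_plural are hypothetical names, and "{0} cm" is a placeholder rather than the real CLDR string.

```ruby
# Hypothetical slice of exported "ia" data; "{0} cm" is a placeholder value.
IA_DATA = {
  "units" => {
    "unitLength" => {
      "short" => {
        "length-centimeter" => { "other" => "{0} cm" }
      }
    }
  }
}.freeze

# Candidate keys in lateral-inheritance order: the requested plural count,
# then "other". Deeper lateral chains (case, gender) would extend this list.
def lateral_candidates(count)
  [count, "other"].uniq
end

# Resolves a plural form within a single locale, trying each lateral
# candidate in order and returning nil if none is defined.
def lookup_plural(data, path, count)
  node = data.dig(*path)
  return nil unless node.is_a?(Hash)

  lateral_candidates(count).each do |key|
    return node[key] if node.key?(key)
  end
  nil
end

path = %w[units unitLength short length-centimeter]
puts lookup_plural(IA_DATA, path, "one")
```

The "multiple levels" problem shows up as soon as lateral_candidates has to enumerate combinations of count, case and gender rather than a two-element list.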

Potential solution?

Related to #67, ruby-cldr needs to decide how much of this to handle at the thor cldr:export layer vs. exposing to clients so they can make their own decisions.

Perhaps for now, ruby-cldr should resolve values for each of the required plural keys for a locale (e.g., copy the "other" value into any missing plural keys), while we figure out what to do about the other dimensions (e.g., "gender", "case").
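The interim approach could look roughly like this at export time. This is a minimal sketch, assuming each unit's plural forms arrive as a string-keyed hash and the locale's required plural categories are already known. fill_plural_keys is a hypothetical helper, not an existing ruby-cldr method, and "{0} cm" is a placeholder value.

```ruby
# Copies the "other" form into every required plural category that the
# source data leaves undefined. Existing forms are never overwritten.
# Raises KeyError if "other" itself is missing, since there is nothing to copy.
def fill_plural_keys(forms, required_categories)
  other = forms.fetch("other")
  required_categories.each_with_object(forms.dup) do |category, result|
    result[category] ||= other
  end
end

# ia requires "one" and "other"; the source only defines "other".
puts fill_plural_keys({ "other" => "{0} cm" }, %w[one other]).inspect
```

Because the result is a plain hash with one string per category, clients that treat plural keys as leaf strings keep working unchanged.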

CLDR specifies a very specific algorithm for resolving inheritance and aliases, documented here. I don't think it makes sense to allow clients to make their own decisions, since the data set is explicitly structured to follow these rules. In every case, it should be possible to generate a full, expanded data set for every locale.

CLDR specifies a very specific algorithm for resolving inheritance and aliases, documented here.

@camertron For sure. The spec defines the various algorithms needed in order to resolve the correct values for any given path.

In every case, it should be possible to generate a full, expanded data set for every locale.

Yes. We can create fully "flattened"/resolved locale files by doing all the resolution as part of the export. While ruby-cldr doesn't do a good job of this today, it definitely should.

However, as a client you probably never want to work with fully flattened locale files, as they are much larger than they could be due to all the duplication between the files (loading them uses too much RAM). When you have a smarter client, you can avoid this duplication (at the expense of more lookups).

I don't think it makes sense to allow clients to make their own decisions

What I meant is that clients either need to be smart enough to know how to do the series of fallback lookups dictated by the CLDR resolution algorithms (i.e., which path to check next when a path isn't found), or else we need to flatten the data for them.

For example, ruby-i18n/i18n's I18n::Backend::Fallbacks knows how to handle Locale Inheritance and Aliases, but not Lateral Inheritance. So either that client needs to be made smarter, or we need to flatten the Lateral Inheritance into a form it can understand.
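To make the "series of fallback lookups" concrete, here is a minimal sketch of a smarter client that combines lateral and locale inheritance. It assumes flat dotted keys per locale and a hand-written parent map; SPARSE, PARENTS, locale_chain, lateral_keys and resolve are all hypothetical names, and the values are placeholders. It follows the order described at the top of this issue (lateral first within a locale, then the parent locale); the CLDR spec remains the authority on the exact ordering.

```ruby
# Hypothetical sparse data: each locale stores only what it defines.
SPARSE = {
  "ia"   => { "length-centimeter.other" => "ia other" },
  "root" => {
    "length-centimeter.one"   => "root one",
    "length-centimeter.other" => "root other",
    "length-meter.other"      => "root meter"
  }
}.freeze

# Hypothetical parent map; real CLDR parents come from parentLocales data
# and locale-ID truncation.
PARENTS = { "ia" => "root" }.freeze

def locale_chain(locale)
  chain = []
  while locale
    chain << locale
    locale = PARENTS[locale]
  end
  chain
end

# Lateral candidates for a plural count: the count itself, then "other".
def lateral_keys(base, count)
  ["#{base}.#{count}", "#{base}.other"].uniq
end

# Tries every lateral candidate in a locale before moving to its parent.
def resolve(locale, base, count)
  locale_chain(locale).each do |loc|
    data = SPARSE.fetch(loc, {})
    lateral_keys(base, count).each do |key|
      return data[key] if data.key?(key)
    end
  end
  nil
end

puts resolve("ia", "length-centimeter", "one")
```

Note that "ia other" wins over root's "one" here: with lateral-first ordering, a locale's own "other" shadows any ancestor's more specific plural form.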


In this case, even before we can generate fully flattened files, or make a smarter client, there's a prerequisite that we define what the YAML serialization of this data should actually look like.

Example: how should we serialize this to YAML?

<unit type="duration-day">
  <gender>masculine</gender>
  <displayName>Tage</displayName>
  <unitPattern count="one">{0} Tag</unitPattern>
  <unitPattern count="one" case="accusative">{0} Tag</unitPattern>
  <unitPattern count="one" case="dative">{0} Tag</unitPattern>
  <unitPattern count="one" case="genitive">{0} Tages</unitPattern>
  <unitPattern count="other">{0} Tage</unitPattern>
  <unitPattern count="other" case="accusative">{0} Tage</unitPattern>
  <unitPattern count="other" case="dative">{0} Tagen</unitPattern>
  <unitPattern count="other" case="genitive">{0} Tage</unitPattern>
  <perUnitPattern>{0} pro Tag</perUnitPattern>
</unit>

Perhaps something like this (some keys omitted for clarity):

duration-day:
  one:
    _: "{0} Tag"
    accusative: "{0} Tag"
    dative: "{0} Tag"
    genitive: "{0} Tages"
  other:
    _: "{0} Tage"
    accusative: "{0} Tage"
    dative: "{0} Tagen"
    genitive: "{0} Tage"
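The proposed shape can be produced from the unitPattern entries with a small grouping step. This is a minimal sketch, assuming the XML has already been parsed into [count, case, pattern] triples (PATTERNS and nest_patterns are hypothetical names). A value such as {0} Tag is not valid as a plain YAML scalar, because an unquoted { starts a flow mapping, so it must be quoted; Psych handles that when dumping.

```ruby
require "yaml"

# unitPattern entries from the German duration-day example: [count, case, pattern].
# A nil case is the default pattern (no case attribute).
PATTERNS = [
  ["one", nil, "{0} Tag"],
  ["one", "accusative", "{0} Tag"],
  ["one", "dative", "{0} Tag"],
  ["one", "genitive", "{0} Tages"],
  ["other", nil, "{0} Tage"],
  ["other", "accusative", "{0} Tage"],
  ["other", "dative", "{0} Tagen"],
  ["other", "genitive", "{0} Tage"]
].freeze

# Groups patterns by plural count, storing the case-less pattern under "_".
def nest_patterns(patterns)
  patterns.each_with_object({}) do |(count, kase, pattern), tree|
    (tree[count] ||= {})[kase || "_"] = pattern
  end
end

doc = { "duration-day" => nest_patterns(PATTERNS) }
puts doc.to_yaml
```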

Obviously, this isn't backwards compatible with existing clients, since they would be looking for a single string value at one/other, not a YAML mapping (and it introduces the concept of a _ key).

I also have clients that assume plural cases are always leaf nodes; they would need to change their logic to handle this.
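One way a client could tolerate both layouts during a transition is to check whether the plural node is a string or a mapping. This is a minimal sketch; unit_pattern, LEGACY and NESTED are hypothetical names. Falling back from a missing case to the "_" pattern is an assumption made here for illustration, and the CLDR lateral-inheritance rules for case should be checked before relying on it.

```ruby
# Current layout: plural count => String.
LEGACY = { "one" => "{0} Tag", "other" => "{0} Tage" }.freeze

# Proposed layout: plural count => { "_" => default, case => pattern }.
NESTED = {
  "one"   => { "_" => "{0} Tag", "genitive" => "{0} Tages" },
  "other" => { "_" => "{0} Tage", "dative" => "{0} Tagen" }
}.freeze

# Returns the pattern for a plural count and optional grammatical case,
# falling back to "other" for a missing count and to "_" for a missing case.
def unit_pattern(unit, count, grammatical_case = nil)
  node = unit[count] || unit["other"]
  return node if node.is_a?(String)
  return nil unless node.is_a?(Hash)

  (grammatical_case && node[grammatical_case]) || node["_"]
end

puts unit_pattern(NESTED, "other", "dative")
```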

I'm just getting my head around these ideas.

FWIW, cldr-json exports the above example as:

"duration-day": {
  "gender": "masculine",
  "displayName": "Tage",
  "unitPattern-count-one": "{0} Tag",
  "accusative-count-one": "{0} Tag",
  "dative-count-one": "{0} Tag",
  "genitive-count-one": "{0} Tages",
  "unitPattern-count-other": "{0} Tage",
  "accusative-count-other": "{0} Tage",
  "dative-count-other": "{0} Tagen",
  "genitive-count-other": "{0} Tage",
  "perUnitPattern": "{0} pro Tag"
},

Using this code.
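The cldr-json keys encode the count and case in a flat string. A minimal sketch of mapping between the two, inferred only from the naming in the excerpt above ("unitPattern-count-<count>" for the default pattern, "<case>-count-<count>" otherwise); cldr_json_key and parse_cldr_json_key are hypothetical helpers, not part of cldr-json or ruby-cldr.

```ruby
require "json"

# Builds a cldr-json unit pattern key from a count and optional case.
def cldr_json_key(count, grammatical_case = nil)
  "#{grammatical_case || 'unitPattern'}-count-#{count}"
end

# Splits a key back into [count, case]; case is nil for the default pattern.
# Returns nil for keys such as "perUnitPattern" that carry no count.
def parse_cldr_json_key(key)
  m = key.match(/\A(?<prefix>[A-Za-z]+)-count-(?<count>[a-z]+)\z/)
  return nil unless m

  kase = m[:prefix] == "unitPattern" ? nil : m[:prefix]
  [m[:count], kase]
end

# A trimmed copy of the cldr-json excerpt above.
DURATION_DAY = JSON.parse(<<~EOS)
  {
    "unitPattern-count-one": "{0} Tag",
    "dative-count-other": "{0} Tagen",
    "perUnitPattern": "{0} pro Tag"
  }
EOS

puts DURATION_DAY[cldr_json_key("other", "dative")]
```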

However, as a client you probably never want to work with fully flattened locale files, as they are much larger than they could be due to all the duplication between the files (loading them uses too much RAM).

I would want to see some benchmarks around this before agreeing. While fully flattened locale files will lead to higher memory consumption, my hunch is we're talking about a few megabytes. TwitterCLDR fully flattens locale data in almost all cases, and while the library doesn't support every language in CLDR or deal with all the data, it supports a bunch of ICU/CLDR features and only weighs in at ~20 MB. I would expect most clients to load only a fraction of that. Compared to the overhead of a web framework like Rails, that's not very significant.
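As a starting point for such a benchmark, the duplication cost can be estimated by comparing the serialized size of sparse and flattened files. This is a minimal sketch on toy placeholder data (PARENT, CHILD_SPARSE and deep_merge are hypothetical). Serialized bytes are only a proxy; a real benchmark would load actual exported locales and measure process memory.

```ruby
require "yaml"

# Toy data: the child locale stores only what differs from its parent.
PARENT = { "units" => { "a" => "x" * 20, "b" => "y" * 20, "c" => "z" * 20 } }.freeze
CHILD_SPARSE = { "units" => { "c" => "w" * 20 } }.freeze

# Recursively overlays child values onto the parent to build a flattened child.
def deep_merge(base, overlay)
  base.merge(overlay) do |_key, old_val, new_val|
    old_val.is_a?(Hash) && new_val.is_a?(Hash) ? deep_merge(old_val, new_val) : new_val
  end
end

flattened = deep_merge(PARENT, CHILD_SPARSE)
sparse_bytes = PARENT.to_yaml.bytesize + CHILD_SPARSE.to_yaml.bytesize
flat_bytes = PARENT.to_yaml.bytesize + flattened.to_yaml.bytesize
puts "sparse: #{sparse_bytes} bytes, flattened: #{flat_bytes} bytes"
```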

What I meant is that clients either need to be smart enough to know how to do the series of fallback lookups dictated by the CLDR resolution algorithms (i.e., which path to check next when a path isn't found), or else we need to flatten the data for them.

OK, I see what you mean. Since ruby-cldr is a data generation gem, it makes sense to me to fully flatten. I suppose we could create a ruby-cldr-runtime gem or something similar that would also provide a data access layer on top of the exported YAML files. I think ruby-cldr would also have to be modified to produce YAML files for each of the ancestor locales (in addition to the locales requested) so that lateral inheritance would be possible.
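A runtime layer like that could keep memory down by loading ancestor files only when a lookup actually reaches them. This is a minimal sketch of that idea; LocaleStore, the "<dir>/<locale>.yml" layout and the caller-supplied parent map are all assumptions, not ruby-cldr's actual export layout.

```ruby
require "yaml"

# Hypothetical data-access layer over exported per-locale YAML files.
class LocaleStore
  def initialize(dir, parents)
    @dir = dir
    @parents = parents
    @cache = {}
  end

  # Loads a locale file on first use, so ancestors that are never reached
  # are never read into memory. Missing or empty files act as empty hashes.
  def data_for(locale)
    @cache[locale] ||= begin
      path = File.join(@dir, "#{locale}.yml")
      (File.exist?(path) && YAML.safe_load(File.read(path))) || {}
    end
  end

  def chain(locale)
    result = []
    while locale
      result << locale
      locale = @parents[locale]
    end
    result
  end

  # Returns the first value found for +keys+, walking from +locale+ up its chain.
  def lookup(locale, *keys)
    chain(locale).each do |loc|
      value = data_for(loc).dig(*keys)
      return value unless value.nil?
    end
    nil
  end
end
```

Lateral inheritance could be layered on top by trying each lateral candidate key within data_for(loc) before moving to the next locale.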