Thoughts on L10n Database Schema / Cross-Format Support
halostatue opened this issue · 0 comments
This is an extension of my comment in #142, with a few thoughts on how the schema might look for the 1.0 release you mentioned. I’m going to oversimplify the translation space a little, as there are many different solutions and each platform and framework seems to have its own. Of the “home-brewed” versions, I am most familiar with the Rails I18n solution, which is almost as complete as the formats that I’m going to talk about. The less said about most other home-brew localization solutions, the better: they are mostly created by people of Western European descent who only deal with a couple of languages and have to adapt as they go along. I have written a home-brew localization system myself (but we only needed to support EFIGS, that is English, French, Italian, German, and Spanish, at the time, so…).
There are basically three localization systems / formats worth talking about as models for localization: Gettext, ICU (MessageFormat), and Project Fluent.
Gettext is the oldest of these and the one I am most familiar with at this point, but it is not necessarily the best. Project Fluent is the newest and comes from Mozilla. ICU is interesting because it is now an official Unicode Consortium project, although it started out at IBM as the ICU4C and ICU4J libraries, which at first had a Unicode conversion story rather than a strong localization story.
I’m excluding XLIFF (both 1.2 and 2.0). Although it is an OASIS standard, it is primarily for document translation and does not support pluralization.
Gettext
I described Gettext pretty thoroughly in my previous comment. There’s a lot of functionality and it’s not a bad basis for a translation model.
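As a reminder of the shape of the Gettext model, here is a minimal sketch using Python's standard-library gettext module. It is not this project's API. The helper name files_message is hypothetical, and no catalog is loaded, so the fallback English-style rule applies.

```python
import gettext

# NullTranslations is the standard library's fallback catalog. With no .mo
# file loaded, ngettext() applies the English-style rule: the singular
# msgid when n == 1, otherwise the plural msgid.
catalog = gettext.NullTranslations()


def files_message(n: int) -> str:
    """Return a count-dependent message (hypothetical helper)."""
    template = catalog.ngettext("%d file", "%d files", n)
    return template % n


print(files_message(1))
print(files_message(5))
```

With a real catalog, the Plural-Forms header of the .po file decides which of the msgstr[0], msgstr[1], … entries is used for a given count, so a schema modelled on Gettext needs to store one translation per plural index for each message.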
ICU
ICU translation using MessageFormat improves on Gettext by supporting gender in the same way that Gettext supports pluralization: through its select argument type, a message picks a variant based on an argument's value. See https://github.com/elixir-cldr/cldr_messages and https://docs.google.com/presentation/d/1ZyN8-0VXmod5hbHveq-M1AeQ61Ga3BmVuahZjbmbBxo/pub?start=false&loop=false&delayms=3000&slide=id.g1bc43a82_2_14 for more information. It ties in with the Unicode CLDR, which already knows the pluralization rules for hundreds of languages.
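To make the gender-plus-plural idea concrete, here is a rough sketch. ICU_PATTERN shows what such a message looks like in MessageFormat syntax, but the code does not parse it. Instead, a hand-written table (MESSAGES) and a hard-coded English plural rule emulate the selection. All names here are hypothetical, and a real implementation would take its plural categories from CLDR rather than from english_plural_category.

```python
# An ICU MessageFormat pattern, kept as data only; this sketch does not parse it.
ICU_PATTERN = (
    "{gender, select, "
    "female {{count, plural, one {She has # file} other {She has # files}}} "
    "male {{count, plural, one {He has # file} other {He has # files}}} "
    "other {{count, plural, one {They have # file} other {They have # files}}}}"
)

# Hand-expanded equivalent of the pattern above: gender -> plural category -> text.
MESSAGES = {
    "female": {"one": "She has {n} file", "other": "She has {n} files"},
    "male": {"one": "He has {n} file", "other": "He has {n} files"},
    "other": {"one": "They have {n} file", "other": "They have {n} files"},
}


def english_plural_category(n: int) -> str:
    """Assumed stand-in for a CLDR lookup: English integers use 'one' or 'other'."""
    return "one" if n == 1 else "other"


def format_message(gender: str, count: int) -> str:
    """Select by gender first, then by plural category, as the pattern nests them."""
    by_plural = MESSAGES.get(gender, MESSAGES["other"])  # 'other' is the fallback
    return by_plural[english_plural_category(count)].format(n=count)


print(format_message("female", 1))
print(format_message("unknown", 3))
```

The point for the schema is that a single ICU message carries all of its variants inside one string, whereas the Gettext model stores one row or entry per plural form.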
Project Fluent
Similar to ICU MessageFormat, Fluent also supports gender and plural variants; its implementations also appear to get their pluralization rules from CLDR.