sveltejs/kit

i18n brainstorming

Opened this issue · 234 comments

We've somewhat glossed over the problem of internationalisation up till now. Frankly this is something SvelteKit isn't currently very good at. I'm starting to think about how to internationalise/localise https://svelte.dev, to see which parts can be solved in userland and which can't.

(For anyone unfamiliar: 'Internationalisation' or i18n refers to the process of making an app language agnostic; 'localisation' or l10n refers to the process of creating individual translations.)

This isn't an area I have a lot of experience in, so if anyone wants to chime in — particularly non-native English speakers and people who have dealt with these problems! — please do.

Where we're currently at: the best we can really do is put everything inside src/routes/[lang] and use the lang param in preload to load localisations (an exercise left to the reader, albeit a fairly straightforward one). This works, but leaves a few problems unsolved.

I think we can do a lot better. I'm prepared to suggest that SvelteKit should be a little opinionated here rather than abdicating responsibility to things like i18next, since we can make guarantees that a general-purpose framework can't, and can potentially do interesting compile-time things that are out of reach for other projects. But I'm under no illusions about how complex i18n can be (I recently discovered that a file modified two days ago will be labeled 'avant-hier' on MacOS if your language is set to French; most languages don't even have a comparable phrase. How on earth do you do that sort of thing programmatically?!) which is why I'm anxious for community input.


Language detection/URL structure

Some websites make the current language explicit in the pathname, e.g. https://example.com/es/foo or https://example.com/zh/foo. Sometimes the default is explicit (https://example.com/en/foo), sometimes it's implicit (https://example.com/foo). Others (e.g. Wikipedia) use a subdomain, like https://cy.example.com. Still others (Amazon) don't make the language visible, but store it in a cookie.

Having the language expressed in the URL seems like the best way to make the user's preference unambiguous. I prefer /en/foo to /foo since it's explicit, easier to implement, and doesn't make other languages second-class citizens. If you're using subdomains then you're probably running separate instances of an app, which means it's not SvelteKit's problem.

There still needs to be a way to detect language if someone lands on /. I believe the most reliable way to detect a user's language preference on the server is the Accept-Language header (please correct me if I'm wrong). Maybe this could automatically redirect to a supported localisation (see next section).
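
To make the redirect idea concrete, here is a minimal sketch of Accept-Language negotiation. The `pickLocale` helper and its arguments are hypothetical, not an existing SvelteKit API, and it only handles the common `tag;q=value` form of the header:

```javascript
// Hypothetical helper: choose the best supported locale for a request.
// `header` is the raw Accept-Language value, `supported` a list of locale
// codes (e.g. derived from the locales folder), `fallback` the default.
function pickLocale(header, supported, fallback) {
  const ranked = (header || '')
    .split(',')
    .map((part) => {
      const [tag, ...params] = part.trim().split(';');
      const q = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      const quality = q ? Number(q.slice(2)) : 1; // a missing q means q=1
      return { tag: tag.toLowerCase(), quality: Number.isNaN(quality) ? 0 : quality };
    })
    .filter((entry) => entry.tag && entry.quality > 0)
    .sort((a, b) => b.quality - a.quality);

  for (const { tag } of ranked) {
    if (supported.includes(tag)) return tag;
    const base = tag.split('-')[0]; // let fr-CH match a plain fr locale
    if (supported.includes(base)) return base;
  }
  return fallback;
}
```

A request to / carrying `fr-CH, fr;q=0.9, en;q=0.8` would then be redirected to /fr, assuming fr is among the supported locales.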

Supported localisations

It's useful for SvelteKit to know at build time which localisations are supported. This could perhaps be achieved by having a locales folder (configurable, obviously) in the project root:

locales
|- de.json
|- en.json
|- fr.json
|- ru.json
src
|- routes
|- ...

Single-language apps could simply omit this folder, and behave as they currently do.

lang attribute

The <html> element should ideally have a lang attribute. If SvelteKit has i18n built in, we could achieve this the same way we inject other variables into src/template.html:

<html lang="%svelte.lang%">

Localised URLs

If we have localisations available at build time, we can localise URLs themselves. For example, you could have /en/meet-the-team and /de/triff-das-team without having to use a [parameter] in the route filename. One way we could do this is by encasing localisation keys in curlies:

src
|- routes
   |- index.svelte
   |- {meet_the_team}.svelte

In theory, we could generate a different route manifest for each supported language, so that English-speaking users would get a manifest with this...

{
  // index.svelte
  pattern: /^\/en\/?$/,
  parts: [...]
},

{
  // {meet_the_team}.svelte
  pattern: /^\/en\/meet-the-team\/?$/,
  parts: [...]
}

...while German-speaking users download this instead:

{
  // index.svelte
  pattern: /^\/de\/?$/,
  parts: [...]
},

{
  // {meet_the_team}.svelte
  pattern: /^\/de\/triff-das-team\/?$/,
  parts: [...]
}

Localisation in components

I think the best way to make the translations themselves available inside components is to use a store:

<script>
  import { t } from '$app/stores';
</script>

<h1>{$t.hello_world}</h1>

Then, if you've got files like these...

// locales/en.json
{ "hello_world": "Hello world" }
// locales/fr.json
{ "hello_world": "Bonjour le monde" }

...SvelteKit can load them as necessary and coordinate everything. There's probably a commonly-used format for things like this as well — something like "Willkommen zurück, $1":

<p>{$t.welcome_back(name)}</p>

(In development, we could potentially do all sorts of fun stuff like making $t be a proxy that warns us if a particular translation is missing, or tracks which translations are unused.)
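
As a rough illustration of the proxy idea, the sketch below wraps a plain translations object. `createDevTranslations` is a made-up name for this example, and a real implementation would presumably live inside the store rather than next to it:

```javascript
// Hypothetical dev-only wrapper around a translations object.
// Warns when a missing key is read and records which keys were used.
function createDevTranslations(translations, locale, warn = console.warn) {
  const used = new Set();

  const t = new Proxy(translations, {
    get(target, key) {
      if (typeof key !== 'string') return target[key]; // leave symbols alone
      if (!(key in target)) {
        warn(`Missing translation "${key}" for locale "${locale}"`);
        return key; // show the key itself so the gap is visible in the UI
      }
      used.add(key);
      return target[key];
    }
  });

  // Keys present in the file that were never read during this session.
  const unused = () => Object.keys(translations).filter((key) => !used.has(key));

  return { t, unused };
}
```

Reading `t.hello_world` returns the translation as usual, while reading a key that is absent from the locale file logs a warning and renders the key.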

Route-scoped localisations

We probably wouldn't want to put all the localisations in locales/xx.json — just the stuff that's needed globally. Perhaps we could have something like this:

locales
|- de.json
|- en.json
|- fr.json
|- ru.json
src
|- routes
   |- settings
      |- _locales
         |- de.json
         |- en.json
         |- fr.json
         |- ru.json
      |- index.svelte

Again, we're in the fortunate position that SvelteKit can easily coordinate all the loading for us, including any necessary build-time preparation. Here, any keys in src/routes/settings/_locales/en.json would take precedence over the global keys in locales/en.json.

Translating content

It's probably best if SvelteKit doesn't have too many opinions about how content (like blog posts) should be translated, since this is an area where you're far more likely to need to e.g. talk to a database, or otherwise do something that doesn't fit neatly into the structure we've outlined. Here again, there's an advantage to having the current language preference expressed in the URL, since userland middleware can easily extract that from req.path and use that to fetch appropriate content. (I guess we could also set a req.lang property or something if we wanted?)

Base URLs

Sapper (ab)used the <base> element to make it easy to mount apps on a path other than /. <base> could also include the language prefix so that we don't need to worry about it when creating links:

<!-- with <base href="de">, this would link to `/de/triff-das-team` -->
<a href={$t.meet_the_team}>{$t.text.meet_the_team}</a>

Base URLs haven't been entirely pain-free though, so this might warrant further thought.


Having gone through this thought process I'm more convinced than ever that SvelteKit should have i18n built in. We can make it so much easier to do i18n than is currently possible with libraries, with zero boilerplate. But this could just be arrogance and naivety from someone who hasn't really done this stuff before, so please do help fill in the missing pieces.

I've some thoughts about it...

Plurals

There should be a way to use the correct plural form of a phrase based on a number value.

Something like...
<p>{$t.you_have} {number} {$t.message(number)}</p>

// locales/en.json
{
  "you_have": "You have",
  "message": ["message", "messages"]
}

// locales/ru.json
{
  "you_have": "У вас",
  "message": ["сообщение", "сообщения", "сообщений"]
}

And we should know plural rules for all languages of the world =)

Formatting

For example, Americans write dates as 02/23, Russians as 23.02. Many other things, such as currency, also need locale-specific formatting.
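
For what it's worth, the built-in Intl API already covers this kind of formatting. The sketch below uses only standard `Intl.DateTimeFormat` and `Intl.NumberFormat`; the helper names are illustrative:

```javascript
// Format a day/month pair the way each locale expects.
// The time zone is pinned to UTC so the result does not depend on the host.
function formatDayMonth(date, locale) {
  return new Intl.DateTimeFormat(locale, {
    month: '2-digit',
    day: '2-digit',
    timeZone: 'UTC'
  }).format(date);
}

// Format an amount of money for a locale and an ISO 4217 currency code.
function formatCurrency(amount, locale, currency) {
  return new Intl.NumberFormat(locale, { style: 'currency', currency }).format(amount);
}

const date = new Date(Date.UTC(2019, 1, 23)); // 23 February 2019

console.log(formatDayMonth(date, 'en-US')); // 02/23
console.log(formatDayMonth(date, 'ru-RU')); // 23.02
console.log(formatCurrency(1234.5, 'en-US', 'USD')); // $1,234.50
```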

Language detection/URL structure

I'd prefer to use a third-level domain like ru.svelte.technology, but with a single app that has i18n support. I understand that this requires server configuration that a typical Sapper user may not know how to do. Maybe we could have a config option to choose between path-based and domain-based language detection.

Localised URLs

Don't do it. I've never seen an i18n system that supports it. There are additional problems like checking URLs for unsupported symbols, the complexity of the routes and code, broken links and so on. That's my personal opinion.

Localisation in components

Maybe we can find a better way and replace this...
<p>{$t.welcome_back(name)}</p>
with this:
<p>{$t(`Welcome back ${name}`)}</p>

Developer should be able to use default language visually. It will make content creation simpler for the author.
But I have no idea about perfect implementation...

Alternative

My position is that Sapper doesn't need a built-in i18n system at all. Only a small fraction of websites need to be multilingual. We could make a separate Sapper plugin to provide i18n, or use one of the many existing libraries.

PS: Sorry for my poor english...

Some thoughts:

  1. t should not be a store (though it’s reasonable for the current locale to be a store): switching languages is an exceedingly rare operation (in almost all apps I’d expect it to be used in well under one in ten thousand sessions—the default is probably right, and very few ever change locale, and those that do so probably only need to once, because you then store their preferred language, right? right?), and it being a store adds a fair bit of bloat. Rather, switching locales should just throw away everything (sorry about your transient state, but switching languages is effectively moving to a different page) and redraw the root component.

  2. You need the concept of “such-and-such a URL, but in a different locale”, for things like language switchers and the equivalent meta tags.

  3. Localised slugs are conceptually and generally aesthetically nice, but risky. If the locale is in the URL, you would still ideally handle localised forms of slugs, for naive users and possibly tools that try just swapping the locale in the URL (viz. /en/triff-das-team redirects to /en/meet-the-team). You will observe that points two and five also contain serious risks on localised slugs, where they mean that you must know the localised form of the slugs in all locales, regardless of what locale you’re in at present.

  4. Locale-in-URL is not the only thing you may want: it’s common to encode the country in there too (or instead), and have that matter for more than just language strings. For example, /en-au/ might switch to show Australian products and prices, as well as hopefully talking about what colour the widgets you can buy are, but let’s be honest, they probably didn’t actually make an en-au locale, so it’ll probably just be using the en locale which will be inexplicably American-colored despite not being called en-us. But I digress.

  5. What you’ve described sounds to be designed for web sites rather than web apps.

    • Sites: including the locale in the URL is generally the right thing to do, although it can be a pain when people share links and they’re in the wrong language. If the user changes language, you probably still want to save that fact in a cookie or user database, so that the site root chooses the right language. (Otherwise users of non-default languages will be perpetually having to switch when they enter via search engines.) You should ideally still support locale-independent URLs, so that if I drop the en/ component of the URL it’ll pick the appropriate locale for the user. I believe setting that as the canonical URL for all the locale variants will help search engines too, and thus users so that they’re not constantly taken back to the English form if they wanted Norwegian, but I have no practical experience with the same (my actual i18n experience is only in apps).

    • Apps: including the locale in the URL is generally the wrong thing to do; you will instead want to keep track of each user’s language per account; localised slugs are right out for apps in general, too.
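
To illustrate point 2 for the locale-in-URL case, here is a small sketch that derives the equivalent URL in every other locale and renders `<link rel="alternate" hreflang="...">` tags. The helper names are hypothetical, and the sketch assumes slugs are not localised; with localised slugs you would need the per-locale slug table that point 3 warns about:

```javascript
// Hypothetical helper: map each supported locale to the equivalent path.
// Assumes the first path segment is the locale and slugs are NOT localised.
function alternateUrls(pathname, locales) {
  const [, current, ...rest] = pathname.split('/');
  if (!locales.includes(current)) return {}; // not a localised URL
  return Object.fromEntries(
    locales.map((locale) => [locale, '/' + [locale, ...rest].join('/')])
  );
}

// Render alternate-language link tags for the document head.
function alternateLinkTags(origin, pathname, locales) {
  return Object.entries(alternateUrls(pathname, locales))
    .map(([locale, path]) => `<link rel="alternate" hreflang="${locale}" href="${origin}${path}">`)
    .join('\n');
}
```

The same map could drive a language switcher, since each entry is the current page in a different locale.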

Route-scoped localisations

Maybe better way is component-scoped localizations? Like in Vue-i18n plugin.
But any scoped localizations make things harder for translators to maintain.

Thanks for this input, it's very valuable.

Just to clarify my thinking a bit: if we have locales in .json files, Sapper can turn those at build time into JavaScript modules. This is what I mean when I say that we're potentially able to do things in Sapper — 'precompiled' translations with treeshaken i18n helpers — that might be a little difficult with more general-purpose tools. So:

Plurals

This locale file...

// locales/ru.json
{
  "you_have": "У вас",
  "message": ["сообщение", "сообщения", "сообщений"]
}

could be compiled into something like this:

import { create_plural } from '@sveltejs/i18n/ru';

export default {
  you_have: 'У вас',
  message: create_plural(['сообщение', 'сообщения', 'сообщений'])
};

create_plural encodes all the (somewhat complex, I just learned! 😆 ) pluralisation rules for the Russian language.
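
As a sketch of what such a helper might do at runtime, the version below leans on the standard `Intl.PluralRules` API rather than hand-written rules. Unlike the per-language import above, this hypothetical `create_plural` takes the locale as an argument, and it assumes the forms are listed in CLDR category order (one, few, many for Russian):

```javascript
// CLDR plural categories in their conventional order.
const CATEGORY_ORDER = ['zero', 'one', 'two', 'few', 'many', 'other'];

// Hypothetical helper: returns a function mapping a number to a plural form.
// `forms` is assumed to follow the locale's category order; if there are
// fewer forms than categories, the last form is reused.
function create_plural(locale, forms) {
  const rules = new Intl.PluralRules(locale);
  const available = rules.resolvedOptions().pluralCategories;
  const categories = CATEGORY_ORDER.filter((c) => available.includes(c));

  return (n) => {
    const index = categories.indexOf(rules.select(n));
    return forms[Math.min(index, forms.length - 1)];
  };
}

const message = create_plural('ru', ['сообщение', 'сообщения', 'сообщений']);

console.log(message(1));  // сообщение
console.log(message(2));  // сообщения
console.log(message(5));  // сообщений
console.log(message(21)); // сообщение
```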

(Having said that, the examples on https://www.i18njs.com point towards using more complete sentences as keys, i.e. "You have %n messages" rather than "you_have" and "message".)

In an ideal world, someone would have already encoded all the different pluralisation rules already in a way that we can just reuse. I don't know if that's the case.

Formatting

I wonder if that can be done with symbols like %c and %n and %d?

// locales/fr.json
{
  "Your payment of %c for %n widgets is due on %d": 
    "Votre paiement de %c pour %n widgets est dû le %d."
}

import { format_currency, format_date } from '@sveltejs/i18n/fr';

export default {
  'Your payment of %c for %n widgets is due on %d': (c, n, d) =>
    `Votre paiement de ${format_currency(c)} pour ${n} widgets est dû le ${format_date(d)}.`
};

(Am glossing over the pluralisation of 'widgets' and the nuances of date formatting — '1 April' vs '1 April 2019' vs 'tomorrow' or 'is overdue' — but you get the general thrust.)

Localised URLs

Don't do it. I've never seen an i18n system that supports it.

I had a conversation on Twitter recently providing one data point to the contrary. I think you're right that it's very rare, though I have to wonder if that's because of the sheer difficulty of it with existing tools (i.e. the same reason it can't be done in userland in Sapper). Django supports it — see the example of /en/news/category/recent/ vs /nl/nieuws/categorie/recent/.

At the very least, if we did it, it would be opt-in: you just need to choose between meet-the-team.svelte and {meet_the_team}.svelte.

Localisation in components

Developer should be able to use default language visually.

Yeah, I think this is probably true, though I wonder if it makes it harder to keep localisations current. Anyway, I did have one idea about usage — maybe there's a clever way to use tagged template literals:

<p>{$t`Welcome back ${name}`}</p>
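
One way a tag function could work, sketched under the assumption that translation files are keyed by the source string with positional placeholders ($1, $2, ...). `createTranslator` is a made-up name and the plain function stands in for the store value:

```javascript
// Hypothetical factory: returns a template tag bound to one locale's strings.
function createTranslator(translations) {
  return function t(strings, ...values) {
    // Rebuild a lookup key such as "Welcome back $1" from the literal parts.
    const key = strings.reduce(
      (acc, part, i) => acc + (i > 0 ? `$${i}` : '') + part,
      ''
    );

    // Fall back to the source string when no translation exists.
    const template = Object.prototype.hasOwnProperty.call(translations, key)
      ? translations[key]
      : key;

    // Substitute $1, $2, ... with the interpolated values.
    return template.replace(/\$(\d+)/g, (token, n) => {
      const index = Number(n);
      return index >= 1 && index <= values.length ? String(values[index - 1]) : token;
    });
  };
}

const t = createTranslator({ 'Welcome back $1': 'Willkommen zurück, $1' });
const name = 'Rich';

console.log(t`Welcome back ${name}`); // Willkommen zurück, Rich
```

Because the translation controls where $1 lands, word order can differ between languages even though the source string is written inline.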

Maybe better way is component-scoped localizations?

We shouldn't rule it out. Separate .json files would definitely make the implementation easier though...

PS: Sorry for my poor english...

Ваш английский лучше моего русского 😀

@chris-morgan

it being a store adds a fair bit of bloat

Can you expand on that? Sapper is already using stores so there's no additional weight there — do you mean the component subscriptions? I guess it could be t instead of $t, it'd mean we'd need a way to force reload on language change rather than using normal client-side navigation.

You need the concept of “such-and-such a URL, but in a different locale”, for things like language switchers and the equivalent meta tags.

Interesting... can you elaborate? I take it a <link rel="canonical"> is insufficient? Definitely preferable to avoid a situation where every locale needs to know every slug for every other locale.

For example, /en-au/ might switch to show Australian products and prices

I guess this is where precompilation could come in handy — we could generate en-us, en-gb and en-au from en.json by just swapping out the currency formatters or whatever (though you'd need a way to say 'Sapper, please support these countries'). Maybe the existence of an en-au.json locale file would be enough for that; any missing keys would be provided by en.json:

// locales/en.json
{
  "Hello": "Hello",
  "Welcome back, %s": "Welcome back, %s"
}

// locales/en-au.json: the existence of this file causes a module to be created
// that uses Australian currency formatter etc
{
  "Hello": "G'day mate"
  // other keys fall back to en.json
}

What you’ve described sounds to be designed for web sites rather than web apps.

Yeah, I can see that. I guess it could be as simple as an option — sapper build --i18n-prefix for /xx/foo, or sapper build --no-i18n-prefix for /foo, or something.


A few things that didn't occur to me last night:

Constant URLs

Not every URL should be localised — static assets, for example, but also probably some pages and server routes. Maybe need a way to distinguish between them.

RTL languages

No idea what's involved here.

SEO

Some good information here.


Looking at all this, I can certainly see why someone would say 'this is too much complexity, it should be handled by third party tools'. On the contrary I think that's probably why it should be handled by Sapper — it's a hard problem that is very difficult to deal with entirely in userland, and which is prone to bloaty solutions.

rtorr commented

One thing that has been helpful in some of my implementations is partial localization. If a key does not exist in one language, it will fall back to (in my case) English.

In an ideal world, someone would have already encoded all the different pluralisation rules already in a way that we can just reuse. I don't know if that's the case.

The rules are described by Mozilla: https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_and_Plurals . But I don't think they cover all possible languages, so the ability to supply a custom plural function would be very useful for natives of lost Pacific islands.

Reading the whole of Mozilla's Localization section may prompt some other thoughts about i18n, l10n and even l12y.

maybe there's a clever way to use tagged template literals

Wow, it looks amazing!

I wonder if it makes it harder to keep localisations current

Maybe we can store strings as JSON property names, and autogenerate the default language JSON file when compiling? Then translators can look for differences in the other JSON files.

// locales/en.json
{
	"Military-grade progressive web apps, powered by Svelte": "Military-grade progressive web apps, powered by Svelte",
	"You are in the army now, ${name}": "You are in the army now, ${name}"
}

// locales/ru.json
{
	"Military-grade progressive web apps, powered by Svelte": "Прогрессивные веб-приложения военного качества на платформе Svelte",
	"You are in the army now, ${name}": "Теперь ты в армии, ${name}"
}

trbrc commented

(For anyone unfamiliar: 'Internationalisation' or i18n refers to the process of making an app language agnostic; 'localisation' or l10n refers to the process of creating individual translations.)

This might come across as a little pedantic, but localisation and translation aren't quite the same thing, even though the terms are often used interchangeably. Localisation is about adapting to a specific region (locale), while translation is about adapting to a specific language. One locale might need multiple languages, and one language can exist in multiple locales.

So I don't think it's correct to have a folder called locales containing files named after language codes. A better folder name would be languages or translations.

In practice, localisation might involve any kind of change to a website, or no change at all. So I think it has to be handled with feature flags or other configuration.

thgh commented

Thoughts:

A fallback language stack. If a key is missing, it could look in the next language file and finally fall back to the language key itself. For example: es || fr || en || key. Another advantage is that developers don't have to come up with custom translation keys, and can instead write {$t('Meet the team')}. From my experience, developers are terrible at choosing translation keys, like button_1, button_2, button_3, ...

How about a directive? {#t 'Welcome back' name}

It would be pretty awesome if there was a hook that would allow to send new strings to an automatic translation service. If that's too much, perhaps a list of translation keys in the manifest?
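
The fallback stack described above (es || fr || en || key) could be sketched roughly like this. `createLookup` and the dictionary shape are assumptions made for the example:

```javascript
// Hypothetical helper: look a key up through an ordered list of languages,
// returning the key itself when no language provides a translation.
function createLookup(stack, dictionaries) {
  return (key) => {
    for (const lang of stack) {
      const dictionary = dictionaries[lang];
      if (dictionary && Object.prototype.hasOwnProperty.call(dictionary, key)) {
        return dictionary[key];
      }
    }
    return key;
  };
}

const t = createLookup(['es', 'fr', 'en'], {
  es: { 'Meet the team': 'Conoce al equipo' },
  fr: { 'Meet the team': "Rencontrez l'équipe", Settings: 'Paramètres' },
  en: { 'Meet the team': 'Meet the team', Settings: 'Settings', Help: 'Help' }
});

console.log(t('Meet the team')); // Conoce al equipo
console.log(t('Settings'));      // Paramètres
console.log(t('Help'));          // Help
console.log(t('Sign out'));      // Sign out
```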

By the way, I'm using this translation store in a project:

import { derived, readable } from 'svelte/store'

// declared first so it exists before the stores read from it
export const translations = {
  nl: {
    meet: 'ons team'
  },
  en: {
    meet: 'meet the team'
  }
}

// use the stored language in the browser, defaulting to Dutch
export const lang = readable(process.browser && localStorage.lang || 'nl')

export const t = derived(lang, $lang => translations[$lang])

Definitely not ideal. I would prefer if there were a $session or $cookie store that I could derive from.

Having worked extensively on projects that involve multiple translations (I was involved for many years on building a platform that ingested language files from institutions like Oxford Publishing with internationalized projects and so forth), I can say with certainty this is a massive rabbit hole.

That said, I applaud the effort and 100% support it.

Re: Localized URLs, I'm firmly on the side of the fence that says it should be opt-in. I can see scenarios both for wanting it and for not. A good chunk of the time I won't want URLs saying anything about language, but where language is explicit to the business, organizational (or individual) aims of a website, it will sometimes be wanted in the URL. #depends

I really like this direction, whatever form it ends up taking, and as long as it's opt in (it appears that it would be):

<p>{$t`Welcome back ${name}`}</p>

I think i18n should be designed in sveltejs/sapper as a common ground between translation features and developer features and, yes, all of them are opinionated!

On the translation-oriented side, I think it's worth taking a look at https://projectfluent.org/ as well as the ICU message format ( http://userguide.icu-project.org/formatparse/messages ). They have put a lot of effort into designing a DSL that keeps translation logic out of application logic. Maybe just swapping JSON keys in translation files is too simplistic for the inherent complexity of language.

# Translation file en/messages.ftl
unread-emails =
    You have { $emails_count ->
        [0] no unread emails
        [one] one unread email
       *[other] { $emails_count } unread emails
    }.

<p>{ $t('unread-emails', { emails_count: userUnreadEmails } ) }</p>
<script>
  import { t } from 'sapper/svelte';
  export let userUnreadEmails;
</script>

It would be great to see sveltejs/sapper support out of the box one of these formats/libraries, maybe opting-in.

I have no clue how to codesplit translations, but it would be great to come up with a convention to lazy load translations as components are loaded into the application.

I've received lots of really helpful pointers both here and on Twitter — thanks everyone. Having digested as much as I can, I'm starting to form some opinions... dangerous I know. Here's my current thinking:

The best candidate for a translation file format that I've seen yet is the 'banana' format used by MediaWiki (it also forms the basis of jquery.i18n). It seems to be able to handle most of the hellacious edge cases people have written about, while avoiding a) boilerplate, b) wheel reinvention and c) being prohibitively difficult for non-technical humans to read. (Oh, and it's JSON.) The fact that it's used by something as large as MediaWiki gives me hope. If anyone has experience with it and is alarmed by this suggestion, please say so! (One thing — I haven't quite figured out where number/date/currency formatting fit in.)

No idea why it's called 'banana'. Here's an example:

// en.json — input
{
  "@metadata": {
    "authors": []
  },
  "hello": "Hello!",
  "lucille": "It's {{PLURAL:$1|one banana|$1 bananas|12=a dozen bananas}} $2. How much could it cost, $3?"
}

I think it'd be possible to compile that to plain JavaScript, so that you could output a module you could consume like so:

import t from './translations/en.js';

console.log(t.hello); // Hello!
console.log(t.lucille(1, 'Michael', '$10')); // It's one banana Michael. How much could it cost, $10?
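
To get a feel for how small such a compiler could be, here is a deliberately simplified sketch that handles only positional arguments ($1, $2, ...) and the {{PLURAL:...}} construct from the example. It is not the MediaWiki or jquery.i18n implementation, `compileMessage` is a hypothetical name, and it resolves everything at call time instead of emitting source code:

```javascript
const CATEGORY_ORDER = ['zero', 'one', 'two', 'few', 'many', 'other'];

// Hypothetical compiler: turn one banana-style message into a function.
function compileMessage(template, locale) {
  const rules = new Intl.PluralRules(locale);
  const available = rules.resolvedOptions().pluralCategories;
  const categories = CATEGORY_ORDER.filter((c) => available.includes(c));

  return (...args) => {
    // Replace $1, $2, ... with the corresponding argument.
    const substitute = (text) =>
      text.replace(/\$(\d+)/g, (token, n) => {
        const index = Number(n);
        return index >= 1 && index <= args.length ? String(args[index - 1]) : token;
      });

    // Resolve each {{PLURAL:$n|form|form|N=exact form}} first.
    const resolved = template.replace(
      /\{\{PLURAL:\$(\d+)\|([^}]*)\}\}/g,
      (_, n, body) => {
        const count = Number(args[Number(n) - 1]);
        const forms = body.split('|');
        const exact = forms.find((form) => form.startsWith(`${count}=`));
        if (exact) return exact.slice(exact.indexOf('=') + 1);
        const positional = forms.filter((form) => !/^\d+=/.test(form));
        const index = categories.indexOf(rules.select(count));
        return positional[Math.min(index, positional.length - 1)];
      }
    );

    return substitute(resolved);
  };
}

const lucille = compileMessage(
  "It's {{PLURAL:$1|one banana|$1 bananas|12=a dozen bananas}} $2. How much could it cost, $3?",
  'en'
);

console.log(lucille(1, 'Michael', '$10'));
// It's one banana Michael. How much could it cost, $10?
```

A build-time version would emit this logic as plain JavaScript per message, and could keep messages without placeholders as plain strings so that t.hello stays a property rather than a function call.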

For code-splitting, this is what I'm thinking: Suppose you have global translations in locales/en.json (or e.g. languages/en.json, per @trbrc's comment) and some route-specific translations in src/routes/settings/languages/en.json. Sapper might generate two separate modules:

  • src/node_modules/@sapper/internal/i18n/en/0.js // global translations
  • src/node_modules/@sapper/internal/i18n/en/1.js // route-specific

The second of these files might look like this:

import translations from './0.js';

export default Object.assign({}, translations, {
  avatar: 'Avatar',
  notifications: 'Notifications',
  password: 'Password'
});

The route manifest for /settings could look like this:

{
  // settings/index.svelte
  pattern: /^\/en\/settings\/?$/,
  parts: [...],
  i18n: () => import('./i18n/1.js')
}

So when you first load the page, 0.js gets loaded, but when you navigate to /settings, the browser only needs to fetch 1.js (and it can get preloaded, just like the component itself and any associated data and CSS). This would all happen automatically, with no boilerplate necessary. And because it's just JSON it would be easy to build tooling that ensured translations weren't missing for certain keys for certain languages.
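
The kind of tooling mentioned above could start as something as small as the following sketch, which compares every language file against a reference language. `findMissingKeys` is a hypothetical name, and the objects stand in for parsed JSON files:

```javascript
// Hypothetical check: report keys present in the reference language but
// absent from each of the other languages. The "@metadata" entry used by
// the banana format is ignored.
function findMissingKeys(reference, others) {
  const expected = Object.keys(reference).filter((key) => key !== '@metadata');
  const report = {};

  for (const [lang, dictionary] of Object.entries(others)) {
    const missing = expected.filter((key) => !(key in dictionary));
    if (missing.length > 0) report[lang] = missing;
  }

  return report;
}

const en = { '@metadata': { authors: [] }, hello: 'Hello!', password: 'Password' };
const fr = { hello: 'Bonjour !' };
const de = { hello: 'Hallo!', password: 'Passwort' };

console.log(findMissingKeys(en, { fr, de })); // { fr: [ 'password' ] }
```

A build could fail, or merely warn, when the report is non-empty.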

The banana format does steer us away from this...

<p>{t`Hello ${name}!`}</p>

...and towards this:

<p>{t.hello(name)}</p>

I'm not convinced that's such a bad thing — it's certainly less 'noisy', and forces you to keep your default language .json file up to date (which, combined with the tooling suggested above, is probably the best way to keep translations up to date as well).

@thgh yep, should definitely have some sort of fallback. Not sure what this would look like — the simplest would obviously be to just have a single default. I'm not so sure about a directive, since it would entail changes to Svelte, and would make it harder to use translations in element attributes (or outside the template, in the <script>).

@Rich-Harris I just drop it in here: https://github.com/lingui/js-lingui

What I like with this library is the tooling they have:
https://lingui.js.org/tutorials/cli.html#add-a-new-locale

Where you can:

  • add a new locale easily
  • extract translations automatically from your components
  • clean up obsolete messages
  • use pseudo-localization
  • use messages as IDs

The last part is particularly interesting: rather than creating a bunch of IDs for translation, it uses the actual content as the key. That makes it easy for any translator to edit. Heck, anyone with a text editor can add a translation and know what to do.

E.g. the translations extracted from a component might look like this (taken from the js-lingui wiki):

{
  "Message Inbox": "",
  "See all <0>unread messages</0> or <1>mark them</1> as read.": "",
  "{messagesCount, plural, one {There's {messagesCount} message in your inbox.} other {There're {messagesCount} messages in your inbox.}}": "",
  "Last login on {lastLogin,date}.": ""
}

And a translated version would look like this:

{
  "Message Inbox": "Přijaté zprávy",
  "See all <0>unread messages</0> or <1>mark them</1> as read.": "Zobrazit všechny <0>nepřečtené zprávy</0> nebo je <1>označit</1> jako přečtené.",
  "{messagesCount, plural, one {There's {messagesCount} message in your inbox.} other {There're {messagesCount} messages in your inbox.}}": "{messagesCount, plural, one {V příchozí poště je {messagesCount} zpráva.} few {V příchozí poště jsou {messagesCount} zprávy. } other {V příchozí poště je {messagesCount} zpráv.}}",
  "Last login on {lastLogin,date}.": "Poslední přihlášení {lastLogin,date}"
}

It also introduces slots, which is, to be honest, a big deal in i18n. With translations, you often want to style a word inside a translated message. The old solution would be to add a new message ID for that particular word, even though the whole translation is supposed to be treated as one unit. The problem with pulling that text out of the message is that it loses context. A translator who sees just a word without context may well give a translation that doesn't fit the actual message.

I think it's the cleanest solution I have seen among any library. Shout out to @tricoder42 for creating such an awesome library.

Hey everyone, thanks @thisguychris for mention.

I read the thread briefly and I have a few suggestions, if you don't mind:

(Having said that, the examples on https://www.i18njs.com point towards using more complete sentences as keys, i.e. "You have %n messages" rather than "you_have" and "message".)

I would really recommend this approach for two reasons:

  1. Context is very important for translators. Translating You have %n messages as a sentence will give more accurate translation than translating You have and message.
  2. The order of words in a sentence isn't the same for different languages. Order of words/chunks hardcoded in source code might break in the future for some language.

In an ideal world, someone would have already encoded all the different pluralisation rules already in a way that we can just reuse. I don't know if that's the case.

There actually is: plural rules for a very large number of languages are defined in CLDR. There are lots of packages on npm that parse CLDR data, like make-plural. A few languages are missing though (e.g. Haitian and Maori).

I wonder if that can be done with symbols like %c and %n and %d?

// locales/fr.json
{
  "Your payment of %c for %n widgets is due on %d": 
    "Votre paiement de %c pour %n widgets est dû le %d."
}

import { format_currency, format_date } from '@sveltejs/i18n/fr';

export default {
  'Your payment of %c for %n widgets is due on %d': (c, n, d) =>
    `Votre paiement de ${format_currency(c)} pour ${n} widgets est dû le ${format_date(d)}.`
};

ICU MessageFormat uses argument formatters:

Hello {name}, today is {now, date}

Formatting of date arguments depends on implementation, so it could be done using Intl.DateTimeFormat, date-fns, moment.js, whatever.

I've been thinking a lot about this approach and it's useful when you want to change date format in different locales:

Hello {name}, today is {now, date, MMM d}

but you could achieve the same in the code as well:

i18n._("Hello {name}, today is {now}", { name, now: i18n.format.date(now) })

where i18n.format.date is something like this:

// pseudocode using `format` from date-fns
function date(value) {
  const formatStr = this.formatStr[this.locale]
  return format(value, formatStr, { locale: this.locale })
}

I think both approaches have pros/cons and I haven't decided yet which one to use.

Code-splitting

I've just had a discussion via email with one user about this. I'm thinking about a webpack plugin that could generate i18n files for each chunk automatically. I haven't figured out how to load them automatically, but the route manifest that you've posted might solve that as well.


Just flushing some ideas I've been playing with in past weeks :)

@thisguychris suggested exactly what I want to see in an i18n system!

And one more advantage: using t`Hello ${name}` makes the component more reusable. It is the simplest way to make an i18n-ready component. The developer doesn't have to worry about distributing the component with a lang JSON file (but may include one when translations are ready).

Perhaps the autogenerated JSON could mirror the structure of the nested components:

{
  "App": {
    "My app": "Моё приложение"
  },
  "App.NestedComponent": {
    "Ave, ${name}!": "Славься ${name}!"
  },
  "App.AnotherNestedComponent": {
    "Ave, ${name}!": "Да здравствует ${name}!"
  }
}

This encapsulates phrases within their components, which is useful when the same phrase needs different translations in different contexts.

I wanted to chime in just to say that I am experiencing all the issues mentioned in the original issue above on my latest project. It's a web version of a mobile app, needs to support 19 languages at launch and is completely API driven.

I was delighted to hear that this is being considered in sapper!

Thanks @thisguychris, @tricoder42 — Lingui is incredibly impressive. The thought and care that has gone into the tooling is amazing.

I've been thinking more about strings versus keys, and I'm coming down on the side of keys, for a couple of different reasons. (For clarity, I'm not suggesting you_have and message over You have %n messages, but rather you_have_n_messages.)

Firstly, string-based approaches typically end up with the source language text included in the production build ("Hello world!":"Salut le monde!"). In theory, with a key-based approach, {t.hello_world} could even reference a variable (as opposed to an object property) if t is a module namespace, which is inherently minifiable. Even if we couldn't pull that off, property names will generally be smaller (welcome_back as opposed to "Good to see you again!"). You could eliminate source language text with a sufficiently sophisticated build step, but not without adding complexity.

Secondly, and perhaps more importantly, I worry about requiring developers to be copywriters. Imagine you have a situation like this...

<p>{t`You have no e-mails`}</p>

...and someone points out that we don't hyphenate 'emails' any more — all of a sudden the keys for your other translations are out of date, so you have to go and fix them.

Then a copywriter comes along and says that it's the wrong tone of voice for our brand, and should be this instead:

<p>{t`Hooray! Inbox zero, baby!`}</p>

Of course, that text should also eventually be translated for other languages, but by putting that text in the source code don't we decrease the stability of the overall system?

Slots

The slots feature is very cool. Unfortunately it doesn't really translate (pun not intended) to Svelte, since you can't pass elements and component instances around as values. The closest equivalent I can think of to this...

<p>
   <Trans>
      See all <Link to="/unread">unread messages</Link>{" or "}
      <a onClick={markAsRead}>mark them</a> as read.
   </Trans>
</p>

...is this:

<p>
  {#each t.handle_messages as part}
    {#if part.i === 0}<a href="/unread">{part.text}</a>
    {:else if part.i === 1}<button on:click={mark_as_read}>{part.text}</button>
    {:else}{part.text}{/if}
  {/each}
</p>

That assumes that t.handle_messages is a (generated) array like this:

[
  { text: 'See all ' },
  { text: 'unread messages', i: 0 },
  { text: ' or ' },
  { text: 'mark them', i: 1 },
  { text: ' as read.' }
]

Obviously that's much less elegant and harder to work with, but maybe that's a rare enough case that it's ok not to optimise for? We can pay for the loss of elegance in other places.
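For illustration, here is one way such an array could be generated from a translated string. The `<0>...</0>` marker syntax and the `toParts` name are assumptions; the thread does not specify how the array would be produced.

```javascript
// Hypothetical generator: splits a message with <0>...</0>, <1>...</1>
// markers into plain-text parts and indexed "slot" parts.
function toParts(message) {
  const parts = [];
  const pattern = /<(\d+)>(.*?)<\/\1>/g;
  let last = 0;
  let match;
  while ((match = pattern.exec(message)) !== null) {
    // Plain text between the previous marker and this one.
    if (match.index > last) parts.push({ text: message.slice(last, match.index) });
    parts.push({ text: match[2], i: Number(match[1]) });
    last = pattern.lastIndex;
  }
  // Trailing plain text after the final marker.
  if (last < message.length) parts.push({ text: message.slice(last) });
  return parts;
}

const parts = toParts('See all <0>unread messages</0> or <1>mark them</1> as read.');
console.log(parts);
```

Because translators control where the markers go, word order can change per language while the template keeps matching on `part.i`.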

Currency and date formatting

I hadn't actually realised until today that Intl is supported basically everywhere that matters. For some reason I thought it was a new enough feature that you still needed bulky polyfills.
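As a quick sketch of what Intl provides without extra libraries, helpers along these lines are possible. The function names are illustrative, and the exact output for a given locale depends on the ICU data shipped with the runtime.

```javascript
// Locale-aware currency formatting using the built-in Intl.NumberFormat.
function formatCurrency(amount, locale, currency) {
  return new Intl.NumberFormat(locale, { style: 'currency', currency }).format(amount);
}

// Locale-aware date formatting using the built-in Intl.DateTimeFormat.
// The UTC time zone is fixed here only to keep the example deterministic.
function formatDate(date, locale) {
  return new Intl.DateTimeFormat(locale, {
    year: 'numeric',
    month: 'long',
    day: 'numeric',
    timeZone: 'UTC'
  }).format(date);
}

const due = new Date(Date.UTC(2019, 1, 24));
console.log(formatCurrency(1234.5, 'en-US', 'USD')); // $1,234.50
console.log(formatDate(due, 'en-US'));               // February 24, 2019
console.log(formatCurrency(1234.5, 'fr-FR', 'EUR')); // locale-specific separators
console.log(formatDate(due, 'fr-FR'));
```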

Distributed components

@AlexxNB that's a very interesting case that I hadn't considered. I think it changes the nature of the problem though — since t doesn't have any meaning to Svelte per se (so far, we've been talking about adding the feature to Sapper) we would have to add a new primitive. Maybe it's something like this, similar to the special @html tag:

<p>{@t hello_world}</p>

But that opens a can of worms, since Svelte now has to have opinions about i18n, which inevitably leads to opinions about project folder structure etc. I think it's probably more practical if components simply expose an interface for passing in translations:

<script>
  import VolumeSlider from '@some-ui-kit/svelte-volume-slider';
  import { t } from '@sapper/app'; // or whatever

  let volume = 0.5;

  const translations = {
    mute: t.mute
  };
</script>

<VolumeSlider bind:volume {translations}/>

I think we want to avoid referencing component filenames in translation files, since it's not uncommon to move components around a codebase.

Secondly, and perhaps more importantly, I worry about requiring developers to be copywriters.

Imagine another case: the developer changes the text value of some key in en.json (the main language of the app, and the source of truth for all translators). Translators have no way of knowing about this. They have no built-in tool for updating their translations, other than looking for diffs on GitHub.
But using strings instead of keys, you can make something like this:

sapper --i18n-check ru.json

As a result, you can see that some phrases are gone and some new phrases have been added.
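The core of such a check could be as small as the sketch below. `checkTranslations` is a hypothetical function, and it assumes the source phrases have already been extracted from the components by some other step.

```javascript
// Compare the phrases found in source code with one translation file.
// `sourcePhrases` is an array of strings; `translations` maps phrase -> translation.
function checkTranslations(sourcePhrases, translations) {
  const translated = new Set(Object.keys(translations));
  const source = new Set(sourcePhrases);
  return {
    // Phrases used in the source that have no translation yet.
    missing: sourcePhrases.filter((phrase) => !translated.has(phrase)),
    // Translated phrases that no longer appear in the source.
    obsolete: Object.keys(translations).filter((phrase) => !source.has(phrase))
  };
}

const report = checkTranslations(
  ['You have no emails', 'Sign out'],
  { 'You have no e-mails': 'У вас нет писем', 'Sign out': 'Выйти' }
);
console.log(report);
```

In this example the spelling change from 'e-mails' to 'emails' shows up as one missing and one obsolete phrase, so the translator knows exactly what to revisit.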

My two cents on language detection: how about some bootstrap function where the developer can do whatever he wants to detect language and return the result? This way it could analyze URL path, subdomain, cookies, whatever.. less opinionated but still very simple
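A sketch of what such a bootstrap function might look like. The shape of the request object, the `detectLocale` name and the precedence order (path, subdomain, cookie, then Accept-Language) are all assumptions; this version also ignores `q` weights and simply takes header entries in the order listed.

```javascript
const supported = ['en', 'fr', 'es'];

// Hypothetical hook: receives a plain request-like object and returns a locale.
function detectLocale({ path = '/', host = '', cookies = {}, headers = {} }) {
  const candidates = [
    path.split('/')[1],   // /fr/foo
    host.split('.')[0],   // fr.example.com
    cookies.lang,         // lang=fr cookie
    // Accept-Language: fr-CH, fr;q=0.9, en;q=0.8 (weights ignored in this sketch)
    ...(headers['accept-language'] || '')
      .split(',')
      .map((part) => part.split(';')[0].trim().split('-')[0].toLowerCase())
  ];
  // First supported candidate wins; otherwise use the default locale.
  return candidates.find((candidate) => supported.includes(candidate)) || supported[0];
}

console.log(detectLocale({ path: '/fr/foo' }));
console.log(detectLocale({ path: '/foo', headers: { 'accept-language': 'de-DE, es;q=0.9' } }));
```

Because the function is ordinary user code, a project that stores the language only in a cookie could drop the other checks entirely.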

Since Sapper/Svelte is a compiler, what about using a single file for all the locales:

// locales.json
{
  "@metadata": {
    "authors": {
      "en": ["Lancelot"],
      "fr": ["Galahad"]
    }
  },
  "quest": {
    "en": "To seek the Holy Grail!",
    "fr": "Chercher le Saint Graal !"
  },
  "favoriteColour": {
    "en": "Blue.",
    "fr": "Bleu. Non !"
  }
}

and letting Sapper generate the respective locale files:

// en.json
{
  "@metadata": {
    "authors": ["Lancelot"]
  },
  "quest": "To seek the Holy Grail!",
  "favoriteColour": "Blue."
}
// fr.json
{
  "@metadata": {
    "authors": ["Galahad"]
  },
  "quest": "Chercher le Saint Graal !",
  "favoriteColour": "Bleu. Non !"
}

This way maintaining keys/values would be much easier in a single file than across several (19?!) files, don't you think? Just my $0.02…
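The generation step itself is small. Here is a sketch of the transformation, assuming the combined format shown above, where `@metadata` fields and every message are keyed by locale; `splitLocales` is a hypothetical name.

```javascript
// Split a combined locales object into one object per locale.
function splitLocales(combined, locales) {
  const files = {};
  for (const locale of locales) {
    files[locale] = {};
    for (const [key, value] of Object.entries(combined)) {
      if (key === '@metadata') {
        // Metadata is keyed by field first, then by locale.
        files[locale][key] = {};
        for (const [field, perLocale] of Object.entries(value)) {
          files[locale][key][field] = perLocale[locale];
        }
      } else {
        // Regular messages are keyed directly by locale.
        files[locale][key] = value[locale];
      }
    }
  }
  return files;
}

const combined = {
  '@metadata': { authors: { en: ['Lancelot'], fr: ['Galahad'] } },
  quest: { en: 'To seek the Holy Grail!', fr: 'Chercher le Saint Graal !' },
  favoriteColour: { en: 'Blue.', fr: 'Bleu. Non !' }
};

console.log(JSON.stringify(splitLocales(combined, ['en', 'fr']), null, 2));
```

A message with no entry for a locale comes out as `undefined` and is dropped by `JSON.stringify`, which a real tool would probably want to report instead.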

If the format is compatible with the format output by tools like https://lingohub.com/ (which outputs in a format similar to what @laurentpayot has suggested), that'd be excellent.

@laurentpayot but how would one add a specific language easily? The format is great, but cumbersome to add/remove languages because it means traversing the single file.

This could be solved (although not ideally) if every sentence/word had a key/number associated. Then it would be easy to see them in that format, but stored in separate files. The "main" language (the one the dev is familiar with) would dictate those keys. Any file missing them or having extra ones would be "wrong".

@khullah Do you mean when several translators are involved and working together? If that's what you mean then I agree it can be cumbersome.
Removing a language from a centralized file is as simple as sed '/"fr":/d' locales.json if there is one translation per line.
I don't know about other people, but at least for me, modifying, adding and deleting keys occurs much more often than adding/deleting a whole language.

I really like @laurentpayot's idea. Bear in mind this can also be augmented with tooling — as long as there's a well-understood format and folder structure, you could create an interface for adding and removing languages, as well as translating specific keys (and keeping track of which ones were incomplete, etc). It could even be built in to Sapper's CLI!

While I'm here: had a good chat with @thisguychris the other day about authoring, in which he challenged my stance that we should use keys (as opposed to source language strings) for this. He likened it to BEM, having to have a naming structure for stuff that's maintained in a file to which you're tightly coupled at a distance.

I think there's merit to that claim. So while I do think that keys have some important advantages...

  • much easier to keep control of bundle sizes without convoluted tooling
  • easier to understand 'what's going on' wrt the underlying mechanics with {t.hello(name)} over {t`Hello ${name}!`}
  • possibility of disambiguating between translations that are context-dependent in some languages, but not in the source language
  • stability, since fixing typos doesn't invalidate translations
  • overall structure and organisation may be preferable to some

...it's true that in cases where you get the translations before you build the app, using strings might be preferable. So I guess I still lean towards keys, but my position isn't set in stone.

Re disambiguation — just reading this piece that was doing the rounds today which contains a great example of a situation where using a source language string as a key will result in suboptimal translations:

An example that I love to use is the term “Get started.” We use that in our products in a lot of places and, in American English, it’s pretty standard. It’s so understandable that people don’t even think of the fact that it can be used in three or four ways. It could be a call to action on a button. Like, “Get started. Click here.” It could be the title of the page that’s showing how you get started. It can be the name of a file: a Get Started guide PDF. All of those instances need to be translated differently in most other languages.

Localised URLs

/fr/foo

I think this is the best option, because:

  1. It will be convenient for the export function, because each language will get a separate folder on the server with its own localized HTML files.
    1.1. It will work without additional JavaScript (which is important for SEO), and without Node.js
  2. Search engine bots will see the unique content of each URL.

Yes @laurentpayot, that's what I meant, but not only that. It would be difficult to have some sort of phrasing dictionary from other projects to import from, which would be a great thing. I think removing a language occurs less often than adding one.

That being said, it does help human translators to see and understand context, provide a tasklist, help maintain all langs in sync, etc, as mentioned by @Rich-Harris. And this is actually something I would strive for - promote whatever is better for the devs (compilation capabilities should be explored to the maximum; it is the distinguishing feature from all other frameworks, after all).

Actually.. just realized that it would not be hard to take someLanguage.dictionary.json and pre-fill in that format as well, since keys are kinda like nicknames for each phrasing. "Hello" would be filled with a default translation, which translators could later adapt if necessary for the given project.

Even more, several files could provide better context + modularization:

// greetings or home or xComponent.i18n.json
{
  "hello": {
     "en": "Hello!",
  ...
}

// yComponent.i18n.json
{
  "message": {
     "en": "some message",
  },
  "variants": {
     "en": ["some message","same message!!","Hey, another message"]  
  },
  ...
}

So yeah, I like your format :)
I wouldn't even compile to all '19' files, just leave as is. A single i18n file per component/module/context. How it will be loaded onto the app doesn't matter to me, as long as it works.

note: l10n of currency, numbers and dates would be in yet another (global) file (if needed, since there is moment.js etc)

// en.l10n.json — input
{
  "number": { ... }
  "date": {
    "short": "",
  },
  "currency": "U$ #,##0.00"
}

@Rich-Harris

<p>{t.hello(name)}</p> seems fine to me and goes pretty well with the above format

The slots feature is very cool

Yeap. Way better than the second example you gave. Didn't catch why it isn't simple to do?

Didn't catch why it isn't simple to do?

It's just a difference between how React and Svelte work. In React, elements are just variables — you can do this sort of thing...

var element = <p>some text</p>;
return <div>{element}</div>;

and by extension, you can transform the <Trans> component in Lingui into something that can move chunks of virtual DOM around at runtime depending on word order.

In Svelte, everything is precompiled. The assumption (which holds for 99% of cases, but not this case) is that the structure of your application can be known at build time. Using that, it can generate code that starts and updates much faster than is possible with a virtual DOM; the lack of support for Lingui-style slots is the tradeoff.

saabi commented

It seems nobody mentioned URLs were originally intended (if I recall correctly) to serve as a single source for a particular piece of information, independent of presentation. That way, they could either present the information resource in English, Klingon, JSON, binary or whatever, depending on the HTTP negotiation.

Nobody does this nowadays, for good practical reasons (which also depend on available technology, which could change), but it was the original intent. And though I may be biased, because the purist in me likes the theoretical elegance, I think the option should be left open for that.

Also, the language selection mechanism should be selectable itself. We should be able to configure, or customize, how Sapper determines the language.

Localized URLs.

I like the idea, but keeping in sync with what I said before, THEORETICALLY, there should be a canonical URL that can present data in any language, also including machine readable ones, and then you can have alternative localized URLs to the same resource, which may suggest a presentational preference for its content.

For example...

  • canonical: my.site/some/resource -> can present in any format (English, JSON, French, etc, depending on HTTP negotiation or other Sapper configurable selection mechanism)
  • JSON: my.site/api/some/resource or json.my.site/some/resource (configurable)
  • French: my.site/fr/une/resource or fr.my.site/une/resource or my.site/une/resource (also configurable..)
    etc. ...

Anyway, all I'm saying is we should keep that flexibility.

EDIT:
In other words, it's recommended (by the designers) that the URL -> Resource relation is many to one rather than the inverse. I'll go and find a reference anyway, tomorrow.

And then again, it's OK to think of the same information in another language as a separate resource.

Hello there! I'm a member of the Angular team, and I work on i18n there. I thought that I could share some of my knowledge to help you get started:

  • if you can avoid touching dates/currencies/numbers and use Intl instead, it's better. Dealing with those is a major pain, you'll discover new traps every day: people that don't use the Gregorian calendar, right-to-left languages, different number systems (Arabic or Hindu for example), ... For Angular we decided to drop Intl because of browser inconsistencies. Most modern browsers have good Intl support, but if you need to support older browsers then you'll have bugs and differences. In retrospect, sticking with Intl might have been a better choice...
  • all major vendors (IBM, oracle, google, apple, ...) use CLDR data as the source of truth: http://cldr.unicode.org/. They export their data in xml or json (https://github.com/unicode-cldr). We use the npm modules "cldrjs" and "cldr-data-downloader" (https://github.com/rxaviers/cldrjs) developed initially for jquery globalize to access the CLDR json data. We also use "cldr" (https://github.com/papandreou/node-cldr) to extract the plural rules. You can find our extraction scripts here: https://github.com/angular/angular/tree/master/tools/gulp-tasks/cldr if you want to take a look at it.
  • if you can, use a recognized format for your translations so that your users can use existing translation software. One of the main formats is XLIFF but it uses XML which is very complicated to read/write in js. Stick to JSON if you can. There are a few existing JSON formats that are supported by tools, you should research the existing ones and choose one of them, it'll make the life of your users so much easier, and you will be able to reuse some external libraries. Some examples are i18next JSON https://www.i18next.com/misc/json-format or Google ARB https://github.com/googlei18n/app-resource-bundle/wiki/ApplicationResourceBundleSpecification. Don't try to reinvent the wheel here.
  • For plural rules, use CLDR data http://cldr.unicode.org/index/cldr-spec/plural-rules
  • ICU expressions are a nice way to deal with plurals, ordinals, selects (gender), ... but there is no documentation for js... you can read a bit here: http://userguide.icu-project.org/formatparse/messages and on the angular docs https://angular.io/guide/i18n#regular-expressions-for-plurals-and-selections
  • you need to follow a rule for locale identifiers. I recommend BCP47 which is what CLDR uses with a few optimizations (http://cldr.unicode.org/core-spec#Unicode_Language_and_Locale_Identifiers), some doc to help you pick the right identifier: http://cldr.unicode.org/index/cldr-spec/picking-the-right-language-code
  • id or non-id based keys: use either auto generated ids (with a hashing/digest algorithm) or manual id (keys that the user specifies). Never use the sentences as keys because you'll run into problems with your json and some special characters, you'll get very long keys which will increase the size of the json files and make them hard to read, and you'll get duplicates (the same text with different meanings depending on the context), which brings me to my next point...
  • you need to support optional descriptions and meanings, those are very important for translators. Descriptions are just some text that explains what this text is, while meaning is what the translators should use to understand how to translate this text depending on the context of the page and what this text represents. The meaning should be used to generate the ids (keys) so that you don't have duplicates with different meanings.
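To illustrate the last two points, here is a sketch of deriving an id from the meaning plus the text, so that identical text with different meanings gets different keys. The FNV-1a hash is chosen only for brevity and is an assumption; it is not the digest algorithm Angular uses, and `messageId` is a hypothetical name. Descriptions are deliberately left out of the id, since they only inform translators.

```javascript
// FNV-1a (32-bit) over UTF-16 code units: a small non-cryptographic hash,
// used here only to keep the example dependency-free.
function fnv1a(input) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Hypothetical id generator: the meaning disambiguates identical source text.
function messageId(text, meaning = '') {
  return fnv1a(meaning + '\u0000' + text);
}

console.log(messageId('Get started', 'button label'));
console.log(messageId('Get started', 'page title'));
```

With ids like these, the "Get started" example from earlier in the thread yields separate entries for the button and the page title without the developer inventing key names by hand.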

Hearing about Angular's approach gives me some more thoughts about previous and current use cases for this (since I've only had a critical use-case translation in AngularJS, at my previous company). Maybe it's useful when considering options, maybe it isn't:

There are a number of ways that translation might need to exist:

  1. As some sort of dynamic option on the UI, where you can switch languages on the fly within the same bundle.
  2. A thing which switches the bundle you are using.
  3. To build separate bundles - we wanted to launch my previous product in 37 countries, but each country would only require the site in their language *
  * A curveball to use-case 3 is that some countries, such as Finland, require multiple languages (Swedish, Finnish, and English). Switzerland is similar. Most countries want English too, and some countries just existed on their own.

I think that localised urls are a nice thing to have, for use-case 2 and 3. I'm not sure of the net effect on use-case 1. I generally think they wouldn't be used though, for the reasons of canonical urls mentioned above.

I'm concerned that enforcing a locale-based url structure would be a problem - certainly SEO people are very specific about URL structures and may not want the language embedded in the URL. It would certainly be a deal breaker for us.

you need to support optional descriptions and meanings, those are very important for translators. Descriptions are just some text that explains what this text is, while meaning is what the translators should use to understand how to translate this text depending on the context of the page and what this text represents. The meaning should be used to generate the ids (keys) so that you don't have duplicates with different meanings.

Very true. Once again as Sapper/Svelte is a compiler, we could use a unique JavaScript file instead of a JSON one to have useful comments. And Sapper would generate the appropriate .json files.

// locales.js
export default {
  "@metadata": {
    "authors": {
      "en": ["Lancelot"],
      "fr": ["Galahad"],
    }  
  },
  // Answer to the Bridgekeeper's first question
  "quest": {
    "en": "To seek the Holy Grail!",
    "fr": "Chercher le Saint Graal !",
  },
  // Answer to the Bridgekeeper's second question
  "favoriteColour": {
    "en": "Blue.",
    "fr": "Bleu. Non !"
  }
}

using a source language string as a key will result in suboptimal translations

In my opinion, it is a good example for preferring strings over keys. A developer will create only one key, get_started, and will use that key in all occurrences, because he doesn't know about the various translations in different languages.
When using strings, the compiler will add the same string to en.json three times and the developer doesn't need to care about it. We should only think about encapsulation and context determination of those strings.

we could use a unique JavaScript file

It is a nice proposal. In some cases we will be able to use very custom logic in plurals, formatting or somewhere else.

I believe the implementation provided by ttag is useful as Prior Art. It already works with Svelte v3 out of the box, so one can validate / discredit some ideas or opinions without having to develop the thing first.

  • Uses JS string interpolation for simple cases: t`Hello ${ name }`, ngettext for complex forms
  • Using a babel plugin, it can output one build per language, completely eliminating the source language from the output build and just leaving mock implementations for its own functions.
  • Input files are in gettext format which, while not necessarily the friendliest, is certainly an industry standard with large amounts of tooling available.

id or non-id based keys

This may not be a debate that has to happen. Fundamentally, the only difference between t`Welcome to our website!` and t`welcome_message` is whether the master language is set to English or to a made up id language, with translations provided for English. Any string based solution would want to support arbitrary languages as the master, not just assume English, anyways.

Thanks for sharing @ocombe!

  • 3 years ago, we dropped Intl and either used an external library or rolled our own. There were so many inconsistencies between browsers, even if you try to adhere to CLDR, since each of them has its own implementation. Even on Node.js, we found inconsistencies between versions; for example, Node.js 8 would include a space. To battle these inconsistencies, we opted to generate it server-side instead. But reflecting back, dropping Intl was the wise decision for that time. If you are to support just evergreen browsers, then it's better to leverage Intl, as it's a browser API and needs no extra bytes.

  • for the id's I think the key here is having an option for the user to auto generate these ids in some form or another. We should learn from BEM where we try to force the developer to adhere to a strict convention. It's always better to make the computer do the work of such convention, that way we eliminate human error. Keys using hashing would be better I guess to address translations blowing up twice.

  • for translation software, I think .po/.pot is a widely used format for globalization. You can leverage a lot of translation services since they support this kind of format.

I shall propose a concrete course of action.

  1. Don’t block Sapper 1.0 on this. Getting this right will take quite some time, and I see no reason at all why it should block it—I don’t even think it’ll be backwards-incompatible, not that requiring 2.0.0 for introducing localisation would be a catastrophe anyway. (I’m also not sure that strong consensus will be reached. I suspect that this may end up worthy of living outside Sapper itself, and I think it can be made to work so without much difficulty, at the cost of about 5 lines of boilerplate.)
  2. Start using Project Fluent via the fluent package. It’s driven by Mozilla, who are basically the best at these matters. It’s ICU-compatible, and ties in with Intl stuff nicely. All of the Fluent.js packages look to be surprisingly compact (~6.7KB minified + gzipped for fluent and fluent-intl-polyfill together).
  3. Don’t use stores or reactivity for localisation: switching locale is extremely rare, and throwing away the entire DOM and redrawing it is entirely acceptable. Even doing a full page reload is acceptable. If it’s made reactive, the code generated is more complex (i.e. code slow-down and increased memory usage).
  4. Use fluent-langneg for initial language negotiation, but ensure that there are suitable hooks for overriding language negotiation (e.g. something from the user’s session). I’m sidestepping the matter of putting the locale in the URL, that’s been discussed a fair bit in this thread.
  5. Do something about locale fallback. fluent-langneg will give hints about what directions to fall back in (most likely things like en-au to en-us; but craziest scenario, you could cause it to try French, then fall back to German, then to English!), but I guess most will want to define which locale is the base locale (normally en or en-us). It may also turn out to be wiser to compile locale fallback in so that you only need to consult one bundle at runtime. Worthwhile discussing this with the Fluent people.
  6. Include fluent-intl-polyfill by default, but probably provide a way of turning it off so that it can be more selectively included (Chrome 63+ and Firefox 58+ don’t need it).
  7. See what happens in IE<11. Does it fail? Does it just stringify numbers and dates without regard to locale (since the formatters are missing)? Do we care? If it’s broken, probably talk to the Fluent people if it can be changed to just stringify naively as that’ll be generally close enough.
  8. Think about adding extra functions for other purposes that people have asked for such as creating DOM nodes. I can imagine { A($text, href: $href) } working. (Remember that this is FTL’s $, not Svelte’s $!) I’m not actually certain whether this will work in Fluent; when it doesn’t, talk over the issue with the Fluent people.
  9. Then, when all that’s working, experiment with implementing a compiler based upon fluent-syntax, so that you will be able to import an FTL file and obtain a module out of it that exports many strings and functions. This will not be as easy as it may initially seem due to the vagaries of such things as language cases and attributes on terms, but I haven’t seen anything that looks like it’ll be a blocker. This stage will yield a nicer interface, because in JavaScript terms it will no longer be currentLocaleBundle.format('hello', { name: 'world' }), but rather currentLocaleModule.hello({ name: 'world' }). (Open question: turn kebab-case into snake_case or camelCase?) This work should end up in a separate package that Sapper will use. (Sapper’s scope should be limited to being glue.) Talk about all this with Fluent people; I suspect there may be others interested in it as well, perhaps even in implementing it, in part because of how it lets you shift most errors to compile time rather than runtime. Oh yeah, it should be producing TypeScript (or at least .d.ts declaration files) so that named arguments can be checked.
  10. Only then, with all that done, contemplate further potential optimisations like splitting the PluralRules polyfill up into the locale modules (possibly worthwhile), and generating different versions of components for each locale and inlining strings (unlikely to be worthwhile).
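As an illustration of the locale fallback in step 5, here is a sketch of building a lookup order and consulting bundles in that order. It does not use fluent-langneg; `fallbackChain`, `lookup` and the plain-object bundles are assumptions made for the example.

```javascript
// Build the lookup order for a requested locale, ending at the base locale.
// Example: ('en-AU', 'en-us') gives ['en-au', 'en', 'en-us'].
function fallbackChain(requested, baseLocale) {
  const chain = [];
  const subtags = requested.toLowerCase().split('-');
  while (subtags.length > 0) {
    chain.push(subtags.join('-'));
    subtags.pop(); // drop the most specific subtag
  }
  if (!chain.includes(baseLocale)) chain.push(baseLocale);
  return chain;
}

// Return the first translation found along the chain, or the key itself.
function lookup(bundles, chain, key) {
  for (const locale of chain) {
    const bundle = bundles[locale];
    if (bundle && key in bundle) return bundle[key];
  }
  return key;
}

const bundles = {
  'en-au': { greeting: "G'day" },
  'en-us': { greeting: 'Hello', farewell: 'Goodbye' }
};
const chain = fallbackChain('en-AU', 'en-us');
console.log(chain, lookup(bundles, chain, 'greeting'), lookup(bundles, chain, 'farewell'));
```

Compiling the fallback in, as the step suggests, would amount to running this lookup for every key at build time and emitting a single merged bundle per locale.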

I picture the first stage of this work being used via import t from '@sapper/locale'; and {t('hello', { name: 'world' })}; I don’t think it warrants sugar like {#t 'hello' name='world'} or whatever one might decide on: better for it to be simple JavaScript. If precompilation of the FTL resources was then done, it could become {t.hello({ name: 'world' })}. You might or might not be able to make import { hello } from '@sapper/locale'; work, it would depend on a few other technical choices.
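To show the difference in interface between the two stages, here is a small sketch that turns a message table into an object of functions. It uses a made-up `{name}` placeholder syntax rather than real FTL, and it builds closures at runtime; an actual compiler would emit source code at build time. `compileMessages` is a hypothetical name.

```javascript
// Assumed placeholder syntax `{name}`; this is not Fluent's FTL syntax.
function compileMessages(messages) {
  const compiled = {};
  for (const [id, template] of Object.entries(messages)) {
    // kebab-case ids become snake_case so they work as property names.
    const name = id.replace(/-/g, '_');
    compiled[name] = (args = {}) =>
      template.replace(/\{\s*(\w+)\s*\}/g, (match, key) =>
        key in args ? String(args[key]) : match
      );
  }
  return compiled;
}

const t = compileMessages({ hello: 'Hello, {name}!', 'sign-out': 'Sign out' });
console.log(t.hello({ name: 'world' }));
console.log(t.sign_out());
```

With one function per message, a typo such as `t.helo` becomes an error at the call site rather than a silent missing-string lookup, which is the compile-time benefit mentioned in step 9.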

For talking with Fluent people, see https://discourse.mozilla.org/c/fluent. It’s probably a good idea to ask them for input on this issue about now anyway.

All of the Fluent.js packages look to be surprisingly compact (~6.7KB minified + gzipped for fluent and fluent-intl-polyfill together).

That is heavy by Svelte's standard.
I would rather have a less sophisticated but super-light i18n system than a do-it-all but heavier one. Performance is why I chose Svelte over Vue.

A super-light i18n system just doesn’t cope with real translation work. This is a simple fact of life. Languages are messy. If you go for a super-light system, it will all end in tears and people will learn to avoid it, and the last state of Sapper will be worse than the first.

As it stands, you’ve neglected steps 8 and 9 of my proposal. The point I was going for when mentioning Fluent’s compactness is this: 6.7KB really isn’t that bad, and is an acceptable intermediate state, for a useful result. (Other decent localisation libraries that I’ve dealt with have been more like 15–30KB.)

Let not the perfect be the enemy of the good. (And definitely don’t settle for terrible just because the good is not perfect!) Get something good working, so that you can confirm that it is in fact good (and so that intrepid users can even start using it), then continue to improve it until it is at last perfect. I provided the roadmap to accomplishing that, including the expectation that the API would change once compilation was implemented, from t('foo') to t.foo.

I might as well elucidate a bit more on the savings that I expect to be accomplished.

FTL file compilation, if successful, will get it down a long way. 3.5KB of the cited weight there is from fluent, and the substantial majority of it will shed away—I estimate eventual constant overhead of a few hundred bytes only. (Do recall that your language strings will now be a bit larger; but in practice I think it’ll be at least hundreds of strings before you make up the difference.)

Similarly for the Intl.PluralRules polyfill, something like 2KB will promptly disappear if it’s customised to only support the one locale at a time, and there is easy scope for further optimisation by rebuilding it from the ground up against the same data, inlining the rules rather than looking them up; a quick skim through the code involved leads me to expect it to be as little as a couple of hundred bytes (perhaps even a tad less if we decide it’s safe to not even pretend to be Intl.PluralRules), and a very little more for legacy browsers (IE<11).
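To give a sense of how small inlined rules can be, here is a sketch of cardinal plural selection for English and Russian, written from the CLDR rules for non-negative integers only. Fractions, which CLDR maps to 'other' in Russian, are deliberately not handled, and the function names are illustrative.

```javascript
// Inlined cardinal plural categories for non-negative integers.
// English: 'one' for exactly 1, otherwise 'other'.
const pluralEn = (n) => (n === 1 ? 'one' : 'other');

// Russian (integers only): 'one', 'few' or 'many' based on the last digits.
function pluralRu(n) {
  const mod10 = n % 10;
  const mod100 = n % 100;
  if (mod10 === 1 && mod100 !== 11) return 'one';
  if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14)) return 'few';
  return 'many';
}

for (const n of [1, 2, 5, 11, 21, 22]) {
  console.log(n, pluralEn(n), pluralRu(n));
}
```

Shipping only the rule for the active locale is what removes the need for a lookup table on the client.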

In total, I think we’re looking at eventual overhead of about half a kilobyte with this approach, and by that point it’ll also be clearer where we might be able to cut further corners.

@chris-morgan

A super-light i18n system just doesn’t cope with real translation work. This is a simple fact of life.

A super-light i18n system which can handle 95% of the cases + some manual handling of the edge cases can cope with real translation work because I did it before, and this is a simple fact of my life.

If there is a need to delegate all the translation work then indeed a complete i18n solution is to be preferred, but developers should have the choice of what solution to pick and I won't try to peremptorily impose one into Sapper.
I wish Sapper can come with an opt-in light i18n system and let developers use whatever i18n library they want if they need a heavier translation system than the default one.

I wonder if some of the i18n complexity can be handled server-side, to keep the downloaded code light without losing correctness. If possible, this would take advantage of Sapper’s architecture, in which a Node-based server is available at runtime. I think @chris-morgan's comment was already in this direction, with the idea of downloading the code only for the locale currently in use. Downloading more code or even refreshing the whole page seems completely acceptable for a language change.

I think server side as the point of localization is a very smart compromise.

As to @chris-morgan’s point about not blocking v1, I kind of agree. I think localization is a huge need, but as a practical matter, there’s so many hurdles it shouldn’t get in the way of other priorities for a solid v1 release.

I haven’t read the whole discussion here, but let me just weigh in quickly.
(FYI, I maintain https://github.com/eversport/intl-codegen)

Precompile Translations

IMO, one of the great selling points of svelte is that it does what it does at compile time, with minimal runtime overhead, which I think any i18n solution should also be doing.
That's what intl-codegen, as well as other solutions mentioned here such as linguijs, also do. I try to compile the ICU MessageFormat into straight up JS code. Supporting fluent syntax at some point is on the roadmap, but definitely not a priority right now.

DOM Overlays

I also think that a svelte i18n solution should be very well suited to a feature such as DOM Overlays, which is also supported by linguijs (as mentioned by @thisguychris https://github.com/sveltejs/sapper/issues/576#issuecomment-466731513) and also in fluent.js (both DOM and React, though their implementations slightly differ).
I also have plans to integrate react overlays into intl-codegen: eversport/intl-codegen#15
My plan is that this feature will look similar to this (in react code):

// Translation in MessageFormat syntax, or in Fluent syntax in the future:
// `I accept the <terms title="translated properties…">terms and conditions</terms>…`
function render() {
  return <Localized id="foo"><Link key="terms" href={whatever} /></Localized>
}

I would love this stuff to also work with svelte at some point, because well why not :-)

If the goal is build-time l10n we can consider supporting babel-macros (https://github.com/jgierer12/awesome-babel-macros) and macros like https://www.npmjs.com/package/tagged-translations.

From a performance and bundle-size point of view, the most optimal way IMO is to define the list of supported locales at the application level (const locales = ['en', 'ru']) and to generate separate builds for each locale (__sapper__/en/, __sapper__/ru/). The server should get the locale from the request context (/ru/path, /path/?lang=ru, cookie, HTTP header, etc.) and return the JS files for the correct language (or 'en' by default). All static translations will be resolved server-side and you will only need to support singular/plural cases on the client, with code like

import t from 'tagged-translations/macro'
import {getMessage} from 'generic-i18n-lib'

<p>{getMessage(count, t`You have ${count} apples`, t`You have a Mac`)}</p>

That can be compiled to:

import {getMessage} from 'generic-i18n-lib'

<p>{getMessage(count, `У вас есть ${count} яблок`, 'У вас есть Mac')}</p>

A project can use getMessage from any i18n library, Sapper should only convert static strings and generate bundles for all supported locales.
And supporting of build-time babel-macros looks like a nice companion for compiled Svelte anyway.
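As a rough sketch of the server side of this idea, the locale lookup could be something like the following. The helper names, the "lang" cookie name and the request shape are assumptions made for illustration; only the locales list and the __sapper__/<locale>/ layout come from the comment above.

```javascript
const locales = ['en', 'ru'];
const defaultLocale = 'en';

// Hypothetical helper: picks the first supported locale found in the path
// prefix, the ?lang= query parameter, a "lang" cookie, or Accept-Language.
function resolveLocale({ path = '/', query = '', cookie = '', acceptLanguage = '' }) {
  const candidates = [
    path.split('/')[1],
    new URLSearchParams(query).get('lang'),
    (cookie.match(/(?:^|;\s*)lang=([^;]+)/) || [])[1],
    // Primary subtags in the order listed; q-values are ignored in this sketch.
    ...acceptLanguage.split(',').map((part) => part.trim().split(/[-;]/)[0]),
  ];
  const match = candidates.find((c) => locales.includes(c));
  return match || defaultLocale;
}

// The server would then serve files from the matching build directory.
function buildDir(request) {
  return `__sapper__/${resolveLocale(request)}/`;
}
```

For example, buildDir({ path: '/ru/path' }) points at the Russian build, while an unprefixed path with no other hints falls back to 'en'.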

@Rich-Harris are you still going to squeeze this with Sapper's initial release with svelte v3?

Regarding Fluent, it seems like there is some work done by @eemeli to create a compiler that reduces the run-time size (currently less than 1kB according to the readme).

There is also discussion on the Fluent forums about merging this to Fluent core.

Maybe putting some effort into bringing that project fully to life would create the possibility of having a fully-featured i18n solution with an aim similar to Svelte's.

Heya. Haven't read the whole backlog here, but tossing in a few opinions nevertheless:

  1. Highly recommend against coming up with your own custom i18n/l10n language. If you need any sort of variable formatting or value-based selection, go with either ICU MessageFormat or Fluent. If you need it to work right now rather than in some number of months, pick MessageFormat -- the Fluent spec has reached 1.0, but the JS implementation's API is still developing. If you're more interested in the long term, pick Fluent -- its spec is much friendlier for both developers and translators, and it enforces a file format.
  2. Make sure that your message parsing happens during your build, rather than needing to do it at runtime. This is already possible at least with messageformat, and as mentioned above I'm working on fluent-compiler.
  3. If you pick MessageFormat for now, by the time it's of interest to you, I'll have a messageformat-to-fluent translator ready. Also, for sanity use YAML or TOML as your file format, rather than JSON.
  4. fluent-intl-polyfill is getting deprecated, and really it's just a wrapper for an older version of my intl-pluralrules. Which unfortunately is still required for full support even in the latest browsers. If you really need to squeeze out every bit and byte, fork its repo and use make-plural to build your own category selectors for some subset of all the Unicode CLDR languages.
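To illustrate point 2 (parsing at build time rather than at runtime), here is a deliberately tiny sketch. It only understands {name} placeholders, so it is not MessageFormat or Fluent, and compileMessage is a made-up name; real compilers such as messageformat or fluent-compiler do far more.

```javascript
// Build step (hypothetical): turn "Hello {name}, you have {count} items"
// into the source text of a function, so nothing is parsed in the browser.
function compileMessage(message) {
  const parts = message.split(/\{(\w+)\}/);
  // Even indices are literal text, odd indices are placeholder names.
  const expr = parts
    .map((part, i) => (i % 2 === 0 ? JSON.stringify(part) : `p.${part}`))
    .join(' + ');
  return `(p) => ${expr}`;
}

// What would be written into the generated bundle:
const source = compileMessage('Hello {name}, you have {count} items');

// eval is used here only to demonstrate that the generated code runs;
// in a real build the source text would simply be emitted into a file.
const greet = eval(source);
```

The client then ships only the generated function, with no message parser at all.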

I think it is worth mentioning Facebook's library for handling i18n. Since they need to handle more than 100 languages, they created the library to face all the i18n challenges, so I think it is a good reference.

  1. video presentation

  2. fbt repo

  3. fbt website

It seems that everyone in this thread is trying to make Sapper i18n opinionated with x or y translation library.
To make everyone happy the best option would be to let developers have the choice, by finding a way to plug-in whatever translation function and its associated locale files.

I want to give my opinion about how to handle syntax from different languages. Instead of pulling a translated string into a logically variable text expression, I think that the best choice is to send the state/data variables into the translated string which will contain a specific grammar/syntax and use the injected variables.
Example
instead of doing this :
<p>{t.you_have}{num}{num === 1 ? t.apple : t.apples}</p>
doing that :
<p>{t.you_have_n_apples}</p>
and having this in the JSON:
{"you_have_n_apples": "you have {num} {num === 1 ? 'apple' : 'apples'}"}

With this alternative the texts are described in the translation in each language separately so you can simply change the grammar as you need and keep a clean UI code.
This option implies having code in the translations, which could put off some translators. Maybe a simple, specific syntax could be tailor-made for this, or one already exists, for plurals for instance?
For this to work Svelte has to evaluate the string twice, replacing the variables with their values after the first evaluation that pulls in the translation: a continuous expression evaluation where any pulled content containing variables keeps being evaluated until the content is static. The scope should already be right for this.
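One way to get this without putting real code in the translation files is to store one string per plural category and let the runtime pick. This is only a sketch: the catalogue shape and the t signature are assumptions, while Intl.PluralRules is the standard API for choosing the category.

```javascript
// Hypothetical message catalogue: each language decides its own grammar.
const messages = {
  en: { you_have_n_apples: { one: 'You have {num} apple', other: 'You have {num} apples' } },
  fr: { you_have_n_apples: { one: 'Vous avez {num} pomme', other: 'Vous avez {num} pommes' } },
};

function t(locale, key, vars) {
  const forms = messages[locale][key];
  // Intl.PluralRules picks the plural category ("one", "other", ...) for the locale.
  const category = new Intl.PluralRules(locale).select(vars.num);
  const template = forms[category] || forms.other;
  // Inject the state variables into the translated string.
  return template.replace(/\{(\w+)\}/g, (_, name) => String(vars[name]));
}
```

The UI code then stays at something like t(locale, 'you_have_n_apples', { num }), and each translation carries its own grammar.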

One more thing: I think that the runtime option is great, because the cost is not that high. When you switch language, it's a lot faster to redraw the full document than to reload the page. Nobody will switch language 60 times per second, or even often, but more and more people are multilingual and the constant reloading of sites annoys them, especially if for some reason the inputs or the state were not saved and the new page looks different from what they had before switching.
I think that Svelte should have everything built in and well integrated into its philosophy. It's a must in any application; nobody should ever write a displayed string in a .svelte file. With some IDE magic things can become incredibly simple and agile.

It seems that everyone in this thread is trying to make Sapper i18n opinionated with x or y translation library.

It's a result of i18n being one of those things that absolutely requires collective effort. Using something that's been accumulating such effort over longer periods of time seems like a good idea to me. Even if you say "I've had 95% success rate with the tools I've been using before!" that could only be because you did not encounter any languages where the success rate would drop to 30%.

To make everyone happy the best option would be to let developers have the choice, by finding a way to plug-in whatever translation function and its associated locale files.

That sounds great as long as there is also an official plug-in all juiced up and ready to go. I'm pretty sure the ease of integration will soon defeat the need for one little feature or another.

I suggest you having a look at how js-lingui works with React. I have tried many different approaches in my old projects and lingui is absolutely the best.

https://github.com/lingui/js-lingui

@NSLS Thank you!

I just want to mention that LinguiJS is in a transition state: I'm working (once again) on a major release. There are a lot of obstacles in the current approach.

Recently I've learned about Svelte and I really like the philosophy. Once I finish LinguiJS v3, I would like to take a look how to integrate it into Svelte.

I created a minimal example of implementing i18n in Sapper app:

I used i18next because it has good docs and can be integrated easily. My first attempt was to use LinguiJS but I got into issues with rollup throwing errors when importing LinguiJS macros.

There are still issues with my implementation:

  • large bundle size because of using i18next
  • routing conflicts with service workers, therefore it fails on npm run export

I would be glad to receive any pull requests and proposals to improve the example.

For simple plural languages such as English or French you can have an i18n function with internal references (i.e. it can reuse translations) in 20 LOC only.
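For what it's worth, such a function might look roughly like this. The "@:key" reference syntax, the "singular | plural" separator and the English-style rule (singular only when n is 1) are all assumptions made for the sketch, not an existing library's format.

```javascript
const dict = {
  app: 'Orchard',
  welcome: 'Welcome to @:app!',
  apples: '{n} apple | {n} apples',
};

function t(key, vars = {}) {
  let text = dict[key];
  if (text === undefined) return key; // fall back to the key itself
  // Pick "singular | plural" by count when the message has two forms.
  const forms = text.split(' | ');
  if (forms.length === 2) text = vars.n === 1 ? forms[0] : forms[1];
  return text
    .replace(/@:(\w+)/g, (_, ref) => t(ref, vars)) // reuse another translation
    .replace(/\{(\w+)\}/g, (_, name) => String(vars[name]));
}
```

A French dictionary would want the singular for 0 as well, which is one extra comparison.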

Extremely new here. Just heard about Svelte today, but i18n would be important for adoption of Svelte in many large scale web apps/companies, so I thought I'd chime in since this doesn't seem totally solved/agreed upon yet. There are many things I don't like about Angular, but the syntactic way Angular handles i18n is nice, both in terms of readability of the HTML and also maintainability. You define your string in HTML as you normally would:
<h1>This is my message</h1>

But, then you provide the i18n directive, which will choose the correct language at page load (The user only gets the translated bundle for their locale at page load):
<h1 i18n>This is my message</h1>

You can, of course, use a variable for the string, when necessary. Having the text in the same file is convenient and may make finding random typos more likely.

It looks like @dogada's suggestion is the closest to this so far, and I would be ok with that approach. Although, having to call getMessage() seems unnecessary since Svelte is a compiler?

@skaiser getMessage is required when your translation depends on an argument, for example "You have 1 apple only" vs "There are 6 apples"

@dmitrykrylov your approach works but I personally don't like to use [lang] in routes because it's not always possible to change url structure and many people prefer to have /about/ instead of /en/about for the default English language.

Just to add to many other options mentioned here, there is FormatJS and it is based on ICU Message syntax and Unicode CLDR. Since Svelte is a compiler™ it probably can compile output of intl-messageformat-parser straight to functions.
FormatJS is also used in their own react-intl module.

pngwn commented

Closing some other i18n discussions from the past but linking them here for reference: sveltejs/sapper#78, sveltejs/sapper#230.

@Rich-Harris - we do a lot of localization today. We handle this in our headless CMS (contentful). We use / for english and /es/ for spanish.

We do use slugs a lot, for example /blog/[slug] or /medication/[id], so I think assuming we can always localize the slug would be a bad situation. I think Sapper being opinionated about using the FS would not be great.

We have been using the /[lang]/ approach in our POC of Sapper. We want to move from Next.js/React to Sapper/Svelte, since our user base is 95% mobile, and in parts of the world with 14-20s latency (all US).

We are planning to use sapper export. We are currently attempting to see if we can use a --rewrite-path rule to move files at export while persisting functionality. But it seems we bake some paths into JS for requests that cause this idea to fail.

I agree it would be beneficial for Sapper to be opinionated for an official local file system version, but that is not maintainable at an enterprise without custom scripts. For example, I would have to write a locale file during the build with all the mapping. When my prefetch method already knows the locale and the path to the content, it seems to be in the wrong direction.

Please note that using the /lang/ code in the URL requires i18n, but not vice versa. The /lang/ code is needed for multilingual sites, while i18n is necessary in any single-language site that was originally created with a different language in mind.

Our current site uses english at / and Spanish at /es, which is why this solution does not work in all cases either.

+1,000,000 for YAML for language content XD

I think Laravel solves the pluralization problem very nicely: https://laravel.com/docs/6.x/localization#pluralization

You can use a pipe for singular/plural:

'apples' => 'There is one apple|There are many apples',

And you can also specify an unlimited amount of variants based on the number:

'apples' => '{0} There are none|[1,19] There are some|[20,*] There are many',
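For anyone wondering how little code that syntax needs, here is a simplified JavaScript sketch of the same idea. It is not Laravel's implementation: choose is a made-up name, and the fallback for plain "singular|plural" strings uses an English-style rule only.

```javascript
// Simplified re-implementation of the idea for illustration; not Laravel's code.
function choose(message, count) {
  const segments = message.split('|');
  for (const segment of segments) {
    // Matches a leading "{n}" or "[from,to]" condition, where "to" may be "*".
    const m = segment.match(/^\s*(?:\{(\d+)\}|\[(\d+),\s*(\d+|\*)\])\s*(.*)$/);
    if (!m) continue;
    const [, exact, from, to, text] = m;
    if (exact !== undefined) {
      if (count === Number(exact)) return text;
    } else {
      const upper = to === '*' ? Infinity : Number(to);
      if (count >= Number(from) && count <= upper) return text;
    }
  }
  // No condition matched: treat the message as "singular|plural" (English-style rule).
  return count === 1 ? segments[0] : segments[segments.length - 1];
}
```

With the two strings above, choose(..., 1) gives the singular form and choose(..., 20) lands in the [20,*] range.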

@khrome83 Your use case is a multilingual website with content stored in a database, which is a good solution for that. For apps however it's perfectly normal to have translations of UI strings stored in files. Or a combination of both.

It's worth remembering that i18n covers a lot. It's not just about replacing UI strings or serving multilingual content. There's routing based on the visitor's preferred locale, graceful switching between locales, funky grammar to support (see Fluent), special number and date formatting, LTR/RTL.. Most websites may want the locale to be defined in the URL, while apps may want to keep it hidden in its internal state and instead allow hot reloading of locales. And I thought cache invalidation and naming things were hard :)

I don't think Sapper should try to tackle all of i18n. Most of this has already been solved by other great projects, and some will have to be custom coded per project. But some of this has to be provided by Sapper, the question is how much.

I agree with those who think Sapper shouldn't limit localization to one particular library, requirements will always vary from project to project. Instead, it could provide the infrastructure/plumbing for integrating with existing i18n projects, and let the community provide integrations to suit different needs. LinguiJS, fluent-compiler and other localization libraries that do compilation seem like a perfect match for Sapper though, so maybe they could be favoured.

I am more interested in how to write code.

First, we should easily get all available locales from Sapper:

<script>
import { locales } from '@sapper/app';

function change_locale(locale) {
    ...
}
</script>

<ul>
    {#each locales as locale}
        <li on:click={() => change_locale(locale)}>{locale}</li>
    {/each}
</ul>

Second, remember that Sapper is a compiler, which gives us more room for imagination.

In traditional projects, we wrote code like this:

t('good.day');
t('welcome.user', username, param);
t('welcome.user2', {username: realname, param});

In Sapper, we could maybe write something more natural:

<p>{#i18n good.day }</p>
<p>{#i18n welcome.user | username, param }</p>
<p>{#i18n welcome.user2 | {username: realname, param} }</p>

Because Sapper is a compiler, it knows that {#i18n ... } in HTML is a translation expression.

We could even write the default-locale text inline:

<p>{#i18n 'Good day!' }</p>
<p>{#i18n 'Welcome $1!' | username, param }</p>
<p>{#i18n 'Welcome ${user}!' | {username: realname, param} }</p>

Sapper will compile these to a locale file with metadata (where it is used, the original sentence), for example:

{
    "good_day": {
        "msg": "Good day!",
        "org": "Good day!",
        "ref": [
            {"file": "path/to/file/used/this", "line": 1}
        ]
    },
    "welcome_$1": {
        "msg": "Welcome $1!",
        "org": "Welcome $1!",
        "ref": [
            {"file": "path/to/file/used/this", "line": 2}
        ]
    },
    "welcome_$user": {
        "msg": "Welcome ${user}!",
        "org": "Welcome ${user}!",
        "ref": [
            {"file": "path/to/file/used/this", "line": 3}
        ]
    }
}

We may also not care about the keys:

{
    "hashedA": {
        "msg": "Good day!",
        "org": "Good day!",
        "ref": [
            {"file": "path/to/file/used/this", "line": 1}
        ]
    },
    "hashedB": {
        "msg": "Welcome $1!",
        "org": "Welcome $1!",
        "ref": [
            {"file": "path/to/file/used/this", "line": 2}
        ]
    },
    "hashedC": {
        "msg": "Welcome ${user}!",
        "org": "Welcome ${user}!",
        "ref": [
            {"file": "path/to/file/used/this", "line": 3}
        ]
    }
}

@dishuostec It's unclear why you need hashed keys if original strings can be used as keys. Also I strongly suggest avoiding inventing new formats for message translations. There are standard PO files that a lot of editors support. I'm sure there should be standard JSON formats as well, and editors/services that support them.

@dogada

Also I strongly suggest avoiding inventing new formats for message translations. There are standard PO files that a lot of editors support. I'm sure there should be standard JSON formats as well, and editors/services that support them.

I used JSON for the example because we are all familiar with it. I agree with you on this point: we should use a widely used format like PO.

It's unclear why you need hashed keys if original strings can be used as keys.

As mentioned above, we keep the original string in the locale file, so we don't care what the key is. It can be abstracted as the result of a key_hash_function, which might simply be (string) => string.

It's unclear why you need hashed keys if original strings can be used as keys.

I think the strongest argument for adding a hash (usually of component/path + string) is to prevent key collisions. Otherwise there is no way to differentiate generic strings that might need to be translated differently depending on the context.
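A minimal sketch of that kind of key function, assuming the component path is known at compile time. FNV-1a is used here only because it is short and needs no imports; messageKey and the separator character are made up for the example.

```javascript
// FNV-1a, a small non-cryptographic hash; any stable hash function would do.
function fnv1a(input) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(36);
}

// Hypothetical key function: the same text used in two components gets two keys.
function messageKey(componentPath, text) {
  return fnv1a(`${componentPath}\u0000${text}`);
}
```

So "Open" in a file menu and "Open" on a shop-hours badge end up as separate entries that can be translated differently.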

ocombe, a member of the Angular team who works on i18n, gave this advice earlier:

Never use the sentences as keys because you'll run into problems with your json and some special characters, you'll get very long keys which will increase the size of the json files and make them hard to read, and you'll get duplicates (the same text with different meanings depending on the context)

I can also add that if you have a typo in your original string and later fix that, all translations will have to be "rekeyed". So I'm very much in favor of using keys like hello-user.

That said, I think this is one example of where Sapper could be agnostic and leave implementation details up to the integration, effectively supporting both.

Really interesting thread. It's full of great suggestions and I imagine the problem is going to be narrowing down all the possible solutions to something general that will work for most people and is extensible.

In that vein, it might be worthwhile to look at what other compiled frameworks/languages, outside of JS, do to address these issues? I can speak a bit about the language which I'm most familiar with, which is C# and the .NET Core framework.

.NET Core's out-of-the-box solution for string localization could be considered basic compared to some of the solutions suggested above:

  • Translations are stored in XML files (yes, XML) - one XML file per target language.
  • Each translation in these XML files (styled .resx or Resource files) contains a key, the translation, and an optional description/note for translators.
  • Keys are just strings; it's left to the developer to decide whether they want to use generalised keys (e.g. hello-user) or default/fallback-language strings (e.g. Greetings, {0}).
  • Translations are used by injecting (think importing) the localizer into the view and using the appropriate key, e.g. @L["hello-user"]. Keys/translations provide for simple string interpolation, so you can have @L["Greetings, {0}", user.Name]. If no translation is found in the resource files the view falls back to rendering the key.
  • .NET Core does provide very comprehensive localization/internationalization tools for currencies, dates, etc. I assume this is backed with CLDR data as discussed by others above. This is a very important piece of the puzzle.
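For readers who don't know .NET, the lookup-with-fallback behaviour described in the list above amounts to roughly the following. This is a JavaScript sketch with a made-up resources object, not the .NET API itself.

```javascript
// Hypothetical resources for one language; keys may be identifiers or fallback strings.
const resources = {
  'hello-user': 'Bonjour, {0}',
};

function L(key, ...args) {
  // Fall back to rendering the key itself when no translation exists.
  const found = Object.prototype.hasOwnProperty.call(resources, key);
  const template = found ? resources[key] : key;
  // Replace positional placeholders such as {0} with the given arguments.
  return template.replace(/\{(\d+)\}/g, (match, index) =>
    Number(index) < args.length ? String(args[Number(index)]) : match
  );
}
```

Because a missing key renders as itself, using the default-language string as the key gives a readable fallback for free.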

As I said, this is basic. I'm not proposing it as the approach to take in Sapper. For example, I would much rather stick with a recognised translation file format such as those suggested above (and in JSON at that). But I think there is something to be learned here too: if you intend to please most of the people most of the time you can't be too opinionated and that makes it hard to implement too many advanced features. Remember, .NET is one of the most-widely used backend frameworks on the web and, as much as I complain about it sometimes, it clearly solves a lot of people's problems.

More info: https://docs.microsoft.com/en-us/aspnet/core/fundamentals/localization?view=aspnetcore-3.0

The other points to take away, though, are extensibility and taking advantage of compile-time optimizations.

The nice thing about the way localization is implemented in .NET Core is that it's very easy to extend or replace. There are libraries that allow you to use JSON or a database to store translations instead of .resx files. A few members of the Stack Overflow team have written up great blog posts where they detail a little about how they re-implemented string localization on SO:

https://m0sa.net/posts/2018-11-runtime-moonspeak/
https://nickcraver.com/blog/2016/05/03/stack-overflow-how-we-do-deployment-2016-edition/#step-3-finding-moonspeak-translation

Which leads on to the last point, being that one of the great advantages of Svelte/Sapper's approach is the kind of additional compile-time tasks and optimizations that could be achieved. For example: extracting a full list of keys without corresponding translations in the app. There might be potential here in the future for Sapper to do some cool things along these lines.
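As a rough idea of what such a compile-time check could do, the sketch below scans source text for t('...') calls and reports keys that a locale file lacks. The t('key') call convention is borrowed from the examples earlier in the thread; the function names are made up, and a real implementation would walk the AST rather than use a regex.

```javascript
// Collect the keys passed as string literals to t(...).
function extractKeys(source) {
  const pattern = /\bt\(\s*['"]([^'"]+)['"]/g;
  return Array.from(source.matchAll(pattern), (m) => m[1]);
}

// Report used keys that have no entry in the given locale dictionary.
function findMissingKeys(usedKeys, dictionary) {
  return [...new Set(usedKeys)].filter((key) => !(key in dictionary)).sort();
}
```

Running this per locale during the build would give translators a to-do list instead of a silent fallback at runtime.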

Anyway, I'm not pushing any particular agenda or approach here, just emphasising that there might be good lessons to be learned from solutions in other languages that have a compile step.

After taking a stab at building a first version of something along these lines using LinguiJS (available here) I came across this library: https://github.com/kaisermann/svelte-i18n

It seems to me that svelte-i18n does most of the things we discuss here, using an established format (ICU Message Format) on top of Svelte primitives.

There are still other issues to resolve around routing, tooling, etc, but svelte-i18n looks like a really promising start!

@laurentpayot I'm not sure. I think either way there might be additional changes needed in order for svelte-i18n to be usable in Sapper out of the box. The biggest concern for me is the use of module scope/a singleton architecture, as that will cause problems when server side rendering as soon as you introduce anything that's async.

But maybe that isn't an issue with Sapper in the same way as it is with say React, provided that the actual rendering is synchronous? I'm relatively new to Svelte and Sapper, so I don't know yet :)

A little late to the party, but since you guys mentioned svelte-i18n, I think I should give some updates about it. I first created that lib as a POC for my previous job and kinda abandoned the project for a while after that. I'm currently working on a v2.0.0 which adds some new features and behaviours:

  • Async preloading of locale dictionaries (no partial dictionary support for now, trying to think of a non-verbose way of doing this 🤔);
  • Works with Sapper's SSR (work in progress here);
  • Provides a CLI to extract all message ids to a json in the stdout or specified output file;
  • Better number/date/time formatting (exposes the Intl.Formatters in a better way than the current version);
  • Custom formats for number/date/time (formats are aliases for a specific set of Intl formatter options);
  • Exports a list of all locales for easy {#each}ing;

This is currently a WIP and I'm definitely taking into consideration a lot of what's said here. In no way do I think I can handle every use case with just svelte-i18n. I've also thought about a preprocessor to remove verbosity in some cases, but I'm reluctant about that for now.

About creating a format specific for sapper/svelte: I'm not completely against it, but I think not using an established format is kind of reinventing the wheel. We already have great formats like ICU or Fluent, which already contemplate a bunch of quirks that a language can have.

Edit:

Ended up deciding to have a queue of loader methods for each locale:

register(locale, loader): adds a loader method to the locale queue;
waitLocale(): executes all loaders and merges the result with the current locale dictionary;


While not extremely ideal, the "verbosity" of this approach can be also reduced in the user-land by a preprocessor that adds those register and waitLocale calls, maybe even the format/_ method import.
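To give a feel for the queue idea, here is a stripped-down sketch. It is not svelte-i18n's actual code: the Maps are made up, and waitLocale takes the locale as an argument here purely to keep the example self-contained.

```javascript
const queues = new Map();       // locale -> pending loader functions
const dictionaries = new Map(); // locale -> merged messages

// Adds a loader (a function returning a promise of messages) to the locale's queue.
function register(locale, loader) {
  if (!queues.has(locale)) queues.set(locale, []);
  queues.get(locale).push(loader);
}

// Runs every queued loader and merges the results into the locale's dictionary.
async function waitLocale(locale) {
  const loaders = queues.get(locale) || [];
  queues.set(locale, []);
  const parts = await Promise.all(loaders.map((load) => load()));
  const merged = Object.assign({}, dictionaries.get(locale), ...parts);
  dictionaries.set(locale, merged);
  return merged;
}
```

A route could call register('en', () => import('./en.json')) at module level and await waitLocale('en') in preload, so each page only pulls in the messages it registered.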

Edit 2:

Just released v2.0.0 🎉 Here's a very crude sapper example: https://svelte-i18n.netlify.com/. You can check the network tab of your devtools to see how and when a locale's messages are loaded. Hope it helps 😁

(A little late to the party too, but it's been a concern for a project of mine lately, so I came across this)

I'm coming from a region that speaks both French and English all the time, and all the projects that I do require some form of localisation. Through 10 years of moving from framework to framework, from CMS to whatever, the same things have always annoyed me, and there was never a perfect solution:

  • Always have a convoluted syntax or overloads that don't quite convey what it does unless you have handy documentation.
  • $_("") on a French international keyboard is annoying to type
  • i18n libs are usually heavy and do far too much for what most users need
  • It's near impossible to export a file or a csv for a translator to work with because it's full of weird syntax that they always find a way to screw with.
  • Importing a i18n lib in every-single-component is a pain. Especially in React where you also need to subscribe to a store with mapStateToPropsGodWhyIsThisFunctionNameSoLong
  • Don't get me started on .po files.
  • Managing fallbacks, missing locales, plurals, and masculine/feminine forms for Latin languages is a pain.

I think the svelte philosophy is to not bring stuff you don't need, type less and do more.

There is something about the $_("localeName") syntax that always annoyed me. Why not just use template literals?

Wouldn't that be cool ?

Wouldn't it be nice to simply do $`This localised content would be {numTimes} better.` Javascript already gives us the possibility to parse literals and do what we want with them. Why create a big function wrapper for something that is already in the language and that we don't need to import on top of each template file?

$ could be a subscribable default Sapper store containing methods that have access to preloaded locales from the hypothetical locales folder.

Or even better, with some svelte magic we could use just that $`this is the {jsFrameworkName} way` and make it global, so we don't have to go through the hassle of importing a store or a lib each time.

The current locale could be available through the prefetch function as well as a param:

async function prefetch (page) {
    const { locale, translations } = page;
}

Since we are in the comfort of the Svelte compiler, we could easily extract the template literals and automatically add the entries to a JSON file inside the locales folder, for the locale currently declared in the HTML document head.

I also like the pipe operator in use in svelte, like on:click|preventDefault. Or the godsent class:active idea ? Familiar / frequent patterns in svelte always have a solution and that's awesome

Can we recycle that for translations ?

In templates

$`There are {numBoats} boat|plural:numBoats="s" in the sea`

Generated locale.json have a format that is sensibly exportable in csv format to be sent to translators easily, it would generate a singular and plural column:

"There are {numBoats} boat in the sea": {
    plural: "There are {numBoats} boats in the sea",
    singular: "There are {numBoats} boat in the sea"
}

You can also solve very complex french plural oddities with this:

$`{numBoats} bateau|plural:numBoats="x" vous attaqu|singular:numBoats="e"|plural:numBoats="ent"`

Template literals can return this so this is also a possibility

$`There are {numBoats} boat in the sea`.plural(numBoats)

That way you don't have to do weird syntax to handle every possible use case, you don't have to subscribe to a large-esque library that tries to convert every possible pronoun...

What do you guys think ? Worth exploring ?
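For reference, plain JavaScript tagged templates already get fairly close to this. The sketch below uses standard ${...} interpolation rather than the {...} syntax proposed above, names the tag t instead of $, and uses a made-up French dictionary keyed by the source text with numbered slots.

```javascript
// Hypothetical French dictionary keyed by the source text with numbered slots.
const dictionary = {
  'This localised content would be {0} better.':
    'Ce contenu localisé serait {0} meilleur.',
};

// Tag function: rebuilds the lookup key from the literal parts, then
// substitutes the interpolated values into the translation (or the original).
function t(strings, ...values) {
  const key = strings.reduce((acc, part, i) => acc + `{${i - 1}}` + part);
  const template = dictionary[key] || key;
  return template.replace(/\{(\d+)\}/g, (_, i) => String(values[Number(i)]));
}
```

Used as t`This localised content would be ${numTimes} better.`, it returns the translated sentence when the key is known and the original text with values filled in otherwise. The pipe modifiers for plurals would need compiler support on top of this.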

As it turns out, the Angular team decided to use template string literals for its new i18n package (@angular/localize) that was just released with the v9 of Angular 😊

Hello,

To be honest, I didn't read all messages, but I found that Mozilla is currently working on a very interesting project for localization: https://projectfluent.org/ From what I've read so far it looks very good. (And I also really think that JSON for localization is shit)

Looks like the syntax files (or the <ftl lang="fr"> tag) could be compiled ahead of time and dynamically loaded as needed. Kept in sync with <html lang="fr"> and replaced/parsed thanks to fluent-dom attributes.

@Rich-Harris I'm really interested in helping to get this working, be that on the simpler Latin-based text or on way more complex things like Arabic.

If there is anything I can do, please let me know: testing against the problems I know from localizing into 7 languages with a global audience (including RTL, double-byte characters and other such annoyances, SEO URLs), attempting to bash out some code to help with this, or documenting how to do it, including the gotchas that often hit the inexperienced.

Hey there ☆ I am new, excited, have a proposal towards what's ahead ↑.
(especially you @andykillen )
I'd really like something like this, and I think it would be good:

A- ROUTING -> wikipedia level

① real arbitrary routing system like prefix, like suffix

(PHILOSOPHICAL reason: UNIVERSALITY = EQUALITY between locales huh!?)
(equality in dignity)
(actually locale not proper term, locale towards what kind of truth!?):

en.my-app/blog
ja.my-app/ブログ
fr.my-app/blog
  • that is:
    redirects from folder tree structure like my-app/blog to ja.my-app/ブログ
    based on ja Accept-Language header.
  • that is:
    my-app/blog is not canonical,
    ja.my-app/ブログ is canonical.

② keeping things simple, serve .json objects like:

{
  "en": "blog",
  "ja": "ブログ",
  "fr": "blog"
}
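A small sketch of how such a map could be turned into the canonical address for each locale. canonicalUrl is a made-up helper, "my-app" stands in for the real host, and the https scheme is an assumption.

```javascript
// The slug map from the JSON above.
const blogSlugs = { en: 'blog', ja: 'ブログ', fr: 'blog' };

// Hypothetical helper: builds the canonical, locale-prefixed address.
function canonicalUrl(lang, slugs, host = 'my-app') {
  const slug = slugs[lang];
  if (slug === undefined) return null;
  // encodeURIComponent percent-encodes non-ASCII characters as UTF-8.
  return `https://${lang}.${host}/${encodeURIComponent(slug)}`;
}
```

Browsers display the percent-encoded path as ブログ again, so users still see the original script in the address bar.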

③ made by a i18n.js file in a _helper folder.

Files and directories with a leading underscore do not create routes. This allows you to colocate helper modules and components with the routes that depend on them — for example you could have a file called src/routes/_helpers/datetime.js and it would not create a /_helpers/datetime route

(ok no need to tell, I already know it's just suffixes, but I really want prefixes ok?
(same thing for the structure of its json: its keys are too complexes, see above ↑ same statement)
(ok no need to tell, I already know it makes slugs
but I really want original expressions not simili ok!?
not less than beautiful in WIKIPEDIA https://ja.wikipedia.org/wiki/トマト
For all that, OUR example SHOULD be WIKIPEDIA (https://www.wikidata.org/wiki/Q177837#sitelinks-wikipedia)
(see correspondance between pages)
technically for non latin languages like ja or ar

import re
import urllib.parse
import urllib.request

url = "https://ja.wikipedia.org/wiki/トマト"

# Percent-encode every non-ASCII character so the URL can be requested.
regex = r'[^\x00-\x7F]'
matchedList = re.findall(regex, url)
for m in matchedList:
    url = url.replace(m, urllib.parse.quote_plus(m, encoding="utf-8"))

# Fetch the page and decode the response as UTF-8.
html = urllib.request.urlopen(url).read()
html = html.decode('utf-8')

(hey @andykillen double byte characters!)
(hey @Rich-Harris wouldn't that be cool huh!?)

④ passing context to page to create SEO metadata hreflang then so on...

according to plugin-intl:

  • locale: The language tag identifying the language of the page.
  • canonical: The canonical link to the page if the current one does not itself have the canonical path, null otherwise. This is useful to tell search engines which link should be registered for the index pages.
  • slug: This is the relative path of your page without any indication of the language. It should be written in the default language so that you can translate it (feature not implemented yet).
  • pathRegex: A regular expression containing the slug, so that you can filter easily in GraphQL.

⑤ having a simple fallback method

1- an isPublished boolean per locale resource
2- a redirecting fallback
in case the locale does not exist
(to a default language) (that would be the only role of the default language here, OK?!)
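The fallback rule in ⑤ could look roughly like the sketch below. The resource shape (`isPublished` and `path` per locale) and the function name `resolveLocale` are assumptions made for illustration only.

```javascript
// Hypothetical resource: one entry per locale, each with an isPublished flag.
const page = {
  en: { isPublished: true, path: "/blog" },
  ja: { isPublished: true, path: "/ブログ" },
  fr: { isPublished: false, path: "/blog" },
};

// Return the locale to serve and whether the caller should redirect to it.
function resolveLocale(resource, requested, defaultLocale = "en") {
  const wanted = resource[requested];
  if (wanted && wanted.isPublished) {
    return { locale: requested, redirect: false };
  }
  // Missing or unpublished: fall back to the default language.
  return { locale: defaultLocale, redirect: true };
}

console.log(resolveLocale(page, "ja")); // { locale: 'ja', redirect: false }
console.log(resolveLocale(page, "fr")); // { locale: 'en', redirect: true }
```

The sketch assumes the default-language version is always published; a real implementation would need to decide what happens when it is not.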

B- TEMPLATING -> keep things simple, localization should not be a blackBox Headache

① keeping a common folder/page/whatever structure,

keeping things simple, with other files

② using serializer function,

serving strings, blocks, documents, or svgs according to json:

{
  "en": "I love tomatoes",
  "ja": "私はトマトが好きです",
  "fr": "j'aime les tomates"
}

or

{
"en": [{
  "_type": "block",
  "_key": "da5f884c9804",
  "style": "normal",
  "children": [{
      "_type": "span",
      "_key": "da5f884c98040",
      "text": "Say hi to ",
      "marks": []
    },
    {
      "_type": "span",
      "_key": "da5f884c98041",
      "text": "Portable Text",
      "marks": [
        "strong",
        "<markDefId>"
      ]
    },
    {
      "_type": "span",
      "_key": "da5f884c98042",
      "text": ".",
      "marks": []
    }
  ],
  "markDefs": [{
    "_type": "link",
    "_key": "<markDefId>",
    "href": "https://www.portabletext.org"
  }]
}],
"ja": ...

This function transforms a localeObject (e.g. of type localeString, localeBlock, localeWhatever ...):

function localize(value, languages) {
  if (Array.isArray(value)) {
    return value.map(v => localize(v, languages))
  } else if (value !== null && typeof value === 'object') {
    if (/^locale[A-Z]/.test(value._type)) {
      const language = languages.find(lang => value[lang])
      return value[language]
    }
    
    return Object.keys(value).reduce((result, key) => {
      result[key] = localize(value[key], languages)
      return result
    }, {})
  }
  return value
}
  • templating on page.js, whatever.js ...
    with the idea that
    change from ja.my-app/ブログ > fr.my-app/blog
    would just change just the translated parts smoothly...
    with a resolver:
exports.createResolvers = ({ createResolvers }) => {
  createResolvers({
    LocaleString: {
      localized: {
        type: 'String!',
        resolve(source, args, context) {
          return source[context.code || args.code] || source['en']
        },
      },
    },
  })
}

All this (resolver, function) comes from the localization (what a horrible term!?) inventions of the great SANITY.IO.

  • .json could be along in the tree folder to create route structure.

(I'm leaving the questions of plurals and so on open, but it's possible to deal with that on the fly)
(same thing for RTL).

  • I have no idea how this could be split (namespaces? common vs templated)
    so that the language in the navbar, footer and so on is not always re-rendered,
    but it should not be so difficult ...
    last but not least ...

③ implement a simple language switcher.

(knowing that people finally do not really switch from a language to another...)

conclusion wishes for the glory of sapper

In terms of principles, I am asking myself if it's tough enough: I think it is.

This proposal is in the way of sapper: modern, yet simple, sophisticated, terribly efficient!
It is not that complicated either to leave other possibilities to other desires, but all dimensions are there.

I would like someone to help me do that as my technical level is probably limited
(OK, let's hard code this here. This is intended as a smart call to you @andykillen, as you say you are interested; what a pity you have been left alone so far) (I intend to answer you, so please don't be angry at me).

I think I am giving lots of hints ok?
All the different pieces of the puzzle are there aren't they!?
Where, who, how, why will be the hands to do what has to be done?!
There are plenty of details that I am not aware of, plenty of little technical differences that count.

I would be very interested to see reactions ...
HEY don't leave me alone as for @andykillen huh!?! ... that is ugly ...

Thank you for giving some of your precious time to read this too long message up until the very end: my expression is a little too much, we're in the wild here, it's time to engage, to get things done ok!?

@tidiview interesting, some initial thoughts.

  • the _helpers directory inside the routes directory structure.
    I do not think this will work
    for example, just using 2 languages, English and Dutch
    routes/en/products/[category]/[slug]
    routes/nl/producten/[category]/[slug]
    There needs to be a subdirectory that is the language of the page and it needs to have language localized paths also. Once down in dynamically created part of the route it might work, but by then there will be no relationship between the tree branches, thus the only language file would be the one for that path.

OK, if you only have 1 route, then it might work as
routes/[lang]/[group]/[category]/[slug]
to give /en/products/cars/ford-fiesta and /nl/producten/autos/ford-fiesta

but then things like /en/about /nl/contact and so on would become way more complex, if not impossible.

there would also need to be some intelligence at the routes/index.svelte to do dynamic routing to the language directories.

  • templating
    for me this should be Svelte itself, thus there is a need to be able to do language translations in .svelte files, and a way to tell them what language is being used.

  • localizing javascript
    For example, replying with currency, time and date in local format, as much as a response that might include text. This could be something like the legends on a form being derived dynamically.

So, I'm thinking that there need to be global translation files that cover most screen text, plus additional translation strings available, all inside the src directory, i.e. src/i18n/
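For the currency, time and date part, the built-in `Intl` API already covers a lot without any translation files. A small sketch follows; the helper names `formatPrice` and `formatDate` are made up for illustration, and the exact output for a given locale depends on the ICU data shipped with the runtime.

```javascript
// Format an amount as currency for the given locale.
function formatPrice(amount, locale, currency) {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
}

// Format a date in the locale's long form; UTC is pinned so the
// result does not depend on the server's time zone.
function formatDate(date, locale) {
  return new Intl.DateTimeFormat(locale, { dateStyle: "long", timeZone: "UTC" }).format(date);
}

const when = new Date(Date.UTC(2021, 0, 5));
console.log(formatPrice(1234.5, "en-US", "USD")); // $1,234.50
console.log(formatPrice(1234.5, "nl-NL", "EUR"));
console.log(formatDate(when, "en-US")); // January 5, 2021
console.log(formatDate(when, "nl-NL"));
```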

@andykillen
it's funny that you respond now, as I just started to implement what I wrote 10 min ago!!! ☆ NICE ♫ Thanks for showing interest in this proposal. Please find some remarks below:

routing is completely ARBITRARY,

but, like a map for a city,
each address has a common reference to its original template thru metadata hreflang OBJECT:

  • keeping track and relation to other locale versions,
  • making the eventual switch between them if necessary.

(metadata include a language flag) (url prefix also is a language flag).

For the localized names of folders, which would be a problem, you're likely to have to introduce a special logic based on _helpers: that's what I'd like to try.
I would repeat that logic in subdirectories.
Like you, I think it is complicated (and being new to SAPPER therefore walking on glass) that is my challenge from now on.
You also have this:

Regexes in routes

You can use a subset of regular expressions to qualify route parameters, by placing them in parentheses after the parameter name.

For example, src/routes/items/[id([0-9]+)].svelte would only match numeric IDs — /items/123 would match and make the value 123 available in page.params.id, but /items/xyz would not match.

Because of technical limitations, the following characters cannot be used: /, \, ?, :, ( and ).

The reasons to try are:

content and rendering are kept well SEPARATED

  • templating is kept simple,
  • localizing currency, time, date is not impossible (it's a little bit more work that's all).
    From a user's point of view the reality is that switching from one language to another is exceptional (sorry, but this is a developer FANTASY!).
  • content lives in a well structured (like Graph-Relational Object Structured) API (like a headless CMS) and has its own coherence (with its currency, time, date..), so that you can serialize it and serve it to other channels.

What can you ask for more?

(it is true that this is made a little more complicated with SAPPER than with just SVELTE, as SVELTE is not as opinionated on routing as SAPPER)
(I think though that if you take time to understand the routing logic of SAPPER, there is no reason that one should not be able to deal with it)

I find this thread endlessly interesting but it also feels like it's going in circles a little bit. I'd like to suggest that maybe Sapper shouldn't/needn't be opinionated in these matters. Perhaps it's something best left to the ecosystem for the most part.

I've been really putting @kaisermann's svelte-i18n through its paces on a number of production projects and honestly it's wonderful. It's at v3, it's stable, lightweight yet comprehensive, very extensible and there still seems to be room for plenty of compile-time enhancements. Sapper doesn't necessarily have to bless this or any other project, but a similar situation could exist as is the case with Svelte and the number of routing solutions that are available. I'd hate to see this issue holding Sapper back on its journey to v1 and likewise it would be good to see some of the energy in this issue directed towards one of the existing solutions.

That said, resolving sveltejs/sapper#1036 (at least for lang and dir attributes on the html element) is crucial to facilitating proper i18n in a framework that supports SSR. I also think optional parameters à la sveltejs/sapper#765 would be a major boon in terms of internationalisation and this is something that would be best handled within the core Sapper project. But neither of these things are necessarily specific to i18n.

I wanted to try my hand at a simple sapper project, and here I am not being able to work out how to set the lang attribute of html in the template.html. Was there no progress on i18n since last year?

I don't have any solutions, but I just want to add that I think i18n needs to be handled by sapper (as far as I understand it) because of localized URLs. (I know there seem to be some workarounds to "hack" the URLs, but that's not very developer friendly and might not be very scalable.) I would love to be able to recommend that clients (at least those willing to live a bit more dangerously by using an early framework) go with sapper (because, let's face it, it is awesome), but most of our clients need localized websites and some want the URLs localized. So for now it is hard to recommend sapper for a project. I might be wrong, and it could be possible for a package like svelte-i18n to also manipulate the routing.

Yes, localized URLs are a must. I have started exploring defining my paths as maps with a path for each language, so I can easily switch from any language to any other. There are many subtleties involved, so having something working by default would really put svelte above most others!

Strong agree on the need for localized urls. We have been trying several different ways of getting this to work, including mounting Sapper on different routes in server.js, scoping the routes with a language "folder", etc. None of them work completely.

@Jayphen do you have some more insight into this after experimenting with different routing hacks for a few different projects?

I’ve only had to deal with the routing aspect of this problem, as in our case all translations are managed by an external CMS.

Mounting the app at different basenames (to which there is a very brief reference in the docs) works if you have a small number of languages to support, and the routes aren’t localised. I had some trouble getting it to work on Vercel, but the author of the vercel-sapper package showed how it can be done.

The app I’m working on does not use the above method though; instead it uses a base [lang] directory in routes, as well as setting the lang in the sapper session via middleware from acceptsLanguage. This also seems to work okay, and I’m unsure which is a better approach.

Both of these approaches only work because routes themselves are not localised, and are prefixed. In another project with localised routes we are fetching them all from a CMS ahead of time and creating an in-memory manifest. A custom Link component can be used to generate internal links from the manifest. This basically sidesteps the file system routing, and is not trivial. That project is still in its infancy, so I don’t have much more to say on it yet other than the solutions Rich already suggested.

At the very least for now it would be great to have configurable template replacements as options in the sapper middleware, so we can inject a lang attribute on the HTML element. I think there’s an open PR for this.

I'm completely new to sapper/svelte, but have been disappointed with i18n in many frameworks so if you are willing to fix that better in sapper that would be great!

Do you have a list of features to have already, or is it still being discussed?

First there should be a configurable way to set the locale from different sources. The most logical default, IMHO, would be:
1/ when the URL is localised, use the URL to define the locale (this should take precedence, as not all translations are created equal. If the user doesn't know that locale, or has set a different preference in a cookie, then it's better to display a message telling them that the page is available in their favorite locale rather than displaying the other page directly).
2/ or, if a cookie is set with the locale information, use that
3/ or, use the accept-language header.
4/ if nothing matches, use the default locale for the route (or global default locale).
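The precedence above can be sketched as a single function. This assumes the locale appears as the first path segment (one of several possible URL schemes), and the names `supported`, `parseAcceptLanguage` and `detectLocale` are hypothetical; region subtags such as `fr-CH` are reduced to their base language for simplicity.

```javascript
// Hypothetical list of locales the site supports.
const supported = ["en", "fr", "ja"];

// Turn "de, fr-CH;q=0.9, en;q=0.8" into base language tags ordered by q-value.
function parseAcceptLanguage(header) {
  return (header || "")
    .split(",")
    .map((part) => {
      const [tag, ...params] = part.trim().split(";");
      const q = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      return {
        lang: tag.trim().toLowerCase().split("-")[0],
        q: q ? parseFloat(q.slice(2)) : 1, // a missing q means 1
      };
    })
    .filter((e) => e.lang && e.lang !== "*" && !Number.isNaN(e.q))
    .sort((a, b) => b.q - a.q)
    .map((e) => e.lang);
}

// URL first, then cookie, then Accept-Language, then the default.
function detectLocale({ pathname, cookieLocale, acceptLanguage }, defaultLocale = "en") {
  const fromUrl = pathname.split("/")[1];
  if (supported.includes(fromUrl)) return fromUrl;
  if (supported.includes(cookieLocale)) return cookieLocale;
  const fromHeader = parseAcceptLanguage(acceptLanguage).find((l) => supported.includes(l));
  return fromHeader || defaultLocale;
}

console.log(detectLocale({ pathname: "/fr/about", cookieLocale: "ja", acceptLanguage: "en" })); // fr
console.log(detectLocale({ pathname: "/about", acceptLanguage: "de, fr-CH;q=0.9, en;q=0.8" })); // fr
```

When the URL locale wins but differs from the cookie or header preference, the page could show the "also available in your language" message suggested in 1/ rather than redirecting.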

I like how svelte handles slugs, and I think something similar should be done for i18n. So, for a given route, we still have the idea of folders and ONE svelte file for all translations, with a special syntax inside the svelte file to specify all the possible values of the URLs with the corresponding locales.

So given this directory structure :

-- |private|
---> |account|
------> settings.svelte

we know that /private/account/settings should hit that svelte file, with the default locale (set site-wide, or suffixed in the file name of the svelte file). Here 'en' but could be any locale.

Inside the svelte file, we define each URI as an array of (number of folders inside "|" + 1 for the svelte file) segments. E.g.:
l10n = [
  ['fr', ['privé', 'compte', 'paramètres']],
  ['ko', ['사유', '계정', '설정']],
]

If the URI is /사유/계정/설정 we know the language should be Korean, and we could get the localised names from ${l10n.private}, ${l10n.account} and ${l10n.settings} for example.

Then sapper can generate the <link rel="alternate" hreflang=.../> in the header as well as update the sitemap with this info.

And set the html lang according to the locale, and maybe provide a default way to display the different translations with links as almost every page should display that one way or the other.
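A rough sketch of what the framework could derive from such a per-route table: the locale for an incoming URI, and the alternate links. The table shape and the names `routeSegments`, `localeForPath` and `alternateLinks` are hypothetical; the sketch uses `ko`, the ISO 639-1 code for Korean, and ignores Unicode normalization differences.

```javascript
// Hypothetical per-route table: the path segments for each locale.
const routeSegments = {
  en: ["private", "account", "settings"],
  fr: ["privé", "compte", "paramètres"],
  ko: ["사유", "계정", "설정"],
};

// Find which locale a (possibly percent-encoded) pathname belongs to.
function localeForPath(pathname) {
  const path = decodeURI(pathname).replace(/^\/|\/$/g, "");
  for (const [locale, segments] of Object.entries(routeSegments)) {
    if (segments.join("/") === path) return locale;
  }
  return null; // not a localized form of this route
}

// Build one <link rel="alternate"> per locale for the page head.
function alternateLinks(origin) {
  return Object.entries(routeSegments).map(([locale, segments]) => {
    const href = `${origin}/${segments.map((s) => encodeURIComponent(s)).join("/")}`;
    return `<link rel="alternate" hreflang="${locale}" href="${href}" />`;
  });
}

console.log(localeForPath("/사유/계정/설정")); // ko
console.log(alternateLinks("https://example.com").join("\n"));
```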

If someone is stumbling on this looking for a working and tested solution, here's my code.

Basically what it does is look in your URL for a pattern like /en/whatever or just /en. It extracts the locale slug from that URL and matches it against your provided list of locales. If it doesn't match, it will simply hit the 404 middleware of sapper.

With this code you do not have to use a weird folder structure to make your app work.

To get your locale in your templates, simply get it out of the session param in your preload function.

Obviously you will also have to change your links on the front-end to prefix your links with the current locale, this is up to you.

Simply add this code in your server.js file.

const defaultLocale = "fr";
const locales = ["fr", "en"]

/**
 * Safely extracts a locale out of the url.
 * 
 * @param {string} route - An url path
 */
function localRouteRegexp (route) {
	let localeString = locales.join("|");
	// Anchored at the start so only a leading /xx segment counts as the locale
	let regexp = new RegExp(`^/(${localeString})(/|\\?|$)`);
	let currLocale = route.match(regexp);
	if (!currLocale) {
		return defaultLocale;
	}
	// The first capture group holds the locale without slashes
	return currLocale[1];
}

/**
 * Creates the express valid path regex
 * to allow matching the app on different
 * routes.
 * 
 * @param {Array<String>} locales - A list of supported locales
 */
function expressLocaleRouteRegex (locales) {
	let regexp = "(";
	locales.forEach((locale, i) => {
		regexp += `/${locale}`;
		if (i !== locales.length -1) {
			regexp += "|"
		}
	})

	regexp += ")?";
	return regexp;
}

/**
 * A middleware to add the current
 * locale to the svelte session store.
 */
const bindSessionToRequest = (req, res, next) => sapper.middleware({
	session: () => ({locale: req.locale})
})(req, res, next)

/**
 * Finds the current locale in
 * the url path and sets it to the
 * request object.
 */
service.use((req, _, next) => {
	let locale = localRouteRegexp(req.url);
	req.locale = locale;
	next();
})

service.use(
	expressLocaleRouteRegex(locales),
	compression({ threshold: 0 }),
	sirv('static', { dev }),
	bindSessionToRequest
)
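For the front-end links mentioned above, a small helper along these lines might be enough. The name `localizeHref` is hypothetical, and the sketch assumes every locale is prefixed, including the default one.

```javascript
const locales = ["fr", "en"];

// Prefix an internal href with the given locale, replacing any
// locale prefix that is already there. External links are left alone.
function localizeHref(href, locale) {
  if (!href.startsWith("/") || href.startsWith("//")) return href;
  const prefix = new RegExp(`^/(${locales.join("|")})(?=/|$)`);
  const stripped = href.replace(prefix, "");
  return `/${locale}${stripped === "/" ? "" : stripped}`;
}

console.log(localizeHref("/about", "en")); // /en/about
console.log(localizeHref("/fr/about", "en")); // /en/about
console.log(localizeHref("/", "fr")); // /fr
console.log(localizeHref("https://svelte.dev", "fr")); // https://svelte.dev
```

In a template, the locale would come from the session store populated by the middleware above.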

I think the only really important thing would be to have an elegant way to implement localized routes. For SPAs with logged-in users a solution with cookies is sufficient. Best would be if you could mix both approaches. The rest can be left to plugins.
For localized urls my favorite would be:

domain.tld/page-in-english.html
domain.tld/fr-FR/page-en-francais.html
domain.tld/de/seite-auf-deutsch.html
…

@sudomaxime Hey, thanks for your solution, but is it supposed to work only with Express.js? Because I'm trying the same code with polka and it always returns 404 for every page. And did you test it and notice anything about the performance when using this code?

It is not mentioned by anyone here yet, Next.JS has i18n routing built-in now https://nextjs.org/docs/advanced-features/i18n-routing

Also, has anyone mentioned the <link> tag in <head> yet? It would be great if each page had a rel="alternate" to all other versions automatically (all pages and all languages have bi-directional links to each other). I had to do something like this in my own site:

<svelte:head>
    <link rel="alternate" hreflang="x-default" href="REAL_HOST/{neutralPath}" />
    {#each supportedLanguages as language}
        <link rel="alternate" hreflang="{language}" href="REAL_HOST/{language}/{neutralPath}" />
    {/each}
</svelte:head>

(The "alternate" includes the language you are on too, but it seems that is OK. More info.)

@5argon I've heard from a couple of international SEO specialists that putting the hreflang stuff into a sitemap.xml is better than putting it in the HEAD meta. Thought you might be interested.
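For reference, a sitemap carries the same information as `xhtml:link` children of each `<url>` entry. A sketch of generating one, assuming the /{language}/{neutralPath} URL scheme from the snippet above; the function names are hypothetical.

```javascript
// Escape characters that are special in XML text and attribute values.
function xmlEscape(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// One <url> per language version; each lists every version, itself included.
function sitemapEntries(origin, neutralPath, languages) {
  const urlFor = (lang) => xmlEscape(`${origin}/${lang}/${neutralPath}`);
  const alternates = languages
    .map((lang) => `    <xhtml:link rel="alternate" hreflang="${lang}" href="${urlFor(lang)}"/>`)
    .join("\n");
  return languages
    .map((lang) => `  <url>\n    <loc>${urlFor(lang)}</loc>\n${alternates}\n  </url>`)
    .join("\n");
}

function sitemap(origin, neutralPaths, languages) {
  const body = neutralPaths.map((p) => sitemapEntries(origin, p, languages)).join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
    '        xmlns:xhtml="http://www.w3.org/1999/xhtml">',
    body,
    "</urlset>",
  ].join("\n");
}

console.log(sitemap("https://example.com", ["about"], ["en", "fr"]));
```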

It is not mentioned by anyone here yet, Next.JS has i18n routing built-in now https://nextjs.org/docs/advanced-features/i18n-routing

I haven't used nextjs's i18n routing, but everything described in the docs can be achieved with Sapper today in userland, except the automatic lang attribute on the html element (for which there is an unmerged PR here sveltejs/sapper#1695)

xpuu commented

I've spent the last 23 years in web development. Last week a childhood friend asked me to make him a simple static website. Piece of cake, I thought. Mankind is getting ready to colonise Mars. We have tools to make simple websites.

I chose Vercel as my target platform. I wasn't sure about Sapper deployment so I tried Nuxt first. Using Nuxt-i18n was bliss. It all went great until I peeked at the exported source code. The amount of bloat overwhelmed me.

I switched to Sapper, but now I realise it's impossible to have sanely localised URLs. I've already spent a few days trying to solve it. I tried:

  • Polka middleware

    Which almost worked but didn't, because Sapper client router doesn't know anything about that.

  • Obscure regexes in filenames

    [lang]/[t(the-team|das-team)].svelte

    Mentioned by the similarly desperate Vincenzo Lombino on StackOverflow. But I don't want a prefix for the default language, because it will cause an extra redirect.

Lessons learned from this

  • Convention but configuration

    Nuxt generates routes from the component filenames in a similar manner to Sapper. But you always have the option to extend or even completely bypass this mechanism. I admire programmers who are willing to let others take control if necessary.

  • Routes generation strategy

    In Nuxt-i18n they have

    • 'no_prefix': routes won't have a locale prefix
    • 'prefix_except_default': locale prefix added for every locale except default (recommended)
    • 'prefix': locale prefix added for every locale

    They all seem valid to me.

  • Ability to exclude route from i18n

    In Nuxt-i18n you can exclude a route from some locales or from i18n completely. How very important!

  • Language detection flexibility

    As @bernardoadc mentions, it should be less opinionated. Bootstrap function which returns locale code is a great idea.

  • Option to change <html {lang}> is a must

    Whatever can increase your SEO is very important.
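To make the three prefix strategies listed above concrete, here is a sketch of how a path could be derived under each one. This is not Nuxt-i18n's implementation; the function name `localizedPath` and its option names are made up, and only the strategy names come from the list above.

```javascript
// Return the public path for `path` in `locale` under a given strategy.
function localizedPath(path, locale, { strategy = "prefix_except_default", defaultLocale = "en" } = {}) {
  const needsPrefix =
    strategy === "prefix" ||
    (strategy === "prefix_except_default" && locale !== defaultLocale);
  if (!needsPrefix) return path; // 'no_prefix', or the default locale
  return path === "/" ? `/${locale}` : `/${locale}${path}`;
}

console.log(localizedPath("/team", "en")); // /team
console.log(localizedPath("/team", "de")); // /de/team
console.log(localizedPath("/team", "en", { strategy: "prefix" })); // /en/team
console.log(localizedPath("/team", "de", { strategy: "no_prefix" })); // /team
```

With 'no_prefix' the locale has to come from somewhere other than the URL, such as a cookie or the Accept-Language header.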

^^^ Backing up the comment above: I really hope SvelteKit will allow i18n URLs to be structured the way we need and want. For now I've stopped developing my app and am just waiting for this.