rladies/praise

Praise in other languages

gaborcsardi opened this issue · 37 comments

Specifically Chinese first, via @Avatoo. \o/

We need to work out some simple architecture first.

I can help with praise in French 😄

@masalmon Cool, I'll soon update the code to handle multiple languages.

Merveilleux ! Fantastique ! Superbe !

I am thinking about a good way to do this. The goal would be to be able to write

praise(gettext("Your tests are ${adjective}!"))

or something like this, and then get praise in multiple languages. Two things are required for this:

  1. We need to add some translations to testthat or whatever package we want to add international praise to. This would use the usual NLS system.
  2. We need to add the parts of speech in other languages. E.g. adjectif for French, etc.

E.g. the gettext translates the string above to "Vos tests sont ${adjectif}", and then we just use this template as we are using it now.

Does this make sense? Or do we want to try automatic translation via the google translate API? I guess that could be error prone, so maybe the NLS way is better?
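A rough sketch of the NLS-style flow described above (translate the template first, then fill in the parts of speech), in base R. The `translations` table stands in for a real gettext catalogue, and `words` and `praise_sketch()` are hypothetical names used only for illustration; none of this is praise's or testthat's actual code.

```r
# Hypothetical stand-in for a gettext catalogue:
# language -> (English template -> translated template).
translations <- list(
  fr = c("Your tests are ${adjective}!" = "Vos tests sont ${adjectif} !")
)

# Hypothetical word lists, keyed by the part-of-speech name used in each language.
words <- list(
  adjective = c("great", "superb", "wonderful"),
  adjectif  = c("fantastiques", "superbes", "formidables")
)

# Translate the template if a translation exists, then fill in each ${...} slot.
praise_sketch <- function(template, lang = "en") {
  catalogue <- translations[[lang]]
  if (!is.null(catalogue) && template %in% names(catalogue)) {
    template <- catalogue[[template]]
  }
  for (part in names(words)) {
    slot <- paste0("${", part, "}")
    if (grepl(slot, template, fixed = TRUE)) {
      template <- sub(slot, sample(words[[part]], 1), template, fixed = TRUE)
    }
  }
  template
}

praise_sketch("Your tests are ${adjective}!", lang = "fr")
```

With an unknown language the template is left untranslated and filled from the English list, so existing callers would keep working.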

Would it be a lot of work to test the google translate API on a few examples to see how bad the results are?

The thing is, even if we can use google translate, we also want a way that lets people have more control. So why not start with that?

Hi @gaborcsardi following up on this -- a bit late sorry. What exactly could I do to help make praise work for French too (apart from contributing words)?

@chucheria would like to contribute for Spanish.

@masalmon Thanks!

I guess we would need to decide what "work" means. I.e. consider testthat praise. It is implemented like this:

praise::praise("Your tests are ${adjective}!")
praise::praise("${EXCLAMATION} - ${adjective} code.")

So how would (hypothetical) user Hadley add support for other languages? Or would all this be automatic? The two obvious solutions are:

  1. We translate all non-template words via Google translate or something similar in praise, if we detect a French locale. And then just substitute in the templated words, i.e. the nice adjectives, to get a sentence in French.
  2. We require user Hadley to supply templates in various languages. We might help user Hadley with translation tips via an automatic translation service.

The first solution is nice if it works well, and maybe it works well for simple templates. Maybe we can implement both solutions.

What do you think?

I guess the first solution is easier? Or in the case of a package like testthat, I could translate all the templates, because there are not many anyway?

Also, the idea would be to have people contribute the nice adjectives (because then you only need to know your language and a few git commands), but I guess that part is easy.

> I guess the first solution is easier? Or in the case of a package like testthat, I could translate all the templates, because there are not many anyway?

Anyway, maybe we can implement both? Let's implement the automatic way, and see how it works. Btw. Google translate is not free any more, but maybe this works: http://www.r-pkg.org/pkg/RYandexTranslate

> Also, the idea would be to have people contribute the nice adjectives (because then you only need to know your language and a few git commands), but I guess that part is easy.

Agreed.

I've just installed RYandexTranslate & registered for the free service (at last!). They seem to use two-letter language codes.

I've also looked at your commit regarding language detection, is there a particular reason you use Sys.getlocale() instead of Sys.getlocale(category = "LC_COLLATE")?

One last small thing for today: I looked at the praise code in testthat, and the praising and encouraging sentences are "hard-coded". Should praise have categories for this (an "english_congratulation.R" and an "english_encouragement.R"), and can we hope to have them replaced in testthat?

The Yandex API works well for the unique sentence to be translated in testthat:

> translate(api_key, text = "Your tests are", lang = "en-fr")
$lang
[1] "en-fr"

$text
[1] "Vos tests sont"

I've just realized that in languages like French ${adjective} will need to be ${singular-adjective} and ${plural-adjective}.

> I've also looked at your commit regarding language detection, is there a particular reason you use Sys.getlocale() instead of Sys.getlocale(category = "LC_COLLATE")?

Don't remember. Looks like this is what I am doing: 0ca9979#diff-951791f1fb37d9e5b0f0cf852ce38d83R30

I suppose we can add LC_COLLATE here as well, I don't really see why you would have that set up and the others not, but I don't know much about locales.

> Should praise have categories for this (a "english_congratulation.R" and "english_encouragement.R"), and can we hope to have them replaced in testthat?

Maybe, but in general I would leave writing sentences up to package authors depending on praise.

> I've just realized that in languages like French ${adjective} will need to be ${singular-adjective} and ${plural-adjective}.

Hmmm, yeah, that's a problem, and more "complicated" languages will be even worse.

So I would keep it simple and use the auto-translation for suggestions only. Maybe the manual praise translation is even better, then people speaking various languages can just contribute translations to testthat and other praising packages. How about this?

  • On my PC (Windows):
> Sys.getlocale()
[1] "LC_COLLATE=Spanish_Spain.1252;LC_CTYPE=Spanish_Spain.1252;LC_MONETARY=Spanish_Spain.1252;LC_NUMERIC=C;LC_TIME=Spanish_Spain.1252"
> Sys.getlocale("LC_COLLATE")
[1] "Spanish_Spain.1252"

so the substr wouldn't work with Sys.getlocale()?

  • I don't understand what you mean by manual praise translation? How would this work for the praising packages?

> so the substr wouldn't work with Sys.getlocale()?

OK, we'll need to read more about locales I suppose. Or find good code that gives a two or three letter code from the locales.
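One possible way to get a two-letter code out of either form of the locale string shown above, using only base R. The `lang_names` table and the `locale_to_lang()` helper are hypothetical and far from complete; a real implementation would need a proper language-name table and more testing across platforms.

```r
# Hypothetical mapping from Windows-style language names to two-letter codes;
# a real table would need many more entries.
lang_names <- c(spanish = "es", french = "fr", german = "de",
                hungarian = "hu", english = "en")

# Extract a two-letter language code from a locale string.
# Accepts a single category ("Spanish_Spain.1252", "es_ES.UTF-8") or the
# combined string from Sys.getlocale() ("LC_COLLATE=...;LC_CTYPE=...").
locale_to_lang <- function(locale, default = "en") {
  # Keep only the first category if several are concatenated.
  first <- strsplit(locale, "[;/]")[[1]][1]
  if (is.na(first)) return(default)
  # Drop a leading "LC_XXX=" prefix, if present.
  value <- sub("^[^=]*=", "", first)
  # The language is whatever precedes the first "_" or ".".
  lang <- tolower(sub("[_.].*$", "", value))
  if (nchar(lang) == 2) return(lang)
  if (lang %in% names(lang_names)) return(lang_names[[lang]])
  default
}

locale_to_lang(Sys.getlocale("LC_COLLATE"))
```

Anything the helper does not recognise, including the "C" locale, falls back to `default`, so praise would simply stay in English.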

> I don't understand what you mean by manual praise translation? How would this work for the praising packages?

Package author writes the sentences in all languages she knows. (She can get help from auto-translation, but I would implement auto-translation later.) Then people that know other languages can submit pull requests that add support for other languages that praise supports. I think this is good, because it encourages collaboration.

  • Ok, I'll try to make myself wiser about locales in the next weeks.

  • And could some examples be kept in the praise package itself if they are general sentences?

> And could some examples be kept in the praise package itself if they are general sentences?

Sure, that makes a lot of sense. We can have a praise_code() function or praise_package() or some generic function, e.g. praise_this("package").

What would the praise_this("package") function do? Create the infrastructure for recognizing language?

Oh, no, sorry, these would be just praising sentences that are kept within praise, and they could be translated to all languages we support.

There's an R package for plurals but only in English, what a pity: https://github.com/hrbrmstr/pluralize

@masalmon No prob, if we go the "manual" way, we don't really need that.

Btw. I think hunspell can do this for all languages that it supports, but we don't need to worry about it now.

Just a summary of the discussion (cc @chucheria ) @gaborcsardi please correct me if I'm wrong which I quite likely am :-)

  • The international branch of this package has e.g. english-adverbs.R; for each new language we have to add all the corresponding .R files. @chucheria & I could create these files and they'd be filled during git workshop, even if the rest of the international structure of the package isn't ready, because these collections of words will still be useful at some point.

  • The code for recognizing the locale needs to be improved a bit. Note, we'll have to write the correspondence between a 2/3-letter language code and the full name of the language.

  • The code for recognizing the locale will be used in generic functions inside praise.

  • However, the international possibilities will be useful only if

  1. maintainers of packages using praise, e.g. testthat, agree to have their R code modified so that it includes recognition of the locale,
  2. volunteers submit translations of sentences of the package to the package maintainer,
  3. so that if the locale is a language other than English that is offered by praise + the package itself (you need the adverbs in Spanish in praise and the sentences in Spanish in testthat for instance), the package can output messages in this language.

I would not put locale stuff in testthat & co, I would just do something like

praise_lang("You are ${adjective}!", lang = "en")
praise_lang("Du bist ${adjective}!", lang = "de")

or something like this.

Or even just

praise("You are ${adjective}!", lang = "en")
praise("Du bist ${adjective}!", lang = "de")

or

praise(
  en = "You are ${adjective}!",
  de = "Du bist ${adjective}!"
)
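To make the last variant concrete, here is a small sketch of a function that takes one template per language as named arguments and picks one. `praise_multi()` and the `adjectives` lists are hypothetical names for illustration only, not praise's API, and the sketch assumes an English template is always supplied as the fallback.

```r
# Hypothetical per-language word lists (ASCII only).
adjectives <- list(
  en = c("great", "superb", "wonderful"),
  de = c("toll", "wunderbar", "fantastisch")
)

# Pick the template for `lang`, falling back to English, then fill the slot.
praise_multi <- function(..., lang = "en") {
  templates <- c(...)
  stopifnot("en" %in% names(templates))
  if (!lang %in% names(templates) || !lang %in% names(adjectives)) {
    lang <- "en"
  }
  sub("${adjective}", sample(adjectives[[lang]], 1), templates[[lang]],
      fixed = TRUE)
}

praise_multi(
  en = "You are ${adjective}!",
  de = "Du bist ${adjective}!",
  lang = "de"
)
```

In this shape the package author owns the sentences and praise only owns the word lists, which matches the division of labour suggested earlier in the thread.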

Another way would be to use gettext...

What is gettext?

The standard way to translate text messages. See ?gettext.

So with gettext, people could just write

praise("You are ${adjective}!")

as before, but then praise() would check if the "You are ${adjective}!" string has a translation in the current locale, either

  1. in the calling package, or
  2. in praise itself.

After the translation, we would do the templating, as before, using the detected language.

Then the messages would need to be translated using e.g. msgtools. But the words lists would be the same as before.
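The lookup order described above (calling package first, then praise itself, else the original string) could be sketched like this. The two catalogues are plain named lists standing in for real gettext message catalogues, and `lookup_template()` is a hypothetical helper, not something in the international branch.

```r
# Hypothetical catalogues: language -> (English template -> translation).
# In a real setup these would come from each package's compiled translations.
caller_catalogue <- list(
  de = c("You are ${adjective}!" = "Du bist ${adjective}!")
)
praise_catalogue <- list(
  de = c("Well done!" = "Gut gemacht!"),
  fr = c("You are ${adjective}!" = "Tu es ${adjective} !")
)

# Return the first translation found, searching catalogues in order;
# if none has one, return the template unchanged.
lookup_template <- function(template, lang, catalogues) {
  for (catalogue in catalogues) {
    messages <- catalogue[[lang]]
    if (!is.null(messages) && template %in% names(messages)) {
      return(messages[[template]])
    }
  }
  template
}

search_path <- list(caller_catalogue, praise_catalogue)
lookup_template("You are ${adjective}!", "de", search_path)
```

The returned template would then go through the usual `${...}` substitution with the word lists for the detected language.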

This sounds like the easiest solution?

For the users, yes. Even for people adding new words.

For people dealing with the translation system (=us), not really. :)

But then we can praise ourselves ;-)

OK, I implemented a framework: https://github.com/rladies/praise/tree/international

I'll write a short guide on how to add translations, and then we can test it on you if you don't mind. :)

Btw. we'll need to re-organize the package a bit, because non-ASCII characters are not allowed in code. So I'll move the words to data/ or inst/.

Awesome! Looking forward to testing it.

Génial ! J'ai hâte de le tester !

Here is a short how-to: https://github.com/rladies/praise/blob/international/inst/international.md

I have added Hungarian, not too many words, just a PoC.

FYI.

I'll have a better look next week but this looks AWESOME!

For other languages it's important to account for grammatical gender: the expressions are not the same for men as for women, and the gender can change the spelling and also the meaning, from very good to very bad ^_^