vesta-webtrees-2-custom-modules/vesta_extended_relationships

Language-specific relationship names

Closed this issue · 16 comments

As there is no progress in webtrees for this issue, I'll implement this here first. Hopefully we can move the functionality to webtrees eventually.

Main goals:

  1. Define relationship names per language in a concise way, without having to translate every special term in every other language
  2. Improve the algorithm for determining the relationship name in complex cases (or make the algorithm configurable)
  3. Support genitive constructions in languages that benefit from it ("Ehemann der Tante 2. Grades" instead of ungrammatical "Tante 2. Grades' Ehemann")
  4. Allow definitions of arbitrary complexity that may be required for specific languages, or even in general. See the forum for a suggestion regarding step-relationships. This should be possible in the new approach.

Out of scope:

The actual implementation will be freely customizable in each language. Additionally, support common cases:

Re 1.: Provide a DSL so that (hopefully) non-developers can provide the language-specific logic in a descriptive manner rather than via regexes and switch cases. Example for German (second term is for genitive constructions):

$defs []= RelPath::any()->parent()->father()->is('Großvater', 'des Großvaters');
$defs []= RelPath::any()->parent(Times::fixed(2))->father()->is('Urgroßvater', 'des Urgroßvaters');
$defs []= RelPath::any()->parent(Times::min(2))->parent()->father()->is('%s×Ur-Großvater', 'des %s×Ur-Großvaters');
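To illustrate how definitions like these could be evaluated, here is a minimal, self-contained sketch. It is not the module's actual `RelPath` implementation: the array-based pattern format and the helpers `stepMatches`, `matchPath` and `nameFor` are hypothetical stand-ins, and only the nominative form is handled. Each pattern element is `[step, min, max]`, and the repetition count of every variable-length element is passed to the `%s` placeholder.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: 'parent' accepts any parent step, other steps must match exactly.
function stepMatches(string $actual, string $expected): bool
{
    if ($expected === 'parent') {
        return in_array($actual, ['father', 'mother', 'parent'], true);
    }
    return $actual === $expected;
}

// Hypothetical matcher: returns the repetition counts of all variable-length
// elements if the whole path matches the pattern, or null otherwise.
// A pattern element is [step, min, max]; max === null means "unbounded".
function matchPath(array $path, array $pattern, int $pi = 0, int $si = 0): ?array
{
    if ($pi === count($pattern)) {
        return $si === count($path) ? [] : null;
    }
    [$step, $min, $max] = $pattern[$pi];

    // Count how many consecutive steps this element could consume.
    $avail = 0;
    while ($si + $avail < count($path)
        && ($max === null || $avail < $max)
        && stepMatches($path[$si + $avail], $step)) {
        $avail++;
    }

    // Try the longest match first, then backtrack.
    for ($n = $avail; $n >= $min; $n--) {
        $rest = matchPath($path, $pattern, $pi + 1, $si + $n);
        if ($rest !== null) {
            return $min === $max ? $rest : array_merge([$n], $rest);
        }
    }
    return null;
}

// Array-based equivalents of the three German definitions above (nominative only).
$defs = [
    [[['parent', 1, 1], ['father', 1, 1]], 'Großvater'],
    [[['parent', 2, 2], ['father', 1, 1]], 'Urgroßvater'],
    [[['parent', 2, null], ['parent', 1, 1], ['father', 1, 1]], '%s×Ur-Großvater'],
];

// The first matching definition wins.
function nameFor(array $path, array $defs): ?string
{
    foreach ($defs as [$pattern, $name]) {
        $counts = matchPath($path, $pattern);
        if ($counts !== null) {
            return vsprintf($name, $counts);
        }
    }
    return null;
}
```

With these assumptions, the path father, father, mother, father resolves via the third definition, with the `Times::min(2)` part consuming two steps.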

Re 2.:
The current algorithm uses overall string length as the criterion. Other options are discussed here. A promising approach seems to be to handle common-ancestor-based sub-paths as units. These kinds of sub-paths are supported in many languages as specific terms, even in complex cases ("third cousin twice removed ascending"). Alternating those with spouse-based sub-paths seems to produce names that are more easily understandable in general (see here for an example).
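A rough sketch of the sub-path idea, under simplifying assumptions: the path is reduced to three hypothetical step kinds (`up` for parent, `down` for child, `spouse`), and a common-ancestor-based unit is any run of `up` steps followed by `down` steps. The function name `splitIntoUnits` is made up for this illustration and is not part of the module.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: split a relationship path into units that can be named
// separately. A new unit starts at every spouse step, after every spouse step,
// and whenever the path goes up again after having gone down.
function splitIntoUnits(array $path): array
{
    $units = [];
    $current = [];
    foreach ($path as $step) {
        $lastStep = $current === [] ? null : $current[count($current) - 1];
        $startsNewUnit = $step === 'spouse'
            || $lastStep === 'spouse'
            || ($step === 'up' && $lastStep === 'down');
        if ($startsNewUnit && $current !== []) {
            $units[] = $current;
            $current = [];
        }
        $current[] = $step;
    }
    if ($current !== []) {
        $units[] = $current;
    }
    return $units;
}
```

For example, the path up, up, down, spouse is split into an "aunt" unit and a "husband" unit, which a language file could then combine as "Ehemann der Tante".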

About language-specific files: why not create a dummy (unused in code) PHP file with "static" strings, which can be translated in the usual way (gettext)?

Because different languages require different strings anyway. In German, we only need

$defs []= RelPath::any()->spouse()->father()->is('Schwiegervater', 'des Schwiegervaters');

while in Slovak we need to distinguish

$defs []= RelPath::any()->husband()->father()->is('svokor', 'svokra');
$defs []= RelPath::any()->wife()->father()->is('tesť', 'tesťa');

So we need language-specific files anyway, which have to be created by developers who are familiar with the respective language. I don't see much benefit in adding another layer.

(The current approach in webtrees is different, but unsustainable in general because it requires all languages to translate a large number of 'exotic' strings, many of which are only relevant for a few languages)

ro-la commented

I will try to make a Slovak file with the actual translations used in webtrees.

Ladislav

Ladislav, that would be very welcome for testing! Once we have that file, I'll create a proper release.

Let me know if you need help with the more complex cases i.e. stuff like

$defs []= RelPath::any()->parent(Times::min(5, -3))->parent($ref)->sibling()->child($ref)->daughter()->is('%1$s×Ur-Großtante %2$s. Grades', 'der %1$s×Ur-Großtante %2$s. Grades');

OK, in the provided example I see a common pattern, so we can ignore language-specific names and stay with English placeholders:

RelPath::any()->spouse(common|male|female)->father()->is(male, female)

This will produce three English strings:

  • spouse's father/mother
  • husband's father/mother
  • wife's father/mother

Thus in code it is enough to split languages into two groups, one with a common name and one with gender-dependent names, and use a dummy file for the strings.

This way, you (as code maintainer) can simply add English strings and leave translation to translators.

This approach is already used in webtrees in more strings...

Thus in code it is enough to split languages into two groups, one with a common name and one with gender-dependent names, and use a dummy file for the strings.

But this is just a simple example. The differences between specific languages are far greater. See e.g. all the switches for different cousin numbering systems here. That's exactly what we want to move away from!

This approach is already used in webtrees in more strings...

And it is not a good solution in this case, as described e.g. here by Greg:

The current system requires a translation into every language for every relationship that exists in any language. If one language has a special rule, then every language must also translate it.

We now have over 1000 translations for relationship names. This is not sustainable.

I agree with him that we need language-specific code for this.

ro-la commented

Sometimes it's time to ask about the very basic translations :-)
husband = Ehemann = manžel
wife = Ehefrau = manželka

spouse = Ehepartner = ???
Mostly it is translated in Slovak as "partner", but that is (IMO) not correct. Is a "spouse" somebody who is married, but whose gender we don't know? Then it should be "manžel/manželka" in Slovak. Any other idea @slavkoja?

Partner = Partner/Partnerin = partner/partnerka should be used for "families" without marriage. Or am I wrong?
What would the correct $defs be for this "partner"? We need a masculine and a feminine form.

In the current path generation logic, 'spouse' occurs in the path only in case of unknown gender (i.e. in practice almost never in this context?). In the DSL, '->spouse()' can be used if the actual gender is irrelevant.

The marriage status currently isn't addressed (see Greg's comment in the main issue), except for relationships of path-length one. I'm currently using the old implementation for these short paths anyway, so it should be ok even to skip those. Or do you need the marriage status for the more complex definitions (uncle, cousin etc.) as well?

Ultimately we'll probably have to add

->exHusband()
->nonMarriedMalePartner()
->adoptedChild()

and so on for languages that want to cover all those cases as well.

Partner = Partner/Partnerin = partner/partnerka should be used for "families" without marriage. Or am I wrong?
What would the correct $defs be for this "partner"? We need a masculine and a feminine form.

We have to look into this in depth. Is it necessary in genealogy to differentiate between married and unmarried partners? IMO not: they already constitute a pair, and nowadays more and more children are born outside of marriage. We should not even distinguish gender in the husband/wife terms: both can have the same gender nowadays, so we could follow the recommended approach (I read it somewhere on a GEDCOM-related site, no link) and treat both tags (HUSB/WIFE) as backward compatible rather than gender-related.

IMO, we need to apply the KISS pattern/principle. As we will never be able to construct a proper translation of all combinations in the tree in all languages, we need to define some boundary of translatable relationships and, for all other combinations, use "Terrible to describe, see chart".

IMO, we need to apply the KISS pattern/principle.

I think this is a sensible approach. I hope these complex terms (step-x, adopted y, ex-z) are only relevant for simple one-step relationships anyway (nobody refers to a 'step-cousin once removed ascending' and so on).

At the same time, I think it still makes sense to have them for the simple relationships (just like in the original implementation), otherwise we'd have to use 'partner' everywhere even in case of married couples.

So for now just assume that all one-step relationships are covered elsewhere and don't have to be defined in LanguageXxxExt.php.

Edit: One-step relationships are now supported (extending the original possibilities e.g. 'adoptiveParent').

Initial version is now available in release 2.0.15.1.0. Supported languages: English, German, Slovak.

Great! Well done! I have done several tests, all are perfect. Only one minor remark: in German I would prefer Ur-Ur-nnn instead of 2xUr-nnn. If there are four or more "Ur" then 4xUr-nnn is ok.

Regarding the "2xUr-" cases, I agree. The resulting term "Ur-Ur-" is only minimally longer anyway. I'll adjust this. But I think I'll keep the "3xUr-" cases.
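The prefix rule discussed here could look roughly like the following sketch. The function name `urName` is hypothetical, and the exact output format of the released version may differ; the sketch only encodes what is stated above (a single "Ur" is fused with the base term, two are written out as "Ur-Ur-", three or more use the "n×Ur-" form).

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of the German "Ur" prefix rule.
// $generations is the number of "Ur" prefixes to apply to $base.
function urName(int $generations, string $base): string
{
    if ($generations <= 0) {
        return $base;
    }
    if ($generations === 1) {
        // Single prefix is fused with the base term, e.g. "Urgroßvater".
        return 'Ur' . lcfirst($base);
    }
    if ($generations === 2) {
        // Written out, as it is only minimally longer than "2×Ur-".
        return 'Ur-Ur-' . $base;
    }
    return $generations . '×Ur-' . $base;
}
```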

Dutch added and German terms adjusted in 2.0.15.2.0.

Work on this has finally continued in webtrees itself. Unfortunately there is no chance of getting the solution implemented here integrated into the webtrees core. I'll keep it for now, but it likely won't be developed further.