datafaker-net/datafaker

How to specify country-specific setting, not language-specific?

asolntsev opened this issue · 2 comments

I see a design issue in the current way DataFaker is configured.

All settings are stored in files

  • src/main/resources/LANG.yml
  • src/main/resources/LANG-COUNTRY.yml

This is reasonable for language-specific data (names, city names, etc.),
but some settings do NOT depend on the language at all, only on the country. For example:

  • phone_number
  • cell_phone
  • passport
  • id_number

To generate a Moldovan ID number, the user needs to pass a Locale whose first component (the language) is ro:

  Faker faker = new Faker(new Locale("ro", "MD"));
  faker.idNumber().valid();
  • ro is the Romanian language, which is used in both Romania and Moldova.

The en and ru languages are also widely used in Moldova.
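To illustrate the mismatch, here is a minimal sketch using only java.util.Locale (no Datafaker calls). The class name MoldovaLocales is made up for this example. It shows that the three locales differ only in the language part, while the country part, which is what an ID number or phone format actually depends on, is the same.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Datafaker.
final class MoldovaLocales {

    // Three locales a Moldovan user might reasonably pass: same country, different languages.
    static List<Locale> moldovanLocales() {
        return List.of(
                Locale.forLanguageTag("ro-MD"),
                Locale.forLanguageTag("ru-MD"),
                Locale.forLanguageTag("en-MD"));
    }

    public static void main(String[] args) {
        for (Locale locale : moldovanLocales()) {
            // getLanguage() varies per locale, getCountry() is "MD" for all of them.
            System.out.println(locale.toLanguageTag()
                    + " -> language=" + locale.getLanguage()
                    + ", country=" + locale.getCountry());
        }
    }
}
```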

The question is:

  • should the Moldovan phone_number and id_number configuration be located in file ro-MD.yml, ru-MD.yml, en-MD.yml?
    • duplicated in all these files?
  • Or maybe in some country-specific file like src/main/resources/countries/MD.yml?
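A rough sketch of what the second option could mean, under the assumption that country-only settings are keyed by the country code alone. Everything here (the CountrySettings class, its methods, the placeholder value) is hypothetical and only models the lookup in memory; it is not how Datafaker loads its YAML files today.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for something like countries/MD.yml.
final class CountrySettings {

    // country code -> (setting key -> value)
    private final Map<String, Map<String, String>> byCountry = new HashMap<>();

    void put(String country, String key, String value) {
        byCountry.computeIfAbsent(country, c -> new HashMap<>()).put(key, value);
    }

    // Only the country part of the locale is used, so the language does not matter.
    Optional<String> lookup(Locale locale, String key) {
        Map<String, String> settings = byCountry.get(locale.getCountry());
        return settings == null ? Optional.empty() : Optional.ofNullable(settings.get(key));
    }
}
```

With this shape, the Moldovan value is stored once under MD, and ro-MD, ru-MD and en-MD all resolve to it without duplicating anything per language.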

@snuyanzin @bodiam thoughts?

Thanks for starting this discussion.
Yep, it seems we have not faced this issue so far; at least nobody has mentioned it before...
However, I admit that this issue exists and that we inherited it from JavaFaker: https://github.com/DiUS/java-faker

One solution is the one you've mentioned:

configuration be located in file ro-MD.yml, ru-MD.yml, en-MD.yml?
duplicated in all these files?

One thing that could be improved with this solution is extracting the common logic (e.g. phone number generation) into a separate class like

class MyPhoneClass extends AbstractProvider<BaseProviders> {
...
  public String myCountryPhone() {
    String[] patterns = new String[] {"123-###-###-##-##", "234-###-###", "345-##-#"};
    return faker.numerify(patterns[faker.number().number(patterns.length)]);
  }
...
}

After that, we could put a reference to that method in the config like

    phone_number:
      formats: "#{MyPhoneClass.myCountryPhone}"

So in this case we still need to put the reference in all configs, but the implementation will be in one place.
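To make the "implementation in one place" idea concrete without depending on the Datafaker API, here is a standalone sketch. PatternPhone and its numerify method are local stand-ins written for this example (they only replace each # with a random digit) and are not the library's implementation; the patterns are the same dummy ones as above.

```java
import java.util.Random;

// Hypothetical standalone sketch; in Datafaker this logic would live in a provider class.
final class PatternPhone {

    // Dummy patterns copied from the example above, not real phone formats.
    private static final String[] PATTERNS = {"123-###-###-##-##", "234-###-###", "345-##-#"};

    // Local stand-in for a numerify-style helper: each '#' becomes a random digit.
    static String numerify(String pattern, Random random) {
        StringBuilder result = new StringBuilder(pattern.length());
        for (char c : pattern.toCharArray()) {
            result.append(c == '#' ? (char) ('0' + random.nextInt(10)) : c);
        }
        return result.toString();
    }

    // Picks one pattern at random and fills it in; every locale config would reference this one method.
    static String myCountryPhone(Random random) {
        return numerify(PATTERNS[random.nextInt(PATTERNS.length)], random);
    }

    public static void main(String[] args) {
        System.out.println(myCountryPhone(new Random()));
    }
}
```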

Other approaches seem to require more drastic changes... at least I don't see any other simple ones...

UPD:
Another option (probably more time-consuming and challenging) is to try to adopt behavior similar to what exists for en.
Currently there is an en folder, which is considered the default for everyone, and then there are more specific per lang+country locales.
Ideally, for backward compatibility, we still need to keep en as the default if a key is not found in any other corresponding config. At the same time, we could also have a per-language default in dedicated folders, e.g. a ro folder with ro-specific data. A key would then be looked up first in the config for the current locale, then in the language's default locale, and then in the en locale.
I'm pretty sure this will not work out of the box, so some research is required in that direction.
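The lookup order described here (current locale, then language default, then en) could be sketched roughly as follows. This is only an in-memory model under my own assumptions: FallbackLookup, its config names and the placeholder values are hypothetical and do not reflect how Datafaker actually resolves keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the proposed fallback chain; configs are plain maps instead of YAML files.
final class FallbackLookup {

    // config name (e.g. "ro-MD", "ro", "en") -> (key -> value)
    private final Map<String, Map<String, String>> configs = new LinkedHashMap<>();

    void put(String configName, String key, String value) {
        configs.computeIfAbsent(configName, n -> new LinkedHashMap<>()).put(key, value);
    }

    // Builds the lookup order: lang-COUNTRY first, then lang, then "en" as the global default.
    static List<String> chain(Locale locale) {
        List<String> names = new ArrayList<>();
        String lang = locale.getLanguage();
        String country = locale.getCountry();
        if (!lang.isEmpty() && !country.isEmpty()) {
            names.add(lang + "-" + country);
        }
        if (!lang.isEmpty()) {
            names.add(lang);
        }
        if (!names.contains("en")) {
            names.add("en");
        }
        return names;
    }

    // Returns the value from the first config in the chain that defines the key.
    Optional<String> resolve(Locale locale, String key) {
        for (String name : chain(locale)) {
            Map<String, String> config = configs.get(name);
            if (config != null && config.containsKey(key)) {
                return Optional.of(config.get(key));
            }
        }
        return Optional.empty();
    }
}
```

Note that this chain alone still keys everything by language first, so country-only data such as phone_number would still need to appear in each lang-COUNTRY config (or be referenced from there).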

not sure it will work...

@asolntsev it seems I was able to add support for it at #1219
with ro-MD, ru-MD, en-MD as examples
please have a look