How to use faker’s localized providers?
vincent-hatakeyama opened this issue · 3 comments
Hi,
Is it possible to use faker’s localized providers?
I need to use this one for example: https://faker.readthedocs.io/en/master/locales/fr_FR.html#faker.providers.ssn.fr_FR.Provider
Regards
Hi,
with the current version of pganonymize, the Faker library will always be initialized with the default locale en_US.
To be able to use localized providers, the locale would have to be added as an optional argument within the YAML schema definition or as an additional property of the FakeProvider. This is currently not supported, but access to the localized providers is a great idea, as we would possibly also use localized data like VAT IDs or states.
I will look into that. Thank you for reporting / requesting this feature.
Regards,
Henning
I suppose the main difficulty for the implementation is performance: if we pass the locale at a table's field level within the YAML schema and instantiate the Faker class for each table record (instead of once, module-wide), this would result in a poor execution time, e.g.:
>>> import timeit
>>> timeit.timeit('faker.first_name()', setup="import faker; faker = faker.Faker()", number=1000)
0.3215181827545166
>>> timeit.timeit('faker.Faker().first_name()', setup="import faker", number=1000)
14.740003108978271
So I guess the only way to avoid the initialization at record level is to provide something like global provider options within the YAML schema, which are passed to a single, reusable Faker instance that is used for all records, like this:
tables:
  - address:
      fields:
        - first_name:
            provider:
              name: fake.first_name
        - last_name:
            provider:
              name: fake.last_name
        - vat_id:
            provider:
              name: fake.ssn
options:
  faker:
    locales:
      - de_DE
      - fr_FR
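To illustrate the idea, here is a minimal Python sketch of how such global options could be turned into one shared Faker instance. It assumes the schema has already been parsed into a dict; the helper names faker_locales and build_faker and the factory parameter are hypothetical and not part of pganonymize. The only Faker call used is the constructor's locale argument, which accepts a single locale or a list of locales.

```python
def faker_locales(schema):
    """Return the list under options.faker.locales, or None if it is absent."""
    options = schema.get("options") or {}
    faker_options = options.get("faker") or {}
    return faker_options.get("locales") or None


def build_faker(schema, factory=None):
    """Create the single, reusable instance that all records would share.

    `factory` is a hypothetical hook (handy for tests); by default the real
    faker.Faker class is imported lazily and used.
    """
    if factory is None:
        from faker import Faker  # third-party package "Faker"
        factory = Faker
    # locale=None makes Faker fall back to its default locale (en_US).
    return factory(locale=faker_locales(schema))


# The schema from above, as it might look after parsing the YAML (fields shortened).
schema = {
    "tables": [
        {"address": {"fields": [
            {"vat_id": {"provider": {"name": "fake.ssn"}}},
        ]}},
    ],
    "options": {"faker": {"locales": ["de_DE", "fr_FR"]}},
}
```

Calling build_faker(schema) once at startup and reusing the returned object for every record keeps the expensive construction out of the per-record loop.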
Faker's multi-locale mode could also be used to provide more than one locale, but this would also mean that common generator methods like first_name or last_name will return random names from any of the configured locales (according to the locale order).
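As a rough sketch of how that drawback might be worked around: in multi-locale mode, indexing the Faker object with a locale name returns the generator for just that locale, so a method can be pinned to one locale while the instance is still shared. The helper call_for_locale and the demo function are hypothetical names used for illustration only; this is not how pganonymize implements it.

```python
def call_for_locale(fake, locale, method, *args, **kwargs):
    """Call a provider method on the generator of one specific locale.

    `fake` is expected to behave like a multi-locale Faker instance,
    i.e. fake["fr_FR"] returns the generator for that locale.
    """
    generator = fake[locale]
    return getattr(generator, method)(*args, **kwargs)


def demo():
    """Requires the third-party Faker package; not run on import."""
    from faker import Faker

    fake = Faker(["de_DE", "fr_FR"])
    # Unpinned: the name may come from either configured locale.
    print(fake.first_name())
    # Pinned: always uses the French SSN provider.
    print(call_for_locale(fake, "fr_FR", "ssn"))
```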
The localization feature will be part of the upcoming release 0.10.0 - thanks to @BuddhaOhneHals for the contribution.