fzaninotto/Faker

Faker 2.0 design

pimjansen opened this issue · 26 comments

Hey all,

Today we discussed that 1.9.0 will be our last minor release of the 1.x branch. After this, no new enhancements will be made and we will start focusing on Faker 2.0. To kick off, let's have an open discussion with all of you and share our ideas.

  • Which PHP versions will be supported
  • Which development tooling will be used
  • What the actual architecture will look like

Best,
Pim

My two cents:

PHP version
I would really like to start fresh with 7.4. I know it will not be backwards compatible, but the same goes for 1.x vs 2.x. The main reason is typed properties, which make an application much stricter. Faker 1.x has been around for a long time and still works fine. The upgrade from 7.2 to 7.4 is an easy one. Also, active support for 7.2 is already ending in January 2020.

Tooling
There is a lot of great tooling available to ensure packages are working fine.

  • PHPStan
  • PHPCS
  • PHPMD (yes it is maintained again)
  • PHPUnit

Architecture
One of the main problems of Faker today is that there are so many locales. None of us knows every language, which means there are a lot of PRs for locales where we have no idea what the content is about. A second problem is the licensing of the content. A lot of it is unreadable for us, so the content cannot really be verified by any of us either.

My suggestion would be to split Faker into a Core library that holds the basics, i.e. everything that is not locale specific:

  • Numbering
  • Date calculation
  • etc

For all the other things the Core will provide some interfaces where needed so that we can ensure that a locale can be connected with the core properly. The same idea here goes for all the providers. They should also be pluggable just like a locale.

The downside is that there will be a lot of different libs to load. However, most of the time you use one or maybe two locales and that is it. At this point we always ship all of them, which is totally unnecessary.

I will probably provide an nl_NL locale myself, since it is easy for me to read and handle, but it will not be coupled with the Faker core itself.

Those are my first thoughts; there is probably much more to come. So let us know what you think.

Really support the idea of making a core and pluggable locales. An advantage of this architecture will be the possibility to inject custom locale versions and currently unavailable locales. I am happy to provide dv_MV.

@localheinz I was thinking about how we can handle the locales as separate packages while the Faker core still holds its value. For example, take a carrier, which can be "Vodafone", "T-Mobile", "AT&T" and so on. This is typically something that a locale could hold on its own, since it will be different for each of the implementations.

However, how do we handle that from the core? Is the core going to define the interfaces that locales should implement for those? That would keep real meaning in the core, but there is always a lot that is locale specific, like a random VAT charge or the different identification types. I don't think the core should or can hold all of that (it brings us the same troubles).

On the other hand, just leaving all of that up to the locale is also not great. It means the core itself has no real value, except maybe making it easy to load different locales together. But if you are not doing that, why not just load the locale directly? Imo this is not something we should want, since it would split things up way too much.

What do you think?

I think one way to go about that is to create a core class as a single point of entry, with methods providing non-locale-specific properties, e.g. date, random number, text, etc. Locale classes extend that class and define their own methods. This means each locale needs its own readme listing the available methods and properties. Something like this:

// base interface
namespace Faker;

interface FakerContract
{
    // core method that shouldn't be overridden
    // (an interface cannot declare a method final; the base class does that)
    public static function random(): int;

    // core method that can be overridden
    public static function phone(): string;
}

// base class
namespace Faker;

class FakerConcrete implements FakerContract
{
    // core method that shouldn't be overridden
    final public static function random(): int
    {
        return mt_rand();
    }

    // core method that can be overridden
    public static function phone(): string
    {
        return ''; // placeholder
    }
}

namespace Faker\en_NG;

use Faker\FakerConcrete;

class Phone extends FakerConcrete
{
    // override parent method
    public static function phone(): string
    {
        return ''; // placeholder
    }

    // locale specific
    public static function state(): string
    {
        return ''; // placeholder
    }
}

I'm OK with these ideas. Another thing to work on is the ability for locale-specific Fakers to use a different charset for the Text providers. Users of non-Latin scripts like Japanese or Arabic currently can't use anything other than RealText, which is slow as hell and probably not fit for generating random words.

@fzaninotto agree! One thing I did not mention yet, however, is the existing ORM integration. Are we going to keep it, or pull it out and maybe publish it as a standalone provider?

I think they should be moved out and managed by the ORM developers. We can't know and master every ORM out there!

I can help with the Laravel ORM


Good to hear @ManojKiranA. Once we have a first beta of the Faker core, I think it's time to think about how to implement providers that can hook into the ORM there.

how to implement providers that can hook into ORM there.

Waiting for it 😎

The biggest problem I currently have with this library is that the random method/seed is global/singleton (mt_srand/mt_rand).

It would be amazing if each Faker instance could have its own seed. Or at least Faker should be based on a single randomBytes() or randomInt() method which is easily overridable, so an alternative generator (with an instance seed) can be used.

Pseudocode below:

$f1 = new Faker();
$f2 = new Faker();
$f2->seed(1235);
$f1->seed(1234);
$f2->number(); // this will be the first number from seed 1234 instead of 1235, because the seed is global

@joelharkes not sure if this is 100% the case, but if so I agree. The seed should indeed be isolated for each instance.
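As a rough illustration of per-instance seeding, here is a minimal sketch that assumes PHP 8.2 or later, where the bundled Random extension provides `Random\Randomizer` and the `Random\Engine\Mt19937` engine. It is not how Faker works; it only shows that two independently seeded generators do not disturb each other.

```php
<?php
declare(strict_types=1);

// Assumption: PHP 8.2+, which ships the Random extension used here.
use Random\Engine\Mt19937;
use Random\Randomizer;

// Each Randomizer owns its engine, so seeding one never affects another.
$r1 = new Randomizer(new Mt19937(1234));
$r2 = new Randomizer(new Mt19937(1235));

// Interleave draws from both generators.
$a = [$r1->getInt(0, 999), $r2->getInt(0, 999), $r1->getInt(0, 999)];

// Replay seed 1234 on its own: it yields the same values $r1 produced above.
$replay = new Randomizer(new Mt19937(1234));
$b = [$replay->getInt(0, 999), $replay->getInt(0, 999)];

var_dump($a[0] === $b[0] && $a[2] === $b[1]); // bool(true)
```

With the global mt_srand()/mt_rand() pair, the draw from the second generator in between would have shifted the first generator's sequence.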

Random thoughts, feel free to comment

namespace Faker\English;

use Faker; // lets the Faker\... names below resolve from the root namespace

// general purpose, locale-specific factory
class Factory extends Faker\Core\Factory {
    protected static $defaultProviders = [
        'address' => Faker\English\Address::class,
        'barcode' => Faker\Core\Barcode::class,
        'color' => Faker\English\Color::class,
        'datetime' => Faker\English\DateTime::class,
        'image' => Faker\Core\Image::class,
        'internet' => Faker\English\Internet::class,
        'lorem' => Faker\Latin\Lorem::class,
        'misc' => Faker\English\Miscellaneous::class,
        'payment' => Faker\English\Payment::class,
        'person' => Faker\English\Person::class,
        'phone' => Faker\English\PhoneNumber::class,
        'text' => Faker\English\Text::class,
        'uuid' => Faker\English\Uuid::class,
    ];

    // the static create() method comes from the parent
}

// specialized purpose, locale-specific factory
class EcommerceFactory extends Faker\Core\Factory {
    protected static $defaultProviders = [
        'address' => Faker\English\Address::class,
        'barcode' => Faker\Core\Barcode::class,
        'color' => Faker\English\Color::class,
        'datetime' => Faker\English\DateTime::class,
        'image' => Faker\Core\Image::class,
        'lorem' => Faker\Latin\Lorem::class,
        'payment' => Faker\English\Payment::class,
        'person' => Faker\English\Person::class,
        'phone' => Faker\English\PhoneNumber::class,
        'text' => Faker\English\Text::class,
    ];
}

// allowing users to override the providers of a particular factory
class MyEcommerceFactory extends Faker\English\EcommerceFactory {
    // is it possible in PHP? (not as written: a property default must be a
    // constant expression, so it cannot read another class's static property)
    // protected static $defaultProviders = [
    //     ...Faker\English\EcommerceFactory::$defaultProviders,
    //     'image' => \My\Image::class
    // ];

    // if we cannot do it, let's just do
    public static function create() {
        return Faker\English\EcommerceFactory::create([
            'image' => \My\Image::class,
        ]);
    }
}

// usage: use a localized factory directly
$faker = Faker\English\Factory::create();

// multi-language support
$englishFaker = Faker\English\CRMFactory::create();
$frenchFaker = Faker\French\CRMFactory::create();

$multiFaker = new \Faker\Core\LanguageAggregate($englishFaker, $frenchFaker);
echo $multiFaker->lastName(); // chooses either one of the locales
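On the "is it possible in PHP?" question above: one way to get the same effect at runtime is to merge overrides into the defaults inside create(). The sketch below is only an illustration; the classes `Factory`, `EcommerceFactory`, `CoreImage`, `MyImage` and `EnglishPerson` are hypothetical stand-ins, not an existing Faker API.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for provider classes; not part of any Faker release.
final class CoreImage {}
final class MyImage {}
final class EnglishPerson {}

abstract class Factory
{
    /** @var array<string, string> provider key => class name */
    protected static $defaultProviders = [];

    /**
     * Instantiate the default providers, letting $overrides replace entries by key.
     *
     * @param array<string, string> $overrides
     * @return array<string, object>
     */
    public static function create(array $overrides = []): array
    {
        // static:: picks up the property redeclared by the child factory.
        $classes = array_replace(static::$defaultProviders, $overrides);

        return array_map(static fn (string $class) => new $class(), $classes);
    }
}

final class EcommerceFactory extends Factory
{
    protected static $defaultProviders = [
        'image' => CoreImage::class,
        'person' => EnglishPerson::class,
    ];
}

$providers = EcommerceFactory::create(['image' => MyImage::class]);
echo get_class($providers['image']), PHP_EOL;  // MyImage
echo get_class($providers['person']), PHP_EOL; // EnglishPerson
```

Doing the merge at call time sidesteps the restriction on property defaults, at the cost of resolving the provider list on every create() call.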

There should be a distinction between the "country" and "language". For example: Belgium might format addresses differently than The Netherlands but they (might) speak the same "language".

@JoshuaLuckers The concept of "locales" solves that problem; you'd have nl_BE and nl_NL, where the language for both is Dutch (nl), but the location is different (NL/BE). This way you can localize everything like formatting addresses/phone numbers etc, while still maintaining the same spoken language.

So instead of using the language names in the namespaces (like \Faker\English\Text::class), I propose we use the locale: \Faker\NL\BE\Text::class and \Faker\NL\NL\Text::class. In addition to the classes provided by core, of course: \Faker\Core\Text::class. Typing it out though, having a namespace like \Faker\NL\NL\... might make the API a bit confusing to use, so we'd need the docs to be crystal clear on this.

Thoughts?

@svenluijten it might not need crystal-clear docs if the namespaces are explicit, like:
\Faker\Language\NL\Location\BE\Text::class
\Faker\Language\NL\Location\NL\Text::class, which could be simplified to \Faker\Language\NL\Text::class

🤷‍♂️

stof commented

Another option is to keep the locale itself as a single segment of the namespace: \Faker\nl_NL\Text::class

What about the idea of linked data? E.g. generate one set of data so that you can get related data fields from it. Similar to what @joelharkes mentioned, but a little more extended:

$faker->fixed(true);
echo $faker->name;
    // Adaline Reichel
echo $faker->firstName;
    // Adaline
echo $faker->lastName;
    // Reichel
echo $faker->safeEmail;
    // adaline.reichel@example.org

// ...

echo $faker->name; 
    // Adaline Reichel
$faker->next();
echo $faker->name;
    // Roscoe Johns
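A minimal sketch of how such linked fields could hang together, assuming a hypothetical `FakePerson` value object that picks its parts once and derives the rest from them (the class and its fields are illustrative only, not a proposed Faker API):

```php
<?php
declare(strict_types=1);

// Hypothetical value object: picks first and last name once, derives the rest.
final class FakePerson
{
    public string $firstName;
    public string $lastName;
    public string $name;
    public string $safeEmail;

    /**
     * @param string[] $firstNames candidate first names
     * @param string[] $lastNames  candidate last names
     */
    public function __construct(array $firstNames, array $lastNames)
    {
        $this->firstName = $firstNames[array_rand($firstNames)];
        $this->lastName = $lastNames[array_rand($lastNames)];

        // Derived fields always agree with the parts picked above.
        $this->name = "{$this->firstName} {$this->lastName}";
        $this->safeEmail = strtolower("{$this->firstName}.{$this->lastName}") . '@example.org';
    }
}

$person = new FakePerson(['Adaline', 'Roscoe'], ['Reichel', 'Johns']);
echo $person->name, PHP_EOL;
echo $person->safeEmail, PHP_EOL;
```

Creating a new object plays the role of next() in the example above: every read from one object stays consistent until you ask for another one.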

I'd like to have factories in Faker, as factory-muffin does; that would be great for tests.

I do like the idea of linked data, in fact I came here to suggest the same.
Though it might require a bit more complex sourcing.
My example would involve address that will link city, country, post code and GPS coordinates.

When it comes to instance of faker per locale, I don't think that is a good idea.
It would quickly turn into a big mess of dependency injection.
Also it would be nice to have an easy way to use multiple locales.

Instead I would suggest having a main Faker/Faker class and then multiple Faker/Provider classes that describe themselves (using a getLocale method or something similar).
Then you could fetch data using something similar to current modifiers.
Eg. $faker->locale('en_US|ru_RU|es_ES')->firstName(); to get a name in one of 3 languages.

This could be even expanded to create some form of FakerContext object that will hold details about what we want now.

$faker = Faker\Factory::create();
$seededFaker = $faker->seed(123);
$englishFaker = $seededFaker->locale('en_US');
$person = $englishFaker->person();
$fullname = $person->name; // Adaline Reichel
$firstName = $person->firstName; // Adaline
$lastName = $person->lastName; // Reichel

In this example $faker could be an instance of faker itself,
$seededFaker would be a context with seed stored,
$englishFaker would be a new Context (clone) with seed and locale set.
$person would be a special context with data already picked.
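The clone-per-step behaviour described above could look roughly like this. `FakerContext` and its methods are hypothetical names chosen for the sketch; the point is only that seed() and locale() return modified copies and leave the original untouched.

```php
<?php
declare(strict_types=1);

// Hypothetical immutable context; the names are illustrative, not a Faker API.
final class FakerContext
{
    private ?int $seed = null;
    private ?string $locale = null;

    // Return a copy with the seed set, leaving this instance untouched.
    public function seed(int $seed): self
    {
        $next = clone $this;
        $next->seed = $seed;

        return $next;
    }

    // Return a copy with the locale set, keeping any seed already stored.
    public function locale(string $locale): self
    {
        $next = clone $this;
        $next->locale = $locale;

        return $next;
    }

    public function describe(): string
    {
        return sprintf('seed=%s locale=%s', $this->seed ?? 'none', $this->locale ?? 'none');
    }
}

$faker = new FakerContext();
$seededFaker = $faker->seed(123);
$englishFaker = $seededFaker->locale('en_US');

echo $faker->describe(), PHP_EOL;        // seed=none locale=none
echo $englishFaker->describe(), PHP_EOL; // seed=123 locale=en_US
```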

@hubertnnn 's idea looks promising to me

@hubertnnn I was thinking more like this:

$faker = new Faker();
$faker->addLocale(new MyEnglishLocale);
$faker->addLocale(new MyFrenchLocale);
$faker->addLocale(new MyGermanLocale);
$faker->addProvider(new CustomTelcoEcommerceProvider);

$faker->getFirstname(); // Outputs a random first name from the list of given locales

My only concern is that this works fine for a given interface spec, where each locale implements a certain interface (and of course for the core methods like digits, dates and so on). However, in some cases there are also methods that are specific to one locale, for example the different kinds of person registration types.

@pimjansen
Your solution is not that far from mine. I did not show the providers part because I am not sure how the providers should be loaded in the standalone version (i.e. in Faker\Factory::create()), but I was thinking that when used in a framework, the service provider would generate something like this:

function provideFaker() 
{
    $faker = new Faker();
    $faker->addProvider(new MyEnglishLocale);
    $faker->addProvider(new MyFrenchLocale);
    $faker->addProvider(new MyGermanLocale);
    $faker->addProvider(new CustomTelcoEcommerceProvider);
    return $faker;
}

Then you could use the provided faker directly or use a context to narrow the operations:

$faker->getFirstName();
$faker->seed(100)->getFirstName();
$faker->locale('en_US')->getFirstName();
$faker->person()->getFirstName();

All the above would be a valid way to call getFirstName.

So we would have 2 points to set locale:

  1. During creation of faker we will add providers for supported locales.
  2. During generation of data we would be able to filter providers used.
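Those two points could be sketched as follows. Everything here is hypothetical (`LocaleProvider`, `FixedNameProvider` and this `Faker` class are made up for the illustration); the toy provider returns a fixed name so the outcome is predictable.

```php
<?php
declare(strict_types=1);

// Hypothetical contract: every provider reports the locale it covers.
interface LocaleProvider
{
    public function getLocale(): string;
    public function firstName(): string;
}

// Toy provider returning a fixed name, so the example is deterministic.
final class FixedNameProvider implements LocaleProvider
{
    private string $locale;
    private string $name;

    public function __construct(string $locale, string $name)
    {
        $this->locale = $locale;
        $this->name = $name;
    }

    public function getLocale(): string { return $this->locale; }
    public function firstName(): string { return $this->name; }
}

final class Faker
{
    /** @var LocaleProvider[] */
    private array $providers = [];

    // Point 1: register providers for the supported locales at creation time.
    public function addProvider(LocaleProvider $provider): void
    {
        $this->providers[] = $provider;
    }

    // Point 2: narrow to providers matching a spec such as 'en_US|ru_RU'.
    public function locale(string $spec): self
    {
        $wanted = explode('|', $spec);
        $next = new self();
        foreach ($this->providers as $provider) {
            if (in_array($provider->getLocale(), $wanted, true)) {
                $next->addProvider($provider);
            }
        }

        return $next;
    }

    public function firstName(): string
    {
        if ($this->providers === []) {
            throw new RuntimeException('No provider registered for this locale.');
        }

        return $this->providers[array_rand($this->providers)]->firstName();
    }
}

$faker = new Faker();
$faker->addProvider(new FixedNameProvider('en_US', 'Adaline'));
$faker->addProvider(new FixedNameProvider('es_ES', 'Lucia'));

echo $faker->locale('es_ES')->firstName(), PHP_EOL; // Lucia
```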

@hubertnnn yeah, that could indeed be an idea, of course. We should aim for a use case that is as easy as possible. I'm pretty sure that the new design won't be backwards compatible with the way it works now, so it is very important to get it right for the users straight away.

@pimjansen May I ask about the status of the project so far? Do you need help with the new design and development?

The Guesser should be usable from outside and should have a method that returns the guessed name instead of an anonymous Closure.
I'm working on a code generator for tests that should auto-populate them with Faker calls.

Also: the Guesser could have an option for localization.
We could have a registry of all available Fakers, and each Faker could have its own Guesser method that receives the name and additional params and returns true if it matches.