flow-php/etl

General understanding

azngeek opened this issue · 4 comments

I want to evaluate the following case:

  1. Read data from one system and enrich it with data from another system. One system returns JSON while the other returns plain strings.
  2. The data requires heavy transformation: some algorithms have to be applied, and the result is a new JSON payload for an external system.
  3. Afterwards the data needs to be grouped into bulks of, for example, 100 items and then published to another system.

I currently struggle with the transformation part. I would build a class that runs all the transformations, but I do not understand how this could be achieved with this library:

$flow
    ->read($externalSystem1)
    ->transform() // How would i pass a custom transformer? It would be a class.
    ->write(To::memory($array))
    ->run();
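For reference, the three steps from the question can be sketched in plain PHP, without the library. This is only a minimal illustration of the data flow: `buildBulks`, the field names (`id`, `name`, `label`) and the sample data are hypothetical, and the "heavy transformation" is reduced to a placeholder.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of flow-php: enrich, transform, and bulk items.
 *
 * @param string             $json     JSON list from the first system
 * @param array<int, string> $labels   strings from the second system, keyed by item id
 * @param int                $bulkSize maximum number of payloads per bulk
 *
 * @return list<list<string>> bulks of JSON payloads
 */
function buildBulks(string $json, array $labels, int $bulkSize): array
{
    // Step 1: decode the JSON and enrich each item with the matching string (null if missing).
    $items = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $enriched = array_map(
        static fn (array $item): array => array_merge($item, ['label' => $labels[$item['id']] ?? null]),
        $items
    );

    // Step 2: placeholder for the heavy transformation; here it only builds a new payload.
    $payloads = array_map(
        static fn (array $item): string => json_encode(
            ['key' => $item['id'], 'title' => strtoupper($item['name']), 'label' => $item['label']],
            JSON_THROW_ON_ERROR
        ),
        $enriched
    );

    // Step 3: group the payloads into bulks of at most $bulkSize items.
    return array_chunk($payloads, $bulkSize);
}

$bulks = buildBulks(
    '[{"id": 1, "name": "foo"}, {"id": 2, "name": "bar"}, {"id": 3, "name": "baz"}]',
    [1 => 'first', 2 => 'second'],
    2 // the question uses 100; 2 keeps the sample small
);
```

Each entry of `$bulks` would then be published to the target system in one call.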

Hey, excellent question. In general, all transformations are registered on the DataFrame, which is returned by Flow.

Let me show you some examples of how you can encapsulate the transformation logic in a single class.

final class Transformator
{
    public function __invoke(DataFrame $df) : DataFrame
    {
         return $df->rows(Transform::xx())
             ->rows(Transform::xx())
             ->rows(Transform::xx());
    }
}

(new Transformator())((new Flow())->read(...))
  ->write(...)
  ->run();
  

Another way is to actually create your own transformer:

final class MyTransformer implements Transformer
{
    public function transform(Rows $rows) : Rows
    {
         return $rows;
    }
}

(new Flow())
  ->read(...)
  ->transform(new MyTransformer())
  ->write(...)
  ->run(); 
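To make the idea of "one class that runs all the transformations" more concrete, here is a minimal sketch that runs without the library. `RowsTransformer` is a hypothetical stand-in for the library's `Transformer` interface, rows are plain arrays instead of `Rows`, and the step classes are invented for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the library's Transformer interface; rows are plain arrays here.
interface RowsTransformer
{
    public function transform(array $rows): array;
}

// One small, single-purpose step: uppercase the "name" field of every row.
final class UppercaseName implements RowsTransformer
{
    public function transform(array $rows): array
    {
        return array_map(
            static fn (array $row): array => array_merge($row, ['name' => strtoupper($row['name'])]),
            $rows
        );
    }
}

// Another step: derive a "label" field from "id" and "name".
final class AddLabel implements RowsTransformer
{
    public function transform(array $rows): array
    {
        return array_map(
            static fn (array $row): array => array_merge($row, ['label' => $row['id'] . ':' . $row['name']]),
            $rows
        );
    }
}

// The single class that runs all transformations, in the order they were given.
final class AllTransformations implements RowsTransformer
{
    /** @var RowsTransformer[] */
    private array $steps;

    public function __construct(RowsTransformer ...$steps)
    {
        $this->steps = $steps;
    }

    public function transform(array $rows): array
    {
        foreach ($this->steps as $step) {
            $rows = $step->transform($rows);
        }

        return $rows;
    }
}

$transformer = new AllTransformations(new UppercaseName(), new AddLabel());
$result = $transformer->transform([['id' => 1, 'name' => 'foo']]);
```

Because the composite implements the same interface as its steps, it can be passed anywhere a single transformer is expected.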

But it seems it might also be a good idea to introduce another interface that would receive a DataFrame
and return another DataFrame. Let me think more about it.
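A rough sketch of what such an interface could look like, using stand-in classes so it runs without the library. `Frame` and `FrameTransformation` are hypothetical names, not the library's API; `Frame` only mimics the fluent, lazily executed style of DataFrame.

```php
<?php

declare(strict_types=1);

// Hypothetical interface: receives a frame and returns a frame with more steps registered.
interface FrameTransformation
{
    public function transform(Frame $frame): Frame;
}

// Hypothetical stand-in for DataFrame: records row-level callables and applies them on run().
final class Frame
{
    /** @var array<int, array<string, mixed>> */
    private array $rows;

    /** @var callable[] */
    private array $steps = [];

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // Register a callable that maps one row to one row; nothing is executed yet.
    public function rows(callable $step): self
    {
        $this->steps[] = $step;

        return $this;
    }

    // Hand the whole frame to a transformation object, keeping the chain fluent.
    public function transform(FrameTransformation $transformation): self
    {
        return $transformation->transform($this);
    }

    // Apply all registered steps in order and return the resulting rows.
    public function run(): array
    {
        $rows = $this->rows;
        foreach ($this->steps as $step) {
            $rows = array_map($step, $rows);
        }

        return $rows;
    }
}

// All transformation logic for one use case lives in a single class.
final class NormalizeUsers implements FrameTransformation
{
    public function transform(Frame $frame): Frame
    {
        return $frame
            ->rows(static fn (array $row): array => array_merge($row, ['email' => strtolower($row['email'])]))
            ->rows(static fn (array $row): array => array_merge($row, ['active' => true]));
    }
}

$output = (new Frame([['email' => 'A@Example.COM']]))
    ->transform(new NormalizeUsers())
    ->run();
```

Compared with the invokable class shown earlier, this keeps the call chain reading top to bottom instead of wrapping the start of the pipeline in a function call.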

It was actually pretty straightforward: #242
Looking forward to hearing from you whether that solution is sufficient for your needs.

This helps me. The interface in https://github.com/flow-php/etl/pull/242/files#diff-fe55759eeceda3c87919d0334841324d146353a1a5c38b384411b61b3ce1c901 seems intuitive enough for me, and for most other people too, I guess.

Great work by the way 👍 I was looking for an alternative to our heavy Node.js stack which could also be used in PHP projects. Much appreciated.

#242 was just merged 🙌 happy transforming!