flow-php/etl

RFC | Replace "in-house" cache system with PSR-6

stloyd opened this issue · 3 comments

The PHP community has PSR-6, a standard that defines common interfaces for cache implementations.

Its key concepts can be described as:

Item
A single unit of information stored as a key/value pair, where the key is the unique identifier of the information and the value is its contents;

Pool
A logical repository of cache items. All cache operations (saving items, looking for items, etc.) are performed through the pool. Applications can define as many pools as needed.

Adapter
It implements the actual caching mechanism that stores the information in the filesystem, in a database, etc. The Symfony Cache component provides several ready-to-use adapters for common caching backends (Redis, APCu, PDO, etc.)
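To make the three concepts concrete, here is a minimal sketch of how an Item, a Pool and an Adapter relate. It is written in Python only to keep it self-contained; the method names loosely mirror PSR-6 (`getItem`, `hasItem`, `save`, `deleteItem`, `clear`), but this is not the PSR-6 API itself, and `InMemoryAdapter` is a hypothetical stand-in for a real backend such as Redis or APCu.

```python
from typing import Any, Dict


class CacheItem:
    """Item: one key/value pair (loosely mirrors PSR-6 CacheItemInterface)."""

    def __init__(self, key: str, value: Any = None, hit: bool = False) -> None:
        self._key = key
        self._value = value
        self._hit = hit

    def get_key(self) -> str:
        return self._key

    def get(self) -> Any:
        return self._value

    def is_hit(self) -> bool:
        return self._hit

    def set(self, value: Any) -> "CacheItem":
        self._value = value
        return self


class InMemoryAdapter:
    """Adapter (hypothetical): the storage backend that actually holds the data."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def read(self, key: str) -> Any:
        return self._store[key]

    def write(self, key: str, value: Any) -> None:
        self._store[key] = value

    def contains(self, key: str) -> bool:
        return key in self._store

    def remove(self, key: str) -> None:
        self._store.pop(key, None)

    def flush(self) -> None:
        self._store.clear()


class CachePool:
    """Pool: every cache operation goes through here and is delegated to the adapter."""

    def __init__(self, adapter: InMemoryAdapter) -> None:
        self._adapter = adapter

    def get_item(self, key: str) -> CacheItem:
        # A miss still returns an item; its is_hit() is simply False.
        if self._adapter.contains(key):
            return CacheItem(key, self._adapter.read(key), hit=True)
        return CacheItem(key)

    def has_item(self, key: str) -> bool:
        return self._adapter.contains(key)

    def save(self, item: CacheItem) -> bool:
        self._adapter.write(item.get_key(), item.get())
        return True

    def delete_item(self, key: str) -> bool:
        self._adapter.remove(key)
        return True

    def clear(self) -> bool:
        self._adapter.flush()
        return True
```

The point of the split is that application code only ever talks to the pool, so swapping `InMemoryAdapter` for a different backend does not change any calling code.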

Implementing these interfaces would allow more adapters to be used, reduce the maintenance burden of the ETL library, and improve the developer experience (DX).

The reasoning makes perfect sense.
Are you thinking about replacing the entire Cache interface, or about providing an implementation of that interface that would allow any adapter implementing PSR-6 to be used?

Personally, I would implement PSR-6 to stay open, with Symfony Cache as the default implementation.

Edit: after rethinking, we can accept any PSR-6 implementation, with the default one picked from the packages that provide psr/cache-implementation: https://packagist.org/providers/psr/cache-implementation

Let me also bring some context to the conversation.
In general, caching in Flow is mostly meant for internal features such as GROUP BY or SORT, and only when they need it. By design, I never wanted to give users the option to cache anything, since caching usually involves I/O and might become a bottleneck for the pipeline. However, caching could be a nice addition, especially when extracting data from big datasets in order to reuse chunks more than once (for example, extracting all zip codes from an orders dataset and then using them in two separate pipelines during the same process).
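The zip-code example amounts to a read-through cache around an extractor: the first pipeline pulls chunks from the source and records them, and the second one replays the recorded chunks instead of hitting the source again. A minimal Python sketch, assuming a plain dict as the cache and a hypothetical `cached_extract` helper (this is not Flow's API):

```python
from typing import Callable, Dict, Iterable, Iterator, List

Rows = List[dict]  # one chunk of rows


def cached_extract(cache: Dict[str, List[Rows]], cache_id: str,
                   extract: Callable[[], Iterable[Rows]]) -> Iterator[Rows]:
    """Replay cached chunks if present; otherwise extract, yield and record them."""
    if cache_id in cache:
        yield from cache[cache_id]
        return
    chunks: List[Rows] = []
    for rows in extract():
        chunks.append(rows)
        yield rows
    # Store the entry only after the source is fully consumed, so a consumer
    # that stops early never leaves an incomplete entry behind.
    cache[cache_id] = chunks
```

Calling it twice with the same `cache_id` runs the extractor once; the second "pipeline" reads every chunk from the cache. Keeping the whole dataset around is exactly the I/O and memory cost mentioned above, which is why it makes sense as an opt-in.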

I did not start with PSR-6 because it would require two additional dependencies:

  • psr/cache
  • any implementation of psr/cache

Since Flow's caching use cases are purely internal, I did not want to impose those extra dependencies on projects that use Flow.

Maybe we could make it optional?
Instead of replacing the Cache interface with PSR, maybe we could build a PSRCache that implements the Cache interface and accepts any PSR cache adapter, which would let us keep all other existing implementations?
With that approach, projects that already use a PSR cache implementation could swap it in for the default ones, and we would not add any extra dependency (except psr/cache, or even just psr/simple-cache, since I'm not sure we need anything more sophisticated than simple-cache).
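The proposed PSRCache is an adapter: it implements Flow's own Cache interface and delegates storage to whatever PSR cache it receives. A minimal sketch in Python, assuming a simple-cache style backend (`get`/`set`/`has`/`delete`, as in PSR-16). The four `Cache` methods (`add`, `read`, `has`, `clear`) are an assumed shape, not Flow's actual interface, and `DictCache` is a hypothetical in-memory backend; a PHP version would type-hint against `Psr\SimpleCache\CacheInterface` instead.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Protocol

Rows = List[dict]


class SimpleCache(Protocol):
    """Stand-in for a PSR-16 style cache."""
    def get(self, key: str, default: Any = None) -> Any: ...
    def set(self, key: str, value: Any) -> bool: ...
    def has(self, key: str) -> bool: ...
    def delete(self, key: str) -> bool: ...


class Cache(ABC):
    """Assumed shape of Flow's internal Cache interface (hypothetical)."""
    @abstractmethod
    def add(self, cache_id: str, rows: Rows) -> None: ...
    @abstractmethod
    def read(self, cache_id: str) -> Iterator[Rows]: ...
    @abstractmethod
    def has(self, cache_id: str) -> bool: ...
    @abstractmethod
    def clear(self, cache_id: str) -> None: ...


class PsrCache(Cache):
    """Implements the internal Cache on top of any SimpleCache backend."""

    def __init__(self, backend: SimpleCache) -> None:
        self._backend = backend

    def add(self, cache_id: str, rows: Rows) -> None:
        # Append one chunk, keeping chunks in insertion order.
        chunks = self._backend.get(cache_id, [])
        self._backend.set(cache_id, chunks + [rows])

    def read(self, cache_id: str) -> Iterator[Rows]:
        yield from self._backend.get(cache_id, [])

    def has(self, cache_id: str) -> bool:
        return self._backend.has(cache_id)

    def clear(self, cache_id: str) -> None:
        self._backend.delete(cache_id)


class DictCache:
    """Hypothetical in-memory backend standing in for any PSR-16 implementation."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def set(self, key: str, value: Any) -> bool:
        self._data[key] = value
        return True

    def has(self, key: str) -> bool:
        return key in self._data

    def delete(self, key: str) -> bool:
        self._data.pop(key, None)
        return True
```

Because only `PsrCache` knows about the backend, the existing Cache implementations stay untouched and the PSR dependency is needed only by projects that choose this class. Rewriting the whole entry on every `add` keeps the sketch short; a real implementation might store each chunk under its own key instead.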