laminas/laminas-hydrator

Cache values in UnderscoreToCamelCaseFilter and CamelCaseToUnderscoreFilter

DerKnerd opened this issue · 11 comments

Feature Request

Q A
New Feature yes
RFC yes/no
BC Break yes/no

Summary

Currently the CamelCaseToUnderscoreFilter and the UnderscoreToCamelCaseFilter execute the regex on every filter call. I think it could improve the performance if the code gets adjusted like the following:

static array $transformedFilters = [];

public function filter(string $value): string
{
    if (array_key_exists($value, static::$transformedFilters)) {
        return static::$transformedFilters[$value];
    }

    $pcreInfo = $this->getPatternAndReplacement(
    // a unicode safe way of converting characters to \x00\x00 notation
        preg_quote('_', '#')
    );

    $filtered = preg_replace_callback(
        $pcreInfo->pattern,
        $pcreInfo->replacement,
        $value
    );

    $lcFirstFunction = $this->getLcFirstFunction();
    $filteredValue = $lcFirstFunction($filtered);

    static::$transformedFilters[$value] = $filteredValue;

    return $filteredValue;
}

Should I create a pull request for this change?

This is intended. if you need such a caching logic, you might create a decorator on your own which implements such a "cache".
Static caches which are never cleared end up wasting memory in long running processes.

So as suggested: either write your own decorator or optimize codepath to avoid calling the filter more often than necessary.

But why do you cache the ReflectionProperties then?

Good question. I think this was done as PHP does not have an internal cache when working with reflection. (wild guess - I've experienced plenty of performance issues with reflection).

Whenever working with reflection, this has a huge performance impact. The footprint of applying regexes is not comparable to the ReflectionClass instantiation.

If you have serious performance downsides, please let us know so we can reconsider this feature request.
But if that only gives an advantage of nanoseconds, I'd say its not worth having it.

That is the reason I checked, I actually have a website where the UnderscoreToCamelCaseFilter takes around 10% of the performance, you can check the cache grind here: https://webgrind.imanuel.dev/webgrind/ The thing with the filter is, it gets called really often due to the menu structure of my site, that causes the hydration to execute around 2500 times, which takes about 3000ms.

I don't think it helps for small hydrations, but if you have many database entries to hydrate it can really cause some performance issues.

I will reopen this but even tho, I am not the maintainer of this package. I just triage some issues/PRs in laminas components I use on a regular basis.
Not sure if and when this issue will be handled or if a PR will be merged. I'd suggest you to create a PR and see what happens.

I would love to see a PR with phpbench in it to see the before/after results.
Maybe you give it a try to start with a non-static property and if that does not give the results, use static property.

Having a few benchmarks will help us to make the best decision.

Is that something you'd like to do? We have components with benchmarks where you can start with:

https://github.com/laminas/laminas servicemanager

https://github.com/laminas/laminas-cache

@DerKnerd

it gets called really often due to the menu structure of my site, that causes the hydration to execute around 2500 times

Do you really need 2500 entries for a menu? A sitemap can need that, but here I would omit any hydrator.

@froschdesign I was a bit unclear, it is not 2500 entries rather that many properties that get hydrated. It is not only the menu but the whole page. I honestly am unsure where this amount of fields comes from. It is probably a good idea to optimize the code behind it. The hydrator itself gets called about 94 times.

@boesing Ok, I will do that 🙂

I made the benchmark with the default behavior. And this is the result:

benchmark subject set revs its mem_peak mode rstdev
UnderscoreToCamelCaseFilterBench benchFilter 0 2 2 1.545mb 1,039.056μs ±2.74%

After adjusting the method using a static property this is the result:

benchmark subject set revs its mem_peak mode rstdev
UnderscoreToCamelCaseFilterBench benchFilter 0 2 2 1.545mb 251.750μs ±0.10%

And adjusting the method with a non static property this is the result:

benchmark subject set revs its mem_peak mode rstdev
UnderscoreToCamelCaseFilterBench benchFilter 0 2 2 1.545mb 250.234μs ±3.30%

Looking at the data the version with a cache is about four time fast with same memory usage. Do you think this is worth it?

The code for the benchmark is as follows:

<?php

namespace LaminasBench\Hydrator;

use Laminas\Hydrator\NamingStrategy\UnderscoreNamingStrategy\UnderscoreToCamelCaseFilter;

class UnderscoreToCamelCaseFilterBench
{
    public function benchFilter()
    {
        $filter = new UnderscoreToCamelCaseFilter();
        for ($i = 0; $i < 1000; ++$i) {
            $filter->filter('test_name');
        }
    }
}

@DerKnerd feel free to drop a PR. add the benchmarks to it so we can run them during CI (this will work out of the box with the new CI tooling just by committing the phpbench config). 👌🏻
Thanks!

P.S.: looks like non-static property has the same results. we might want to test use cases like you have in your application as I think it could make a difference when having multiple, nested objects sharing the same property names.